CN113361595A - Sample matching degree calculation optimization method, device, medium and computer program product


Info

Publication number: CN113361595A
Application number: CN202110621677.7A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: sample, sampling, hash, hash mapping, matching degree
Inventors: 吴玙, 范涛, 马国强, 魏文斌, 谭明超, 陈天健, 杨强
Original and current assignee: WeBank Co Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)

Classifications

    • G06F18/22: Physics; Computing; Electric digital data processing; Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06Q40/00: Physics; Computing; Information and communication technology for administrative, commercial, financial, managerial or supervisory purposes; Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The application discloses a sample matching degree calculation optimization method, which comprises the following steps: acquiring each first local sample ID, and mapping the first hash value corresponding to each first local sample ID to a preset value interval to obtain each first hash mapping value; determining a global sampling ordering value corresponding to each second device based on a first sampling number corresponding to a first sampling hash mapping value set selected from the first hash mapping values and a second sampling number corresponding to each second sampling hash mapping value set sent by the second devices, and selecting the global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and the second sampling hash mapping value sets; and calculating the sample matching degree between the first device and each second device based on the global sampling ordering values and the global sampling hash mapping values. The method and the device solve the technical problem of low efficiency in calculating the sample matching degree in federated learning.

Description

Sample matching degree calculation optimization method, device, medium and computer program product
Technical Field
The present application relates to the field of artificial intelligence in financial technology (Fintech), and in particular to a sample matching degree calculation optimization method, device, medium, and computer program product.
Background
With the continuous development of financial technology, especially internet technology, more and more technologies (such as distributed computing and artificial intelligence) are applied in the financial field, but the financial industry also places higher requirements on these technologies.
With the continuous development of computer software, artificial intelligence, and big data cloud services, federated learning is increasingly widely applied. In federated learning, it is generally necessary to determine the sample matching degree between the participants. At present, the sample matching degree between participants is usually estimated through a filter, such as a bloom filter. However, a filter generally needs to compare the participants' sample IDs one by one when estimating the sample matching degree, and when the number of samples held by each participant is large, the computation amount and complexity of these one-by-one comparisons are extremely high, making the calculation of the sample matching degree in federated learning extremely inefficient.
Disclosure of Invention
The present application mainly aims to provide a sample matching degree calculation optimization method, device, medium, and computer program product, so as to solve the technical problem in the prior art that calculating the sample matching degree in federated learning is inefficient.
In order to achieve the above object, the present application provides a sample matching degree calculation optimization method, where the method is applied to a first device, and the method includes:
acquiring each first local sample ID, and mapping a first hash value corresponding to each first local sample ID to a preset value interval to obtain a first hash mapping value corresponding to each first local sample ID;
selecting a first sampling hash mapping value set from each first hash mapping value, and receiving a second sampling hash mapping value set sent by each second device, wherein the second sampling hash mapping value set is selected from second hash mapping values corresponding to each second local sample ID by the second device;
determining a global sampling ordering value corresponding to each second device based on a first sampling number corresponding to the first sampling hash mapping value set and a second sampling number corresponding to each second sampling hash mapping value set, and selecting a global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set;
and calculating the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value, respectively.
The application provides a sample matching degree calculation optimization method, which is applied to second equipment and comprises the following steps:
acquiring second local sample IDs (identities), and mapping second hash values corresponding to the second local sample IDs to a preset value interval to obtain second hash mapping values corresponding to the second local sample IDs;
and sending a second sampling hash mapping value set selected from the second hash mapping values to a first device, so that the first device calculates the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and a first sampling hash mapping value set generated from each first local sample ID.
The present application further provides a sample matching degree calculation optimization device, the sample matching degree calculation optimization device is a virtual device, and the sample matching degree calculation optimization device is applied to the first device, the sample matching degree calculation optimization device includes:
the mapping module is used for acquiring each first local sample ID, mapping a first hash value corresponding to each first local sample ID to a preset value interval, and acquiring a first hash mapping value corresponding to each first local sample ID;
a receiving module, configured to select a first sampling hash mapping value set from each first hash mapping value, and receive a second sampling hash mapping value set sent by each second device, where the second sampling hash mapping value set is selected by the second device from second hash mapping values corresponding to each second local sample ID;
a selecting module, configured to determine a global sampling ordering value corresponding to each second device based on a first number of samples corresponding to the first sampling hash mapping value set and a second number of samples corresponding to each second sampling hash mapping value set, and select a global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set;
and the calculating module is used for calculating the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value, respectively.
The present application further provides a sample matching degree calculation optimization device, the sample matching degree calculation optimization device is a virtual device, and the sample matching degree calculation optimization device is applied to the second device, the sample matching degree calculation optimization device includes:
the mapping module is used for acquiring second local sample IDs (identities), mapping second hash values corresponding to the second local sample IDs to a preset value interval, and acquiring second hash mapping values corresponding to the second local sample IDs;
and the sending module is used for sending a second sampling hash mapping value set selected from the second hash mapping values to a first device, so that the first device calculates the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and a first sampling hash mapping value set generated from each first local sample ID.
The present application further provides a sample matching degree calculation optimization device, where the sample matching degree calculation optimization device is an entity device, and the sample matching degree calculation optimization device includes: a memory, a processor, and a program of the sample matching degree calculation optimization method stored on the memory and executable on the processor, which when executed by the processor, can implement the steps of the sample matching degree calculation optimization method as described above.
The present application also provides a medium which is a readable storage medium having stored thereon a program for implementing the sample matching degree calculation optimization method, the program implementing the sample matching degree calculation optimization method as described above when executed by a processor.
The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for computational optimization of sample matching as described above.
The application provides a sample matching degree calculation optimization method, device, medium, and computer program product. Compared with the prior-art technique of estimating the sample matching degree between participants through a filter, the present application first acquires each first local sample ID and maps the first hash value corresponding to each first local sample ID to a preset value interval, obtaining the first hash mapping value corresponding to each first local sample ID. A first sampling hash mapping value set is then selected from the first hash mapping values, and the second sampling hash mapping value set sent by each second device is received, where the second sampling hash mapping value set is selected by the second device from the second hash mapping values corresponding to its second local sample IDs. Based on a first sampling number corresponding to the first sampling hash mapping value set and a second sampling number corresponding to each second sampling hash mapping value set, a global sampling ordering value corresponding to each second device is determined, and a global sampling hash mapping value corresponding to each global sampling ordering value is selected from the first sampling hash mapping value set and the second sampling hash mapping value sets. The sample matching degree between the first device and each second device is then estimated based on the global sampling ordering values and the global sampling hash mapping values. The sample matching degree is thus calculated from partial hash values selected on both sides, without comparing the sample IDs of the first device and the second devices one by one, which reduces the computation amount and complexity of calculating the sample matching degree in federated learning. This overcomes the technical defect that, when each participant holds many samples, the one-by-one comparison of sample IDs is extremely expensive and the calculation of the sample matching degree in federated learning is therefore extremely inefficient, and thereby improves the efficiency of calculating the sample matching degree in federated learning.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a first embodiment of a sample matching degree calculation optimization method according to the present application;
fig. 2 is a schematic flowchart illustrating interactive calculation of intersection sample numbers by a first device and a second device in the sample matching degree calculation optimization method according to the present application;
FIG. 3 is a flowchart illustrating a second embodiment of the sample matching degree calculation optimization method according to the present application;
FIG. 4 is a flowchart illustrating a third embodiment of the sample matching degree calculation optimization method according to the present application;
fig. 5 is a schematic device structure diagram of a hardware operating environment related to a sample matching degree calculation optimization method in an embodiment of the present application;
fig. 6 is a hardware architecture diagram of federal learning according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the sample matching degree calculation optimization method of the present application, referring to fig. 1, the sample matching degree calculation optimization method is applied to a first device, and the sample matching degree calculation optimization method includes:
step S10, acquiring each first local sample ID, and mapping a first hash value corresponding to each first local sample ID to a preset value-taking interval to acquire a first hash mapping value corresponding to each first local sample ID;
in this embodiment, it should be noted that the sample matching degree calculation optimization method is applied to federal learning, the first device initiates a participant for a task with a federal server function in federal learning, wherein the federal server function is a function of aggregating data sent by all participants in federal learning, the first local sample ID is an identification of a sample in the first device, such as an identity card number, a mobile phone number, and the like, and the preset value interval is a preset specific value interval, preferably, the preset value interval may be set to be 0 to 1.
The method acquires each first local sample ID and maps the first hash value corresponding to each first local sample ID to the preset value interval, obtaining the first hash mapping value corresponding to each first local sample ID. Specifically, each first local sample ID is acquired and hashed to obtain the corresponding first hash value, and each first hash value is then mapped to the preset value interval according to its magnitude, obtaining the first hash mapping value corresponding to each first local sample ID.
The step of mapping the first hash value corresponding to each first local sample ID to a preset value-taking interval to obtain the first hash mapping value corresponding to each first local sample ID includes:
step S11, performing hash processing on each first local sample ID, to obtain each first hash value;
in this embodiment, based on a preset first type of hash function, the hash processing is performed on each first local sample ID, so as to obtain each first hash value, where the preset first type of hash function includes, but is not limited to, hash functions such as sha256 and secret SM 3.
Step S12, mapping each first hash value to a floating point number in the preset value interval, to obtain each first hash mapping value.
In this embodiment, each first hash value is mapped to a floating point number in the preset value interval to obtain each first hash mapping value. Specifically, each first hash value is reduced by a preset remainder (modulo) function and mapped to a floating point number in the preset value interval, obtaining each first hash mapping value.
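As an illustration of this mapping step, the remainder-based mapping can be sketched as follows. This is a minimal sketch, not the patent's implementation; the function name, the SHA-256 choice, and the modulus are assumptions:

```python
import hashlib

def hash_map_value(sample_id: str, modulus: int = 10**12) -> float:
    """Hash a sample ID and map the hash into the value interval [0, 1).

    The hash digest is taken modulo `modulus` (the preset remainder
    function) and scaled, so equal IDs always yield equal mapped values.
    """
    digest = int(hashlib.sha256(sample_id.encode("utf-8")).hexdigest(), 16)
    return (digest % modulus) / modulus

# Every first local sample ID gets a deterministic value in [0, 1).
mapped = [hash_map_value(i) for i in ("id_001", "id_002", "id_003")]
assert all(0.0 <= v < 1.0 for v in mapped)
```

Because the mapping is deterministic, identical sample IDs on two devices always collide on the same mapped value, which is what makes the later set aggregation meaningful.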
In another implementable manner, step S12 further includes:
performing secondary hashing on the first hash value through a preset second-type hash function to obtain a secondary hash value, and intercepting the value on a preset number of bits of the secondary hash value as the floating point number in the preset value interval, where the result length of the preset second-type hash function is smaller than that of the preset first-type hash function; for example, the preset second-type hash function may be set to a hash function such as MD5.
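A sketch of this alternative follows; the MD5 re-hash and the number of intercepted hex digits are illustrative assumptions consistent with the description above, not values fixed by the patent:

```python
import hashlib

def double_hash_map_value(first_hash_hex: str, digits: int = 12) -> float:
    """Secondary-hash the first hash with a shorter hash function (MD5),
    then intercept the leading `digits` hex characters of the secondary
    hash and scale them into the preset value interval [0, 1)."""
    secondary = hashlib.md5(first_hash_hex.encode("utf-8")).hexdigest()
    truncated = int(secondary[:digits], 16)
    return truncated / float(16 ** digits)

first_hash = hashlib.sha256(b"id_001").hexdigest()  # preset first-type hash
value = double_hash_map_value(first_hash)
assert 0.0 <= value < 1.0
```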
Step S20, selecting a first sampling hash mapping value set from each first hash mapping value, and receiving a second sampling hash mapping value set sent by each second device, where the second sampling hash mapping value set is selected from second hash mapping values corresponding to each second local sample ID by the second device;
in this embodiment, it should be noted that the second device is a task cooperative participant for federal learning, the task cooperative participant needs to send data of its own party to a task initiating participant for aggregation, and the number of the second devices is at least 1.
A first sampling hash mapping value set is selected from the first hash mapping values, and the second sampling hash mapping value set sent by each second device is received, where the second sampling hash mapping value set is selected by the second device from the second hash mapping values corresponding to its second local sample IDs. Specifically, the first-sampling-number smallest hash mapping values among the first hash mapping values are taken together as the first sampling hash mapping value set, and the second sampling hash mapping value set sent by each second device is received. The second device acquires its second local sample IDs and hashes each of them to obtain the second hash values, maps each second hash value to the preset value interval to obtain the second hash mapping value corresponding to each second hash value, and takes the second-sampling-number smallest hash mapping values among the second hash mapping values together as the second sampling hash mapping value set.
Wherein the step of selecting a first set of sample hash map values among each of the first hash map values comprises:
step S21, sorting each first Hash mapping value to obtain a local sorting result;
in this embodiment, the first hash mapping values are sorted to obtain a local sorting result, and specifically, based on the size of each first hash mapping value, the first hash mapping values are sorted from large to small to obtain the local sorting result.
Step S22, based on the local sorting result, selecting from the first hash mapping values the first-sampling-number hash mapping values ranked last as the first sampling hash mapping value set.
In this embodiment, based on the local sorting result, the first-sampling-number hash mapping values ranked last (that is, the smallest ones) are selected from the first hash mapping values, and these first sampling hash mapping values are taken together as the first sampling hash mapping value set.
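Steps S21 and S22 together amount to keeping the sampling-number smallest mapped values. A minimal sketch (function and variable names are illustrative, not from the patent):

```python
import heapq

def select_sampling_set(hash_mapping_values, sampling_number):
    """Return the `sampling_number` smallest hash mapping values.

    Sorting in descending order and taking the values ranked last, as in
    steps S21-S22, selects exactly the smallest values, so a k-smallest
    selection gives the same set without a full sort.
    """
    return heapq.nsmallest(sampling_number, hash_mapping_values)

values = [0.91, 0.07, 0.55, 0.23, 0.68, 0.12]
assert select_sampling_set(values, 3) == [0.07, 0.12, 0.23]
```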
Step S30, determining a global sampling ordering value corresponding to each second device based on a first number of samples corresponding to the first sampling hash mapping value set and a second number of samples corresponding to each second sampling hash mapping value set, and selecting a global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set;
in this embodiment, it should be noted that the global sampling ordering value is a global size ordering ranking of the global sampling hash mapping value corresponding to the global sampling ordering value in the first sampling hash mapping value set and the second sampling hash mapping value set corresponding to the global sampling ordering value.
The global sampling ordering value corresponding to each second device is determined based on the first sampling number corresponding to the first sampling hash mapping value set and the second sampling number corresponding to each second sampling hash mapping value set, and the global sampling hash mapping value corresponding to each global sampling ordering value is selected from the first sampling hash mapping value set and the second sampling hash mapping value sets. Specifically, for the second sampling hash mapping value set sent by each second device, the following steps are executed:
the first sampling number corresponding to the first sampling hash mapping value set and the second sampling number corresponding to the second sampling hash mapping value set are averaged to obtain the global sampling ordering value corresponding to the second device; the first sampling hash mapping value set and the second sampling hash mapping value set are merged to obtain a global hash mapping value set; and the hash mapping value whose size-ordering rank equals the global sampling ordering value is selected from the global hash mapping value set as the global sampling hash mapping value.
Additionally, in another implementable manner, the determining, based on a first number of samples corresponding to the first set of sample hash mapping values and a second number of samples corresponding to each of the second set of sample hash mapping values, a global sample ordering value corresponding to each of the second devices further includes:
and taking the smaller of the first sampling number corresponding to the first sampling hash mapping value set and the second sampling number corresponding to each second sampling hash mapping value set as the global sampling ordering value.
Wherein the step of selecting a global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set comprises:
Step S31, aggregating the first sampling hash mapping value set with each second sampling hash mapping value set, respectively, to obtain each aggregation result;
in this embodiment, the first sample hash mapping value set is aggregated with each of the second sample hash mapping value sets, so that the first sample hash mapping value set is merged with each of the second sample hash mapping value sets, and each aggregation result is obtained.
Step S32, selecting, from each aggregation result, the hash mapping value whose size-ordering rank equals the corresponding global sampling ordering value as the global sampling hash mapping value.
In this embodiment, from each aggregation result, the hash mapping value whose size-ordering rank equals the corresponding global sampling ordering value is selected as the global sampling hash mapping value. Specifically, the hash mapping values in each aggregation result are ranked by size, and the hash mapping value ranked at the global sampling ordering value is selected from each aggregation result as the global sampling hash mapping value corresponding to the respective second device.
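One way to read the averaging rule together with steps S31-S32 is sketched below. Treating the aggregation as a set union that collapses identical mapped values (which matching IDs produce) is an assumption of this sketch, as are the names:

```python
def global_sampling_value(first_set, second_set):
    """Merge two sampling sets and pick the value ranked at the global
    sampling ordering value, taken here as the average of the two
    sampling numbers."""
    k_u = (len(first_set) + len(second_set)) // 2           # global sampling ordering value
    aggregation = sorted(set(first_set) | set(second_set))  # aggregation result
    K_u = aggregation[k_u - 1]                              # k_u-th smallest mapped value
    return k_u, K_u

first = [0.01, 0.04, 0.09, 0.15]
second = [0.02, 0.04, 0.11, 0.18]   # 0.04 overlaps, so it collapses in the union
k_u, K_u = global_sampling_value(first, second)
assert k_u == 4 and K_u == 0.09
```

Collapsing duplicates matters: an overlapping ID pushes K_u lower than it would be for disjoint sets, which is exactly what lets the later formula detect overlap.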
Step S40, respectively calculating a sample matching degree between the first device and each of the second devices based on each of the global sample ordering values and each of the global sample hash mapping values.
In this embodiment, the sample matching degree between the first device and each second device is calculated based on the global sampling ordering values and the global sampling hash mapping values. Specifically, the intersection sample number between the first device and each second device is estimated based on the global sampling ordering value corresponding to each second device and the corresponding global sampling hash mapping value, and the ratio of each intersection sample number to the corresponding sample total is then calculated to obtain the sample matching degree between the first device and each second device. The sample matching degree is the proportion of samples overlapping between the first device and the second device, and the sample total may be set to the number of first local sample IDs, or to the sum of the number of first local sample IDs and the number of second local sample IDs in the corresponding second device.
Additionally, it should be noted that in the current manner of estimating the sample matching degree between participants through a filter, a malicious participant can easily infer the local sample IDs of other participants through a rainbow-table attack, so the security is low. Since the data exchanged between the first device and each second device is only the mapping value of the hash value of a sample ID in the preset value interval, rather than the hash value itself, a malicious participant cannot reverse a participant's local sample IDs through a rainbow-table attack, which improves the security of calculating the sample matching degree in federated learning and the security of federated learning modeling and sample alignment.
Furthermore, based on the sample matching degree corresponding to each second device, federated learning modeling devices are selected from the second devices, and the first device and each federated learning modeling device then perform federated learning modeling to obtain a federated learning model. This achieves the purpose of selecting devices better matched to the first device for federated learning on the basis of rapidly calculating the sample matching degree, improving both the effect and the efficiency of federated learning modeling.
Furthermore, when the federated learning model is a longitudinal (vertical) federated learning model, longitudinal federated learning devices can be quickly selected from the second devices based on the rapidly calculated sample matching degrees between the first device and the second devices. The first device then only needs to cooperate with the selected longitudinal federated learning devices for sample prediction, without involving the second devices eliminated based on the sample matching degree, which reduces the data interaction and computation between the first device and the eliminated second devices. On the basis of rapidly calculating the sample matching degree, this realizes rapid joint sample prediction with the longitudinal federated learning devices and improves the efficiency of sample prediction in longitudinal federated learning. The longitudinal federated learning model may be a bank risk control model, improving the efficiency of user loan risk prediction, or a message recommendation model, improving the efficiency of message recommendation.
Wherein the step of calculating the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value includes:
Step S41, estimating the union sample number between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value;
In this embodiment, the union sample number between the first device and each second device is estimated based on the global sampling ordering values and the global sampling hash mapping values; specifically, for the global sampling ordering value and corresponding global sampling hash mapping value of each second device, the following steps are performed:
calculating a difference value between the global sampling ordering value and 1, and calculating a ratio between the difference value and the global sampling hash mapping value to obtain the union sample number corresponding to the second device, wherein the formula for estimating the union sample number is as follows:

|M_u| = (k_u - 1) / K_u

wherein |M_u| is the number of samples in the union, k_u is the global sampling ordering value, and K_u is the global sampling hash mapping value, that is, the maximum of the k_u smallest hash mapping values in the aggregation result.
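The union-size estimate in step S41 is a k-minimum-values (KMV) style distinct-count estimator. A minimal sketch follows; the function and argument names are illustrative assumptions, not identifiers from the source:

```python
# Hypothetical sketch of the union-size estimate |M_u| = (k_u - 1) / K_u.
def estimate_union_size(global_ordering_value: int,
                        global_hash_mapping_value: float) -> float:
    """Estimate the union sample count from the k_u-th smallest
    hash mapping value (K_u) in the aggregation result of both parties."""
    return (global_ordering_value - 1) / global_hash_mapping_value

# If the 100th-smallest aggregated hash mapping value is 0.002, the union
# is estimated to contain about (100 - 1) / 0.002 = 49500 samples.
print(estimate_union_size(100, 0.002))
```

The intuition: if hash mapping values are uniform on the interval, the k-th smallest of n values sits near k/n, so n can be recovered from its observed position.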
Step S42, respectively calculating the number of intersection samples between the first device and each second device based on the first sample set quantity corresponding to the first local sample IDs, the second sample set quantity corresponding to the second local sample IDs sent by each second device, and the union sample number;
in this embodiment, the number of intersection samples between the first device and each second device is calculated based on the first sample set quantity corresponding to the first local sample IDs, the second sample set quantity corresponding to the second local sample IDs sent by each second device, and the union sample number. Specifically, the first sample set quantity corresponding to the first local sample IDs is obtained, and the second sample set quantity corresponding to the second local sample IDs sent by each second device is received, so that the following step is performed for the second sample set quantity sent by each second device:
calculating the difference between the sum of the first sample set quantity and the second sample set quantity and the union sample number, to obtain the intersection sample number corresponding to the second device. As shown in fig. 2, which is a schematic flow chart of the first device and the second device interactively calculating the intersection sample number, Guest is the first device, Host is the second device, h(id) denotes the first sampling hash mapping value set and the second sampling hash mapping value set, and global K denotes the global sampling ordering value and the global sampling hash mapping value.
Step S43, calculating a sample matching degree between the first device and each of the second devices based on each of the intersection sample numbers.
In this embodiment, the sample matching degree between the first device and each of the second devices is calculated based on each of the intersection sample numbers. Specifically, the ratio between each intersection sample number and the corresponding total number of samples is calculated to obtain the sample matching degree between the first device and each second device. For example, assuming that the intersection sample number is 100 and the number of first local sample IDs is 1000, the sample matching degree is 10%.
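Steps S42 and S43 can be sketched together: inclusion-exclusion gives the intersection sample number, and the matching degree is its ratio to the first device's sample count. The function names and numbers below are illustrative assumptions:

```python
# Illustrative sketch of steps S42-S43, using the worked numbers
# from the text (1000 first-device samples, intersection of 100).
def intersection_sample_number(first_count: int, second_count: int,
                               union_count: float) -> float:
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    return first_count + second_count - union_count

def sample_matching_degree(intersection_count: float, total_count: int) -> float:
    return intersection_count / total_count

inter = intersection_sample_number(1000, 800, 1700)  # -> 100
print(sample_matching_degree(inter, 1000))           # -> 0.1, i.e. 10%
```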
After the step of calculating the sample matching degree between the first device and each of the second devices based on each of the global sample ordering values and each of the global sample hash mapping values, the sample matching degree calculation optimization method further includes:
step A10, based on each sample matching degree, eliminating low matching degree devices with sample matching degrees lower than a preset matching degree threshold value from each second device to obtain each high matching degree device; performing longitudinal federal learning modeling with each high-matching-degree device to obtain a longitudinal federal learning model;
in this embodiment, it should be noted that, when longitudinal federal learning modeling is performed, the higher the sample matching degree between the participants of the longitudinal federal learning modeling, the better the effect of the longitudinal federal learning modeling. A low-matching-degree device is a second device whose sample matching degree is lower than the preset matching degree threshold, and a high-matching-degree device is a second device whose sample matching degree is not lower than the preset matching degree threshold.
Step B10, based on each sample matching degree, eliminating high matching degree equipment with the sample matching degree not lower than a preset matching degree threshold value from each second equipment to obtain each low matching degree equipment; and performing transverse federal learning modeling on the low-matching-degree devices to obtain a transverse federal learning model.
In this embodiment, it should be noted that, when modeling for horizontal federal learning, the lower the sample matching degree between the participants of the horizontal federal learning modeling, the better the effect of the horizontal federal learning modeling. This embodiment of the present application achieves the purpose of eliminating the participants with a high sample matching degree before horizontal federal learning modeling, so that the participants in horizontal federal learning are all participants with a low sample matching degree. The horizontal federal learning model can therefore converge faster, and the modeling processes involving participants with a high sample matching degree are avoided, so the efficiency of the horizontal federal learning modeling is improved.
Compared with the prior-art technical means of estimating the sample matching degree between participants through a filter, the embodiment of the present application first obtains each first local sample ID and maps the first hash value corresponding to each first local sample ID to a preset value interval to obtain the first hash mapping value corresponding to each first local sample ID. It then selects a first sampling hash mapping value set from the first hash mapping values and receives the second sampling hash mapping value set sent by each second device, wherein each second sampling hash mapping value set is selected by the second device from the second hash mapping values corresponding to its second local sample IDs. Based on the first sampling number corresponding to the first sampling hash mapping value set and the second sampling number corresponding to each second sampling hash mapping value set, the global sampling ordering value corresponding to each second device is determined, and the global sampling hash mapping value corresponding to each global sampling ordering value is selected from the first sampling hash mapping value set and each second sampling hash mapping value set. The sample matching degree between the first device and each second device is then estimated based on each global sampling ordering value and each global sampling hash mapping value. This achieves the purpose of calculating the sample matching degree between the first device and each second device based on partial hash values selected on the first device and the second devices, without comparing the sample IDs of the first device and each second device one by one, which reduces the calculation amount and calculation complexity of calculating the sample matching degree in federal learning. The technical defect that the calculation amount and calculation complexity of comparing the sample IDs of all participants one by one are extremely high when the number of samples of each party is large, making the efficiency of calculating the sample matching degree in federal learning extremely low, is thereby overcome, and the efficiency of calculating the sample matching degree in federal learning is improved.
Further, referring to fig. 3, in another embodiment of the present application, after the step of calculating the sample matching degree between the first device and each of the second devices based on each of the global sampling ordering values and each of the global sampling hash mapping values according to the first embodiment of the present application, the sample matching degree calculation optimization method further includes:
step S50, based on each sample matching degree, eliminating low matching degree devices with sample matching degrees lower than a preset matching degree threshold value from each second device to obtain each high matching degree device;
in this embodiment, it should be noted that, before longitudinal federal learning is performed, sample alignment is generally performed to determine the common samples between the participants of the longitudinal federal learning. At present, when the sample matching degree is calculated by comparing the sample IDs between the participants one by one through a filter, the common sample IDs between the participants are determined at the same time; since the calculation amount and calculation complexity of comparing the sample IDs of the participants one by one are extremely high, the efficiency of sample alignment is low.
And based on each sample matching degree, removing low matching degree equipment with the sample matching degree lower than a preset matching degree threshold value from each second equipment to obtain each high matching degree equipment, specifically, selecting each to-be-removed sample matching degree lower than the preset matching degree threshold value from each sample matching degree, further removing low matching degree equipment corresponding to each to-be-removed sample matching degree from each second equipment to obtain each high matching degree equipment.
Step S60, performing sample alignment with each high-matching-degree device, and obtaining a sample alignment result.
In this embodiment, sample alignment is performed on each high-matching-degree device to obtain a sample alignment result, specifically, sample alignment is performed on each high-matching-degree device to obtain each common sample ID between the first device and each high-matching-degree device, and each common sample ID is used as the sample alignment result.
The embodiment of the present application provides a sample alignment method: after the sample matching degrees between the first device and the second devices are obtained through calculation, low-matching-degree devices whose sample matching degree is lower than a preset matching degree threshold are removed from the second devices based on the sample matching degrees to obtain the high-matching-degree devices, and sample alignment is then performed with each high-matching-degree device to obtain a sample alignment result. The first device thereby selectively performs sample alignment with the second devices and avoids performing sample alignment with the low-matching-degree devices among the second devices. Since sample alignment compares sample IDs between participants one by one through filters, its calculation complexity and calculation amount are far greater than those of the sample matching degree calculation in the embodiment of the present application; therefore, for the low-matching-degree devices among the second devices, the embodiment of the present application replaces the calculation process of sample alignment with the calculation process of the sample matching degree, which reduces the calculation amount in the sample alignment process and improves the efficiency of sample alignment.
Further, referring to fig. 4, the sample matching degree calculation optimization method is applied to the second device, and the sample matching degree calculation optimization method further includes:
step C10, acquiring each second local sample ID, and mapping a second hash value corresponding to each second local sample ID to a preset value-taking interval to acquire a second hash mapping value corresponding to each second local sample ID;
in this embodiment, it should be noted that the sample matching degree calculation optimization method is applied to federal learning, where the second device is a task cooperative participant in federal learning that needs to send its own data to the task initiating participant for aggregation, and the number of second devices is at least 1. The second local sample ID is the identifier of a sample in the second device, such as an identity card number or a mobile phone number. The preset value interval is a preset specific value interval; preferably, the preset value interval can be set to 0 to 1.
Each second local sample ID is acquired, and the second hash value corresponding to each second local sample ID is mapped to the preset value interval to obtain the second hash mapping value corresponding to each second local sample ID. Specifically, each second local sample ID is acquired and hashed to obtain the second hash value corresponding to each second local sample ID, and each second hash value is then mapped to the preset value interval based on its size to obtain the second hash mapping value corresponding to each second local sample ID.
The step of mapping the second hash value corresponding to each second local sample ID to a preset value-taking interval to obtain the second hash mapping value corresponding to each second local sample ID includes:
step C11, performing hash processing on each second local sample ID, to obtain each second hash value;
in this embodiment, hash processing is performed on each second local sample ID based on a preset first-class hash function to obtain each second hash value, where the preset first-class hash function includes, but is not limited to, hash functions such as sha256 and the SM3 cryptographic hash function.
And step C12, mapping each second hash value to a floating point number in the preset value interval, to obtain each second hash mapping value.
In this embodiment, each second hash value is mapped to a floating point number in the preset value interval to obtain each second hash mapping value. Specifically, a remainder of each second hash value is taken through a preset remainder function, and the remainder is mapped to a floating point number in the preset value interval to obtain each second hash mapping value.
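A minimal sketch of this hash-and-map step follows, under assumptions: sha256 stands in for the preset first-class hash function, and the remainder modulus is a hypothetical choice (the source does not specify one):

```python
import hashlib

MODULUS = 10 ** 12  # illustrative precision of the mapping; an assumption

def hash_mapping_value(sample_id: str) -> float:
    """Hash a local sample ID and map it into the preset interval [0, 1)."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).hexdigest()
    # Take the remainder of the integer hash value and scale into [0, 1).
    return (int(digest, 16) % MODULUS) / MODULUS

value = hash_mapping_value("13800138000")  # e.g. a mobile-phone-number ID
print(0.0 <= value < 1.0)                  # the value falls in [0, 1)
```

Because the mapping is deterministic, both parties obtain identical mapping values for identical sample IDs without exchanging the IDs themselves.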
In another implementable manner, step C12 further includes:
performing secondary hashing on the second hash value through a preset second-class hash function to obtain a local secondary hash value, and intercepting the numerical value on a preset number of bits of the local secondary hash value as a floating point number in the preset value interval, wherein the result length of the preset second-class hash function is smaller than the result length of the preset first-class hash function; for example, the preset second-class hash function can be set to a hash function such as md5.
Step C20, sending the second sampling hash mapping value set selected from the second hash mapping values to a first device, so that the first device calculates the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and the first sampling hash mapping value set generated based on each first local sample ID.
In this embodiment, the second sampling hash mapping value set selected from the second hash mapping values is sent to the first device, so that the first device calculates the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and the first sampling hash mapping value set generated from the first local sample IDs. Specifically, the second sampling number of smallest hash mapping values among the second hash mapping values are together taken as the second sampling hash mapping value set, and the second sampling hash mapping value set is sent to the first device. The first device then determines, based on the first sampling number corresponding to the first sampling hash mapping value set and the second sampling number corresponding to each second sampling hash mapping value set, the global sampling ordering value corresponding to each second device, selects the global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set, and calculates the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value. For the process of calculating the sample matching degree between the first device and each second device based on the first sampling hash mapping value set and each second sampling hash mapping value set, reference is made to the specific contents of step S30 and its refinement steps and step S40 and its refinement steps, which are not described herein again.
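The whole interaction can be illustrated end to end in a toy sketch. The sample IDs, sampling number k, hash choice, and the rule that the global sampling ordering value is the smaller of the two sampling numbers are all illustrative assumptions, not specifics from the source:

```python
import hashlib

def mapped(ids, modulus=10 ** 12):
    """Map each sample ID to a hash mapping value in [0, 1)."""
    return [(int(hashlib.sha256(i.encode()).hexdigest(), 16) % modulus) / modulus
            for i in ids]

guest_ids = [f"user{i}" for i in range(0, 150)]    # first device (Guest)
host_ids = [f"user{i}" for i in range(100, 250)]   # second device (Host)

k = 128  # first/second sampling number: keep the k smallest mapping values
guest_sampling_set = sorted(mapped(guest_ids))[:k]
host_sampling_set = sorted(mapped(host_ids))[:k]   # sent to the first device

k_u = min(len(guest_sampling_set), len(host_sampling_set))  # ordering value
aggregation = sorted(set(guest_sampling_set + host_sampling_set))[:k_u]
K_u = aggregation[-1]                  # global sampling hash mapping value

union_est = (k_u - 1) / K_u                              # step S41
inter_est = len(guest_ids) + len(host_ids) - union_est   # step S42
matching_degree = inter_est / len(guest_ids)             # step S43
print(round(union_est), round(inter_est))  # true values are 250 and 50
```

The estimates are approximate: their accuracy grows with the sampling number k, which is the trade-off that lets the parties avoid comparing full sample ID lists.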
Compared with the prior-art technical means of estimating the sample matching degree among the participants through a filter, the embodiment of the application first obtains each second local sample ID and maps the second hash value corresponding to each second local sample ID to the preset value interval to obtain the second hash mapping value corresponding to each second local sample ID, and then sends the second sampling hash mapping value set selected from the second hash mapping values to the first device, so that the first device respectively calculates the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and the first sampling hash mapping value set generated from the first local sample IDs. The purpose of calculating the sample matching degree between the first device and each second device based on partial hash values selected on the first device and the second devices is thereby achieved, the sample matching degree does not need to be calculated by comparing the sample IDs of the first device and each second device one by one, and the calculation amount and calculation complexity of calculating the sample matching degree in federal learning are reduced.
Referring to fig. 5, fig. 5 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 5, the sample matching degree calculation optimizing device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the sample matching degree calculation optimization device may further include a user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. The user interface may comprise a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
Those skilled in the art will appreciate that the configuration of the sample matching degree calculation and optimization device shown in fig. 5 does not constitute a limitation of the sample matching degree calculation and optimization device, which may include more or fewer components than those shown, or combine some components, or arrange the components differently.
As shown in fig. 5, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and a sample matching degree calculation optimization program. The operating system is a program for managing and controlling hardware and software resources of the sample matching degree calculation optimization device, and supports the running of the sample matching degree calculation optimization program and other software and/or programs. The network communication module is used for realizing communication among the components in the memory 1005 and communication with other hardware and software in the sample matching degree calculation optimization system.
In the sample matching degree calculation optimization apparatus shown in fig. 5, the processor 1001 is configured to execute a sample matching degree calculation optimization program stored in the memory 1005, and implement the steps of the sample matching degree calculation optimization method described in any one of the above.
The specific implementation of the sample matching degree calculation optimization device of the present application is substantially the same as each embodiment of the sample matching degree calculation optimization method, and is not described herein again.
The embodiment of the present application further provides a sample matching degree calculation and optimization device, where the sample matching degree calculation and optimization device is applied to a first device, and the sample matching degree calculation and optimization device includes:
the mapping module is used for acquiring each first local sample ID, mapping a first hash value corresponding to each first local sample ID to a preset value interval, and acquiring a first hash mapping value corresponding to each first local sample ID;
a receiving module, configured to select a first sampling hash mapping value set from each first hash mapping value, and receive a second sampling hash mapping value set sent by each second device, where the second sampling hash mapping value set is selected by the second device from second hash mapping values corresponding to each second local sample ID;
a selecting module, configured to determine a global sampling ordering value corresponding to each second device based on a first number of samples corresponding to the first sampling hash mapping value set and a second number of samples corresponding to each second sampling hash mapping value set, and select a global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set;
and the calculating module is used for respectively calculating the sample matching degree between the first equipment and each second equipment based on each global sampling sorting value and each global sampling Hash mapping value.
Optionally, the computing module is further configured to:
respectively estimating the number of union set samples between the first device and each second device based on each global sampling sorting value and each global sampling Hash mapping value;
respectively calculating the number of intersection samples between the first equipment and each second equipment based on the number of first sample sets corresponding to each first local sample ID, the number of second sample sets corresponding to each second local sample ID sent by each second equipment and the number of union sample;
and calculating the sample matching degree between the first device and each second device based on the quantity of the intersection samples.
Optionally, the selecting module is further configured to:
aggregating the first sampling Hash mapping value set and each second sampling Hash mapping value set respectively to obtain each aggregation result;
and respectively selecting, from each aggregation result, the hash mapping value whose rank in the size ordering is the global sampling ordering value, as the global sampling hash mapping value.
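The selecting module's aggregation-and-rank operation can be sketched as follows; the function name and sample values are illustrative assumptions:

```python
# Hypothetical sketch: aggregate the first sampling set with one second
# sampling set, then take the value whose 1-based rank in the ascending
# size ordering equals the global sampling ordering value.
def global_sampling_hash_mapping_value(first_set, second_set, ordering_value):
    aggregation = sorted(set(first_set) | set(second_set))  # aggregation result
    return aggregation[ordering_value - 1]  # rank in the size ordering

print(global_sampling_hash_mapping_value([0.1, 0.3], [0.2, 0.4], 3))  # -> 0.3
```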
Optionally, the mapping module is further configured to:
performing hash processing on each first local sample ID to obtain each first hash value;
and mapping each first hash value into a floating point number in the preset value interval to obtain each first hash mapping value.
Optionally, the receiving module is further configured to:
sequencing the first Hash mapping values to obtain a local sequencing result;
and selecting the first sampling number of hash mapping values from the first hash mapping values as the first sampling hash mapping value set according to the local sorting result.
Optionally, the sample matching degree calculation optimizing device is further configured to:
based on the sample matching degrees, eliminating low-matching-degree equipment with the sample matching degree lower than a preset matching-degree threshold value from the second equipment to obtain high-matching-degree equipment;
and carrying out sample alignment with each high-matching-degree device to obtain a sample alignment result.
Optionally, the sample matching degree calculation optimizing device is further configured to:
based on the sample matching degrees, eliminating low-matching-degree equipment with the sample matching degree lower than a preset matching-degree threshold value from the second equipment to obtain high-matching-degree equipment; performing longitudinal federal learning modeling with each high-matching-degree device to obtain a longitudinal federal learning model; and/or
Based on the sample matching degrees, eliminating high matching degree equipment of which the sample matching degree is not lower than a preset matching degree threshold value from the second equipment to obtain low matching degree equipment; and performing transverse federal learning modeling on the low-matching-degree devices to obtain a transverse federal learning model.
The specific implementation of the sample matching degree calculation optimization device of the present application is substantially the same as that of each embodiment of the sample matching degree calculation optimization method, and is not described herein again.
The embodiment of the present application further provides a sample matching degree calculation and optimization device, where the sample matching degree calculation and optimization device is applied to a second device, and the sample matching degree calculation and optimization device includes:
the mapping module is used for acquiring second local sample IDs (identities), mapping second hash values corresponding to the second local sample IDs to a preset value interval, and acquiring second hash mapping values corresponding to the second local sample IDs;
and the sending module is used for sending the selected second sampling Hash mapping value set in each second Hash mapping value to first equipment so that the first equipment can respectively calculate the sample matching degree between the first equipment and each second equipment based on the second sampling Hash mapping value set sent by each second equipment and the first sampling Hash mapping value set generated based on each first local sample ID.
Optionally, the mapping module is further configured to:
performing hash processing on each second local sample ID to obtain each second hash value;
and mapping each second hash value into a floating point number in the preset value interval to obtain each second hash mapping value.
The specific implementation of the sample matching degree calculation optimization device of the present application is substantially the same as that of each embodiment of the sample matching degree calculation optimization method, and is not described herein again.
The present application provides a medium, which is a readable storage medium, and the readable storage medium stores one or more programs, which may be executed by one or more processors to implement the steps of the sample matching degree calculation optimization method described in any one of the above.
The specific implementation of the readable storage medium of the present application is substantially the same as the embodiments of the sample matching degree calculation optimization method, and is not described herein again.
The present application provides a computer program product, and the computer program product includes one or more computer programs, which may be executed by one or more processors to implement the steps of the sample matching degree calculation optimization method described in any one of the above.
The specific implementation of the computer program product of the present application is substantially the same as the embodiments of the sample matching degree calculation optimization method, and is not described herein again.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (12)

1. A sample matching degree calculation optimization method is applied to a first device, and comprises the following steps:
acquiring each first local sample ID, and mapping a first hash value corresponding to each first local sample ID to a preset value interval to obtain a first hash mapping value corresponding to each first local sample ID;
selecting a first sampling Hash mapping value set from each first Hash mapping value, and receiving a second sampling Hash mapping value set sent by each second device, wherein the second sampling Hash mapping value set is selected from second Hash mapping values corresponding to each second local sample ID by the second device;
determining a global sampling ordering value corresponding to each second device based on a first sampling number corresponding to the first sampling Hash mapping value set and a second sampling number corresponding to each second sampling Hash mapping value set, and selecting a global sampling Hash mapping value corresponding to each global sampling ordering value from the first sampling Hash mapping value set and each second sampling Hash mapping value set;
and respectively calculating the sample matching degree between the first equipment and each second equipment based on each global sampling sorting value and each global sampling Hash mapping value.
2. The method for optimizing calculation of sample matching degree according to claim 1, wherein the step of calculating the sample matching degree between the first device and each of the second devices based on each of the global sample ordering values and each of the global sample hash mapping values, respectively, comprises:
respectively estimating the number of union set samples between the first device and each second device based on each global sampling sorting value and each global sampling Hash mapping value;
respectively calculating the number of intersection samples between the first device and each second device based on the number of first sample sets corresponding to each first local sample ID, the number of second sample sets corresponding to each second local sample ID sent by each second device, and the number of union samples;
and calculating the sample matching degree between the first device and each second device based on the quantity of the intersection samples.
3. The method for optimizing calculation of sample matching degree according to claim 1, wherein the step of selecting the global sampling hash mapping value corresponding to each global sampling ordering value from the first sampling hash mapping value set and each second sampling hash mapping value set comprises:
aggregating the first sampling hash mapping value set with each second sampling hash mapping value set, respectively, to obtain each aggregation result;
and selecting, from each aggregation result, the hash mapping value whose rank in a size-based ordering equals the corresponding global sampling ordering value, as the global sampling hash mapping value.
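A tiny sketch of this aggregate-then-rank step, assuming 1-based ranks over the distinct merged values in ascending order; how duplicates and rank ties are treated is not fixed by the claim.

```python
def global_sampling_value(first_set, second_set, ordering_value):
    # Aggregate the two sampling hash mapping value sets, order the
    # distinct values by size, and return the value at the given
    # 1-based global sampling ordering value.
    merged = sorted(set(first_set) | set(second_set))
    return merged[ordering_value - 1]

first = [0.01, 0.05, 0.09]
second = [0.02, 0.05, 0.11]
# distinct merged values: [0.01, 0.02, 0.05, 0.09, 0.11]
third = global_sampling_value(first, second, 3)
```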
4. The method for optimizing calculation of sample matching degree according to claim 1, wherein the step of mapping the first hash value corresponding to each first local sample ID into a preset value interval to obtain the first hash mapping value corresponding to each first local sample ID comprises:
performing hash processing on each first local sample ID to obtain each first hash value;
and mapping each first hash value into a floating point number in the preset value interval to obtain each first hash mapping value.
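A minimal sketch of this hashing-and-mapping step, assuming SHA-256 as the hash and [0, 1) as the preset value interval; the claim fixes neither choice.

```python
import hashlib

def id_to_unit_float(sample_id: str) -> float:
    # Hash the local sample ID, then map the digest into the preset
    # value interval [0, 1) as a floating point number.
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

values = [id_to_unit_float(f"user_{i}") for i in range(100)]
```

Because the mapping is deterministic, both parties obtain identical hash mapping values for identical sample IDs without revealing the IDs themselves.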
5. The method for optimizing calculation of sample matching degree according to claim 1, wherein the step of selecting the first sampling hash mapping value set from the first hash mapping values comprises:
sorting the first hash mapping values to obtain a local sorting result;
and selecting, according to the local sorting result, a first sampling number of hash mapping values from the first hash mapping values as the first sampling hash mapping value set.
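This selection step can be sketched as taking the first sampling number of values from an ascending sort; ascending order is an assumption here, since the claim only requires a selection made according to the local sorting result.

```python
def first_sampling_set(hash_mapping_values, sampling_number):
    # Sort the first hash mapping values (the local sorting result),
    # then keep the first sampling_number of them as the sampling set.
    return sorted(hash_mapping_values)[:sampling_number]

values = [0.42, 0.07, 0.88, 0.13, 0.61]
sampled = first_sampling_set(values, 3)
```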
6. The method for optimizing calculation of sample matching degree according to claim 1, wherein after the step of calculating the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value, the method further comprises:
eliminating, based on each sample matching degree, the low-matching-degree devices whose sample matching degree is lower than a preset matching degree threshold from the second devices, to obtain high-matching-degree devices;
and performing sample alignment with each high-matching-degree device to obtain a sample alignment result.
7. The method for optimizing calculation of sample matching degree according to claim 1, wherein after the step of calculating the sample matching degree between the first device and each second device based on each global sampling ordering value and each global sampling hash mapping value, the method further comprises:
eliminating, based on each sample matching degree, the low-matching-degree devices whose sample matching degree is lower than a preset matching degree threshold from the second devices to obtain high-matching-degree devices, and performing vertical federated learning modeling with each high-matching-degree device to obtain a vertical federated learning model; and/or
eliminating, based on each sample matching degree, the high-matching-degree devices whose sample matching degree is not lower than the preset matching degree threshold from the second devices to obtain low-matching-degree devices, and performing horizontal federated learning modeling with each low-matching-degree device to obtain a horizontal federated learning model.
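The thresholding in claims 6 and 7 amounts to a single partition of the second devices; a sketch follows, with the device names and the 0.5 threshold purely illustrative.

```python
def partition_by_matching_degree(degrees, threshold):
    # Split second devices by comparing each sample matching degree to
    # the preset threshold: high-degree partners suit vertical (feature-wise)
    # federated learning, low-degree partners suit horizontal (sample-wise)
    # federated learning.
    high = {dev: m for dev, m in degrees.items() if m >= threshold}
    low = {dev: m for dev, m in degrees.items() if m < threshold}
    return high, low

degrees = {"device_B": 0.83, "device_C": 0.12, "device_D": 0.57}
high, low = partition_by_matching_degree(degrees, 0.5)
```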
8. A sample matching degree calculation optimization method, applied to a second device, the method comprising the following steps:
acquiring second local sample IDs, and mapping the second hash value corresponding to each second local sample ID into a preset value interval to obtain the second hash mapping value corresponding to each second local sample ID;
and sending a second sampling hash mapping value set selected from the second hash mapping values to a first device, so that the first device calculates, respectively, the sample matching degree between the first device and each second device based on the second sampling hash mapping value set sent by each second device and a first sampling hash mapping value set generated from each first local sample ID.
9. The method for optimizing calculation of sample matching degree according to claim 8, wherein the step of mapping the second hash value corresponding to each second local sample ID into a preset value interval to obtain the second hash mapping value corresponding to each second local sample ID comprises:
performing hash processing on each second local sample ID to obtain each second hash value;
and mapping each second hash value into a floating point number in the preset value interval to obtain each second hash mapping value.
10. A sample matching degree calculation optimization apparatus, comprising a memory and a processor, wherein:
the memory is used for storing a program for implementing a sample matching degree calculation optimization method;
and the processor is configured to execute the program to implement the steps of the sample matching degree calculation optimization method according to any one of claims 1 to 7 or 8 to 9.
11. A readable storage medium, wherein the readable storage medium has stored thereon a program for implementing a sample matching degree calculation optimization method, the program being executed by a processor to implement the steps of the sample matching degree calculation optimization method according to any one of claims 1 to 7 or 8 to 9.
12. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the sample matching degree calculation optimization method according to any one of claims 1 to 7 or 8 to 9.
CN202110621677.7A 2021-06-03 2021-06-03 Sample matching degree calculation optimization method, device, medium and computer program product Pending CN113361595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110621677.7A CN113361595A (en) 2021-06-03 2021-06-03 Sample matching degree calculation optimization method, device, medium and computer program product

Publications (1)

Publication Number Publication Date
CN113361595A true CN113361595A (en) 2021-09-07

Family

ID=77531996




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination