CN114625977A

CN114625977A - Service recommendation method and device based on federal learning and related medium

Info

Publication number: CN114625977A
Application number: CN202210529080.4A
Authority: CN
Inventors: 吴远津; 起亚·伊曼纽尔通格姆; 王俊宜; 袁戟
Original assignee: Shenzhen Wanwuyun Technology Co ltd
Current assignee: Shenzhen Wanwuyun Technology Co ltd
Priority date: 2022-05-16
Filing date: 2022-05-16
Publication date: 2022-06-14

Abstract

The invention discloses a service recommendation method, a device and a related medium based on federated learning. The method comprises: acquiring service information from an e-commerce platform and an offline community, and performing data processing on the service information to obtain a service data set; performing feature selection on the service data set to obtain corresponding target features; and inputting the target features into a federated learning SecureBoost algorithm and taking the output of that algorithm as the service recommendation result. Because the recommendation is predicted for the user by a federated learning algorithm trained on the selected features, recommendation accuracy can be improved while the user's privacy is protected.

Description

Service recommendation method and device based on federal learning and related medium

Technical Field

The invention relates to the technical field of computer software, and in particular to a service recommendation method and device based on federated learning and a related medium.

Background

With social development and rising living standards, residents' consumption habits have changed considerably and consumer psychology is maturing. Residents no longer blindly follow trends and fashion; instead they pay attention to style and taste and choose services that suit them individually. Personalized recommendation has therefore become a marketing approach tailored to services. Personalized recommendation was proposed as an independent research topic in computing in the 1990s; it emphasizes predicting individual preferences from user information and using artificial intelligence techniques to recommend information that matches and reflects those preferences.

However, in today's highly digitized society, private data faces unprecedented threats. Personal social media information, medical and health information, financial information, location information, biometric information, consumer profile information and the like all suffer from excessive sharing and abuse, and the enterprises or institutions that collect and process such information often lack sufficient privacy encryption and protection capabilities. Meanwhile, as the GDPR and China's own regulations place increasing demands on data privacy protection, data privacy and security have become important cornerstones of enterprise business operations.

Therefore, how to guarantee user privacy and security while realizing personalized recommendation is a problem to be solved by those skilled in the art.

Disclosure of Invention

The embodiments of the invention provide a service recommendation method and device based on federated learning, a computer device and a storage medium, aiming to improve service recommendation accuracy while protecting user privacy.

In a first aspect, an embodiment of the present invention provides a service recommendation method based on federated learning, including:

acquiring service information of an e-commerce platform and an offline community, and performing data processing on the service information to obtain a service data set;

performing feature selection on the service data set to obtain corresponding target features;

and inputting the target features into a federated learning SecureBoost algorithm, and taking the output of the federated learning SecureBoost algorithm as the service recommendation result.

In a second aspect, an embodiment of the present invention provides a service recommendation apparatus based on federated learning, including:

a data processing unit, configured to acquire service information of an e-commerce platform and an offline community, and perform data processing on the service information to obtain a service data set;

a feature selection unit, configured to perform feature selection on the service data set to obtain corresponding target features;

and a result output unit, configured to input the target features into a federated learning SecureBoost algorithm and take the output of the federated learning SecureBoost algorithm as the service recommendation result.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the federated learning based service recommendation method according to the first aspect when executing the computer program.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the federated learning based service recommendation method according to the first aspect.

The embodiments of the invention provide a service recommendation method, a device, a computer device and a storage medium based on federated learning. The method comprises: acquiring service information of an e-commerce platform and an offline community, and performing data processing on the service information to obtain a service data set; performing feature selection on the service data set to obtain corresponding target features; and inputting the target features into a federated learning SecureBoost algorithm and taking its output as the service recommendation result. Because the recommendation is predicted for the user by a federated learning algorithm trained on the selected features, user privacy is protected while recommendation accuracy is improved.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a service recommendation method based on federated learning according to an embodiment of the present invention;

Fig. 2 is a sub-flow diagram of a service recommendation method based on federated learning according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of a service recommendation device based on federated learning according to an embodiment of the present invention;

Fig. 4 is a sub-schematic block diagram of a service recommendation device based on federated learning according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to Fig. 1, Fig. 1 is a schematic flow chart of a service recommendation method based on federated learning according to an embodiment of the present invention, which specifically includes steps S101 to S103.

S101, acquiring service information of an e-commerce platform and an offline community, and performing data processing on the service information to obtain a service data set;

S102, performing feature selection on the service data set to obtain corresponding target features;

S103, inputting the target features into a federated learning SecureBoost algorithm, and taking the output of the federated learning SecureBoost algorithm as the service recommendation result.

In this embodiment, the acquired service information is processed to obtain the service data set; corresponding target features are then selected from the service data set and used to train the federated learning algorithm, so that the federated learning algorithm can output the final service recommendation result. Because the recommendation is predicted for the user by a federated learning algorithm trained on these features, user privacy is protected while recommendation accuracy is improved.

As a distributed machine learning paradigm, federated learning can effectively address the data silo problem: participants can build models jointly without sharing their data, which technically breaks down data silos and enables AI collaboration. Under the federated mechanism, all participants have equal identity and status, and a shared data strategy can be established. Because the data itself is never transferred, neither user privacy nor data regulations are compromised, which satisfies the requirements of data privacy protection and legal compliance.

Specifically, SecureBoost is a privacy-preserving tree-boosting algorithm designed for vertical (longitudinal) federated learning, whose goal is multi-party joint learning while data privacy is protected. When applied to vertical federated learning, SecureBoost involves two main points: first, data alignment, which finds the intersection of the parties' data without revealing the data identifiers; second, homomorphic encryption during the training and construction of the boosting trees, with split-point information sent only to the collaborating party. Both measures serve the same purpose: protecting data security.

The service recommendation method provided by this embodiment is based on the vertical federated learning SecureBoost algorithm, so that community residents can be served well while their private data is protected, providing community users with personalized recommendation services both online (in the APP) and in the offline community.

In one embodiment, the step S101 includes:

performing outlier detection separately on the lowest price and the highest price in the service information using the Z-score method, according to the following formula:

$$Z = \frac{x - x_m}{s}$$

where x is a lowest price or a highest price, x_m is the mean of all lowest prices or of all highest prices, s is the standard deviation of all lowest prices or of all highest prices, and Z is the standard score; if the absolute value of Z is larger than a preset score value, the corresponding lowest or highest price is judged to be an outlier;

removing the detected outliers, and performing missing-value detection on the service information;

when a missing value is detected, filling it by mean filling;

and acquiring the character-type features in the service information, and converting them into numeric features.
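The outlier step above can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the record layout (dicts with hypothetical keys `lowest_price` and `highest_price`), the use of the population standard deviation, the default threshold of 3.0 and the choice to ignore missing prices while computing the statistics are all assumptions, since the description only calls the threshold a "preset score value".

```python
from statistics import mean, pstdev


def zscore_outlier_mask(values, threshold=3.0):
    """Flag values with |Z| > threshold, where Z = (x - x_m) / s.

    Missing values (None) are ignored when computing x_m and s and are
    never flagged; they are handled later by mean filling.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return [False] * len(values)
    x_m = mean(present)   # mean of all lowest (or highest) prices
    s = pstdev(present)   # population standard deviation (assumption)
    if s == 0:
        return [False] * len(values)
    return [v is not None and abs((v - x_m) / s) > threshold for v in values]


def drop_price_outliers(rows, threshold=3.0):
    """Remove rows whose lowest or highest price is a Z-score outlier."""
    low = zscore_outlier_mask([r["lowest_price"] for r in rows], threshold)
    high = zscore_outlier_mask([r["highest_price"] for r in rows], threshold)
    return [r for r, lo, hi in zip(rows, low, high) if not (lo or hi)]


prices = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]
rows = [{"lowest_price": p, "highest_price": 2 * p} for p in prices]
kept = drop_price_outliers(rows, threshold=2.0)
```

With the population standard deviation, |Z| can never exceed √(n−1), so on very small samples a threshold of 3 flags nothing; the example therefore uses 2.0 on ten prices.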

In this embodiment, the service information refers to the service data of the e-commerce APP, which may specifically include features such as the service ID, category ID, service name, whether the service is saleable, logistics type, supplier ID, lowest price and highest price. It may also include the interaction behavior between users and services, specifically the user ID, service ID, shipping address, contact phone number, shipping time, completion time, payment amount, whether shipping is free, and so on. Furthermore, basic information about offline community residents can be collected through the community management process and used as part of the service information; it specifically includes features such as name, contact phone number, address, occupation, salary, household size, marital status, age and gender.

The data processing of the service information specifically includes outlier detection, missing-value filling and data conversion. For outlier detection, the Z-score method computes the mean and standard deviation of the feature data (namely the lowest price and the highest price) and flags values whose standard score exceeds the preset value; missing-value detection is then performed, and any missing values found are filled with the mean. For data conversion, because the SecureBoost algorithm supports categorical features directly, categorical features can be passed to the SecureBoost algorithm as they are, and only character-type features need converting. For example, the two character-type features "whether saleable" and "whether free shipping" are converted into numeric categorical features: 1 means saleable and 0 means not saleable; similarly, 1 means free shipping and 0 means not free shipping.
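Mean filling and the yes/no conversion just described can be sketched as below. The column names (`payment_amount`, `is_saleable`, `is_free_shipping`) and the accepted spellings of "yes" are hypothetical; the description does not specify the raw encoding of the character-type fields.

```python
from statistics import mean

# Hypothetical column names for the two yes/no fields named in the text.
YES_NO_COLUMNS = ("is_saleable", "is_free_shipping")
# Assumed spellings of "yes" in the raw character-type data.
YES_VALUES = {"yes", "y", "true", "1"}


def fill_missing_with_mean(rows, column):
    """Mean filling: replace None in `column` with the mean of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        return rows  # nothing to average; leave the column untouched
    fill = mean(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return rows


def encode_yes_no(rows, columns=YES_NO_COLUMNS):
    """Convert character-type yes/no flags to numeric categories (1 = yes, 0 = no)."""
    for r in rows:
        for c in columns:
            r[c] = 1 if str(r[c]).strip().lower() in YES_VALUES else 0
    return rows


rows = [
    {"payment_amount": 10.0, "is_saleable": "yes", "is_free_shipping": "no"},
    {"payment_amount": None, "is_saleable": "no", "is_free_shipping": "Yes"},
    {"payment_amount": 20.0, "is_saleable": "YES", "is_free_shipping": "no"},
]
encode_yes_no(fill_missing_with_mean(rows, "payment_amount"))
```

Both helpers mutate the rows in place and return them, so they can be chained; `encode_yes_no` is idempotent because already-encoded 1/0 values map back to themselves.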

In one embodiment, as shown in Fig. 2, step S102 includes steps S201 to S204.

S201, creating a label column based on the service information;

S202, generating corresponding address features from the address information in the service data set, wherein the address features comprise city features, region features, street features, community features and house features;

S203, calculating, in combination with the address features, a correlation coefficient between each feature in the service data set and the label column using the SelectKBest algorithm of the feature_selection library;

S204, selecting the N features with the highest correlation coefficients as the target features.
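Steps S201 and S202 can be sketched as follows. Everything concrete here is an assumption for illustration: candidate (user ID, service ID) pairs, a purchase list as the source of the label, and an address string delimited by commas in the order city, region, street, community, house. Real Chinese addresses usually need a proper parser.

```python
# The five address levels named in step S202.
ADDRESS_LEVELS = ("city", "region", "street", "community", "house")


def make_label_column(pairs, purchases):
    """S201: label each (user_id, service_id) pair 1 if purchased, else 0."""
    bought = set(purchases)
    return [1 if pair in bought else 0 for pair in pairs]


def split_address(address, sep=","):
    """S202: split one address string into the five address features.

    Assumes a hypothetical "city, region, street, community, house" layout;
    levels that are absent come back as None, extra parts are ignored.
    """
    parts = [p.strip() for p in address.split(sep)] if address else []
    parts = parts[:len(ADDRESS_LEVELS)]
    parts += [None] * (len(ADDRESS_LEVELS) - len(parts))
    return dict(zip(ADDRESS_LEVELS, parts))


pairs = [("u1", "s1"), ("u1", "s2"), ("u2", "s1")]
labels = make_label_column(pairs, purchases=[("u1", "s2")])
features = split_address("CityA, DistrictB, StreetC, CommunityD, House 101")
```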

In this embodiment, a label column is created from the service information, where 1 means purchased and 0 means not purchased. Meanwhile, feature extraction is performed on the service information: for example, the shipping address and the residential address are parsed to generate new features such as city, region, street, community and house, namely the address features, as shown in Table 1:

TABLE 1

Then, the SelectKBest algorithm of the feature_selection library is used to calculate the importance of each feature in the service data set to the label, that is, the correlation coefficient between each feature and the label, and the N features with the highest correlation coefficients are selected as target features. For example, the 10 features with the highest correlation coefficients may be selected: category ID, whether saleable, logistics type, supplier ID, lowest price, highest price, payment amount, whether free shipping, city and region.
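The ranking in steps S203 and S204 can be sketched without any third-party library. scikit-learn's `sklearn.feature_selection` module provides a `SelectKBest(score_func, k)` class, but the description does not name the score function, so this sketch assumes ranking by the absolute Pearson correlation with the label; the feature names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_top_n(columns, label, n=10):
    """Keep the names of the n features with the largest |r| against the label."""
    scores = {name: abs(pearson(values, label)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]


label = [0, 1, 0, 1, 1, 0]
columns = {
    "payment_amount": [1, 9, 2, 8, 9, 1],  # tracks the label
    "constant": [5, 5, 5, 5, 5, 5],        # no information, scores 0
    "inverse": [9, 1, 8, 2, 1, 9],         # strongly anti-correlated
}
top = select_top_n(columns, label, n=2)
```

Taking the absolute value keeps strongly negative correlations, which would rank last if the signed coefficient were used directly.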

In one embodiment, the step S103 includes:

and optimizing and updating the federated learning SecureBoost algorithm according to the following formula:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i)\right] + \Omega(f_t)$$

where L^(t) is the loss function at round t, i indexes the samples and n is the number of samples, f_t(x_i) is the output of the t-th tree for the target features x_i of sample i, g_i and h_i are respectively the first and second derivatives of the loss, and Ω(f_t) is the regularization term; wherein:

$$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right), \qquad h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$

$$f_t(\mathbf{x}) = w_{q(\mathbf{x})}, \qquad w \in \mathbb{R}^T, \qquad q: \mathbb{R}^d \to \{1, \dots, T\}$$

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where i indexes samples and j indexes leaves, y_i is the label of sample i, ŷ_i^(t-1) is the prediction for sample i after round t-1, w ∈ R^T is the vector of leaf weights, q maps a d-dimensional feature vector in R^d to the index of the leaf it falls on, T is the number of leaves, γ and λ are regularization coefficients, and w_j is the weight of leaf j.
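To make the objective concrete, the sketch below evaluates g_i, h_i, the optimal leaf weight and the second-order objective for a binary purchase label. The logistic loss is an assumption (the description does not name the loss l); for it, g_i = p_i − y_i and h_i = p_i(1 − p_i), with p_i the sigmoid of the previous raw score. Minimizing the objective leaf by leaf gives the standard XGBoost weight w_j* = −G_j / (H_j + λ).

```python
from math import exp


def logistic_grad_hess(y, raw_pred):
    """g_i and h_i of the logistic loss w.r.t. the previous raw score y_hat^(t-1)."""
    p = 1.0 / (1.0 + exp(-raw_pred))  # predicted purchase probability
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(g, h, members, lam=1.0):
    """Leaf weight minimising the objective: w_j* = -G_j / (H_j + lambda)."""
    G = sum(g[i] for i in members)
    H = sum(h[i] for i in members)
    return -G / (H + lam)


def approx_objective(g, h, leaf_of, weights, lam=1.0, gamma=0.0):
    """sum_i [g_i f(x_i) + 0.5 h_i f(x_i)^2] + gamma*T + 0.5*lam*sum_j w_j^2.

    leaf_of[i] plays the role of q(x_i); weights[j] is w_j; T = len(weights).
    """
    loss = sum(gi * weights[leaf_of[i]] + 0.5 * hi * weights[leaf_of[i]] ** 2
               for i, (gi, hi) in enumerate(zip(g, h)))
    omega = gamma * len(weights) + 0.5 * lam * sum(w * w for w in weights)
    return loss + omega


# Three samples, all starting from a raw score of 0 (probability 0.5).
labels = [1, 0, 1]
g, h = zip(*(logistic_grad_hess(y, 0.0) for y in labels))
w_star = optimal_leaf_weight(g, h, members=[0, 1, 2])
```

At w_j* the leaf's contribution to the objective reduces to −½·G_j²/(H_j + λ), which is the quantity the split-gain formula compares.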

In this embodiment, the target features are input into the SecureBoost algorithm for training, and the model is optimized and updated with the loss function, so that the final output of the federated learning SecureBoost algorithm is more accurate.

The SecureBoost algorithm adopted in this embodiment trains an XGBoost model in a federated manner. First, the records belonging to the same users are found in the e-commerce APP data and the community data, and they are divided into a training set and a validation set in a 70% to 30% ratio. Because the APP data carries labels and the community data does not, the e-commerce APP is the active party (Guest) and the community is the passive party (Host). The e-commerce APP data set has 100,000 samples and 15 features; the community data set likewise has 100,000 samples, with 11 features.
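The data flow of alignment followed by the 70%/30% split can be sketched as below. The user ID format and the fixed shuffle seed are hypothetical, and the plain set intersection is only a stand-in: it exposes both parties' ID lists, whereas the embodiment computes the intersection without revealing non-shared identifiers.

```python
import random


def align_and_split(guest_ids, host_ids, train_ratio=0.7, seed=0):
    """Find the users both parties share and split them into train/validation.

    NOTE: a plain set intersection is used only to illustrate the data flow;
    it reveals both ID lists, unlike a privacy-preserving intersection.
    """
    common = sorted(set(guest_ids) & set(host_ids))  # sort for reproducibility
    random.Random(seed).shuffle(common)
    cut = round(len(common) * train_ratio)
    return common[:cut], common[cut:]


guest_users = [f"user{i}" for i in range(0, 100)]   # e-commerce APP (Guest)
host_users = [f"user{i}" for i in range(50, 150)]   # community (Host)
train_ids, val_ids = align_and_split(guest_users, host_users)
```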

Further, the maximum information gain of the split point of a leaf node is calculated according to the following formula:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

where G_L and H_L are respectively the first-order and second-order aggregated gradients of all samples in the left branch of the split; G_R and H_R are respectively the first-order and second-order aggregated gradients of all samples in the right branch;

γ is the complexity cost of the newly added split point; and G_j and H_j respectively denote the first-order and second-order aggregated gradients of split j, that is, the sums of g_i and h_i over the samples it contains.

As the loss function shows, the loss of each round depends on the data only through the first and second derivatives of each sample, that is, the loss calculation depends only on g_i and h_i. Because the split point of a leaf node is chosen by maximum information gain, a gain calculation is required.
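The gain formula can be checked with a small sketch that scans one feature's sorted values and keeps the threshold with the largest gain. It works on plaintext gradients; the data and the default λ = 1, γ = 0 are illustrative assumptions, and the exact greedy scan shown here stands in for whatever bucketing the embodiment actually uses.

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)] - gamma."""
    def score(G, H):
        return G * G / (H + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma


def best_threshold(x, g, h, lam=1.0, gamma=0.0):
    """Scan one feature's sorted values; return (gain, t) for the best split x <= t."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    G_L = H_L = 0.0
    best = (float("-inf"), None)
    for pos, i in enumerate(order[:-1]):
        G_L += g[i]
        H_L += h[i]
        if x[order[pos + 1]] == x[i]:
            continue  # no threshold separates equal feature values
        gain = split_gain(G_L, H_L, G - G_L, H - H_L, lam, gamma)
        if gain > best[0]:
            best = (gain, x[i])
    return best


gain, threshold = best_threshold(x=[1, 2, 3, 4], g=[-1, -1, 1, 1], h=[1, 1, 1, 1])
```

Only running sums of g_i and h_i are needed at each candidate, which is why it is enough for the passive party to return aggregated G and H values.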

In one embodiment, the service recommendation method based on federated learning further comprises:

performing additive homomorphic encryption on G_j and H_j according to the following formula:

$$[[G_j]] = \sum_{i \in I_j} [[g_i]], \qquad [[H_j]] = \sum_{i \in I_j} [[h_i]]$$

where [[·]] denotes additive homomorphic encryption and I_j denotes the set of samples assigned to split j.

In this embodiment, to protect the label information of the e-commerce APP from leakage, the first tree is constructed entirely from the e-commerce APP's features without any participation by the community, so the community only ever sees the residuals left after the first tree (referred to here as differential privacy). Meanwhile, to prevent the community from inferring the e-commerce APP's label information from g_i and h_i, the e-commerce APP first encrypts g_i and h_i and sends the ciphertexts to the community; the community then aggregates the encrypted g_i and h_i using the additive homomorphic property to obtain the encrypted split gradients G_j and H_j.
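The additive homomorphic aggregation can be illustrated with a toy version of the Paillier cryptosystem, a common additively homomorphic scheme; the description does not name its scheme, so Paillier is an assumption. The key size here is far too small to be secure, and the fixed-point scale of 10^6 used to encode signed float gradients is a hypothetical choice.

```python
import math
import secrets


def is_prime(k):
    """Trial-division primality test (fine for the toy key sizes used here)."""
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    f = 3
    while f * f <= k:
        if k % f == 0:
            return False
        f += 2
    return True


def next_prime(k):
    while not is_prime(k):
        k += 1
    return k


class ToyPaillier:
    """Textbook Paillier with generator g = n + 1. Toy key size: NOT secure."""

    def __init__(self, p, q):
        self.n = p * q
        self.n2 = self.n * self.n                                # public
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # private: lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)                      # private: lam^-1 mod n

    def encrypt(self, m):
        r = secrets.randbelow(self.n - 1) + 1
        while math.gcd(r, self.n) != 1:
            r = secrets.randbelow(self.n - 1) + 1
        # (1 + n)^m = 1 + m*n (mod n^2)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        u = pow(c, self.lam, self.n2)
        return (u - 1) // self.n * self.mu % self.n


SCALE = 10**6  # fixed-point precision for float gradients (assumption)


def encode(x, n):
    """Map a (possibly negative) float to Z_n using fixed-point scaling."""
    return round(x * SCALE) % n


def decode(m, n):
    """Inverse of encode: residues above n/2 represent negative numbers."""
    return (m - n if m > n // 2 else m) / SCALE


def encrypted_bucket_sum(n2, ciphertexts, members):
    """Host side: [[G_j]] = product of [[g_i]] mod n^2, i.e. an encrypted sum.

    Only the public value n^2 is needed. The start value 1 is a valid
    (non-randomised) encryption of 0.
    """
    total = 1
    for i in members:
        total = total * ciphertexts[i] % n2
    return total


# Guest: generate keys, encrypt g_i, send ciphertexts to the host.
key = ToyPaillier(next_prime(10**6), next_prime(10**6 + 1000))
g = [0.25, -0.5, 0.125, -0.75]
enc_g = [key.encrypt(encode(v, key.n)) for v in g]
# Host: aggregate the bucket I_j = {0, 1, 3} without seeing any g_i.
enc_G = encrypted_bucket_sum(key.n2, enc_g, members=[0, 1, 3])
# Guest: decrypt the aggregated gradient G_j.
G_j = decode(key.decrypt(enc_G), key.n)
```

Multiplying ciphertexts modulo n² adds the underlying plaintexts, so the host can form [[G_j]] and [[H_j]] while only the key holder (the e-commerce APP) can read the sums.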

Further, in an embodiment, the step S103 further includes:

and selecting the community's optimal split point based on the split points subjected to additive homomorphic encryption, selecting the globally optimal split at the e-commerce platform according to the community's optimal split point, and outputting the globally optimal split as the final service recommendation result.

In this embodiment, the community uses the encrypted g_i and h_i and the additive homomorphic formula above to compute the encrypted aggregated gradient information [[G_j]] and [[H_j]] for each bucket, and sends them to the e-commerce APP. The e-commerce APP decrypts [[G_j]] and [[H_j]] to obtain the community's optimal split point, and then selects the globally optimal split from its own local optimal split and the community's optimal split. This process is iterated until the maximum tree depth is reached, yielding the final prediction result.
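The Guest's final choice between its own candidates and the community's can be sketched as follows. The encryption layer is omitted: the host's G_L and H_L values are assumed to have already been decrypted. Representing each host candidate by an opaque handle, so the Guest never learns the community's feature or threshold, is an assumption modelled on the published SecureBoost design; the candidate tuples are hypothetical.

```python
def split_gain(G_L, H_L, G, H, lam=1.0, gamma=0.0):
    """Split gain from left-branch sums (G_L, H_L) and node totals (G, H)."""
    G_R, H_R = G - G_L, H - H_L
    return 0.5 * (G_L ** 2 / (H_L + lam) + G_R ** 2 / (H_R + lam)
                  - G ** 2 / (H + lam)) - gamma


def select_global_split(guest_candidates, host_candidates, G, H, lam=1.0, gamma=0.0):
    """Guest side: choose the best split over its own and the host's candidates.

    Each candidate is (owner, split_id, G_L, H_L). For host candidates, G_L and
    H_L are the values the guest obtained by decrypting [[G_L]] and [[H_L]];
    split_id is an opaque handle, so the guest does not learn the host's
    feature or threshold (assumption).
    """
    best = None
    for owner, split_id, G_L, H_L in list(guest_candidates) + list(host_candidates):
        gain = split_gain(G_L, H_L, G, H, lam, gamma)
        if best is None or gain > best[0]:
            best = (gain, owner, split_id)
    return best  # None if there were no candidates


# Node totals G = sum(g_i), H = sum(h_i) are known to the guest in plaintext.
best = select_global_split(
    guest_candidates=[("guest", "price<=50", -1.0, 1.0)],
    host_candidates=[("host", 7, -2.0, 2.0)],
    G=0.0, H=4.0,
)
```

If the winning split belongs to the host, the guest would send the handle back so the community can apply its own threshold; repeating this per node until the maximum depth grows the tree.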

In a specific application scenario, service recommendation to a user is completed according to the following steps:

Task preparation: the e-commerce APP (Guest) creates a federated learning task, invites the community (Host) to participate in federated learning training, and grants the relevant permissions after joint review;

Task configuration: the platform configures the execution flow, specifies the model type and configures the model parameters; the e-commerce APP (Guest) and the community (Host) each load their data sets, which are uploaded to the federated learning platform after verification;

Model training: the task is started, and the e-commerce APP and the community each perform data cleaning and feature engineering; the intersection of the two parties' data sets is computed by an additive homomorphic encryption algorithm while keeping both parties' data secure; the intersected data is split into a training set and a validation set, and the model is trained and validated with the SecureBoost algorithm;

Model deployment: the trained model is loaded and deployed, and a RESTful API is generated. The specific steps are as follows:

the e-commerce APP (Guest) initiates a model deployment task and pushes the model ID and its metadata into memory;

meanwhile, the model is transmitted to the community (Host), which loads the model ID and its metadata into memory;

when both parties have finished loading the model, the corresponding user ID and commodity ID are bound;

the deployment is complete.

Production calling: the recommendation system calls the model for a single record or in batches. For example, for a given user ID and item ID, it can predict whether the user will purchase the item, where 1 means purchase and 0 means no purchase.

Fig. 3 is a schematic block diagram of a service recommendation apparatus 300 based on federated learning according to an embodiment of the present invention, where the apparatus 300 includes:

a data processing unit 301, configured to acquire service information of an e-commerce platform and an offline community, and perform data processing on the service information to obtain a service data set;

a feature selection unit 302, configured to perform feature selection on the service data set to obtain corresponding target features;

a result output unit 303, configured to input the target features into a federated learning SecureBoost algorithm and use the output of the federated learning SecureBoost algorithm as the service recommendation result.

In one embodiment, the data processing unit 301 comprises:

an outlier detection unit, configured to perform outlier detection separately on the lowest price and the highest price in the service information using the Z-score method according to the following formula:

$$Z = \frac{x - x_m}{s}$$

a missing value detection unit, configured to remove the detected outliers and perform missing-value detection on the service information;

a missing value filling unit, configured to fill missing values by mean filling when a missing value is detected;

and a feature conversion unit, configured to acquire the character-type features in the service information and convert them into numeric features.

In one embodiment, as shown in Fig. 4, the feature selection unit 302 includes:

a label column creating unit 401, configured to create a label column based on the service information;

a feature generating unit 402, configured to generate corresponding address features from the address information in the service data set, wherein the address features comprise city features, region features, street features, community features and house features;

a coefficient calculating unit 403, configured to calculate, in combination with the address features, a correlation coefficient between each feature in the service data set and the label column using the SelectKBest algorithm of the feature_selection library;

a coefficient selecting unit 404, configured to select the N features with the highest correlation coefficients as the target features.

In one embodiment, the result output unit 303 includes:

an optimization updating unit, configured to optimize and update the federated learning SecureBoost algorithm according to the following formula:

a regularization term representing a target feature; wherein:

In an embodiment, the federated learning based service recommendation device 300 further includes:

a gain calculation unit, configured to calculate the maximum information gain of the split point of a leaf node according to the following formula:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$

where G_L and H_L are respectively the first-order and second-order aggregated gradients of all samples in the left branch of the split; G_R and H_R are respectively the first-order and second-order aggregated gradients of all samples in the right branch;

γ is the complexity cost of the newly added split point, and G_j and H_j respectively denote the first-order and second-order aggregated split gradients;

an encryption processing unit, configured to perform additive homomorphic encryption on G_j and H_j according to the following formula:

$$[[G_j]] = \sum_{i \in I_j} [[g_i]], \qquad [[H_j]] = \sum_{i \in I_j} [[h_i]]$$

where I_j denotes the set of samples assigned to split j.

In one embodiment, the result output unit 303 further includes:

and a global selection unit, configured to select the community's optimal split point based on the split points subjected to additive homomorphic encryption, select the globally optimal split of the e-commerce platform according to the community's optimal split point, and output the globally optimal split as the final service recommendation result.

Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.

Embodiments of the present invention also provide a computer-readable storage medium storing a computer program which, when executed, implements the steps provided by the above embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The embodiment of the present invention further provides a computer device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the above embodiments when calling the computer program in the memory. Of course, the computer device may also include various network interfaces, power supplies, and the like.

The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims

1. A service recommendation method based on federated learning, characterized by comprising the following steps:

2. The federated learning-based service recommendation method as claimed in claim 1, wherein the acquiring of service information of an e-commerce platform and an offline community and the data processing of the service information to obtain a service data set comprises:

performing outlier detection separately on the lowest price and the highest price in the service information using the Z-score method according to the following formula:

$$Z = \frac{x - x_m}{s}$$

3. The federated learning-based service recommendation method as claimed in claim 1, wherein the performing of feature selection on the service data set to obtain corresponding target features comprises:

creating a label column based on the service information;

generating corresponding address features from the address information in the service data set, the address features comprising city features, region features, street features, community features and house features;

calculating, in combination with the address features, a correlation coefficient between each feature in the service data set and the label column using the SelectKBest algorithm of the feature_selection library;

and selecting the N features with the highest correlation coefficients as the target features.

4. The federated learning-based service recommendation method as claimed in claim 1, wherein the inputting of the target features into a federated learning SecureBoost algorithm and taking of the output of the federated learning SecureBoost algorithm as the service recommendation result comprises:

and optimizing and updating the federated learning SecureBoost algorithm according to the following formula:

a regularization term representing a target feature; wherein:

5. The federated learning-based service recommendation method as claimed in claim 4, further comprising:

calculating the maximum information gain of the split point of a leaf node according to the following formula:

complexity of the newly added split point, and G_j and H_j respectively denote the first-order and second-order aggregated split gradients.

6. The federated learning-based service recommendation method as claimed in claim 5, further comprising:

performing additive homomorphic encryption on the first-order aggregated split gradient G_j and the second-order aggregated split gradient H_j according to the following formula:

where I_j denotes the set of samples assigned to split j.

7. The federated learning-based service recommendation method as claimed in claim 6, wherein the inputting of the target features into a federated learning SecureBoost algorithm and taking of the output of the federated learning SecureBoost algorithm as the service recommendation result further comprises:

8. A federated learning-based service recommendation apparatus, comprising:

9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the federated learning based service recommendation method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the federated learning based service recommendation method as claimed in any one of claims 1 to 7.