WO2021203980A1

WO2021203980A1 - Meteorological event prediction method and apparatus, and related device

Info

Publication number: WO2021203980A1
Application number: PCT/CN2021/083026
Authority: WO
Inventors: 王健宗; 李泽远
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-11-20
Filing date: 2021-03-25
Publication date: 2021-10-14
Also published as: CN112381307A; CN112381307B

Abstract

A meteorological event prediction method. The method comprises: a first terminal calculating, according to each sample in a first sample set, a first-order gradient set and a second-order gradient set of a loss function of a model to be trained; receiving an aggregated first-order gradient set and an aggregated second-order gradient set, which are sent by a second terminal and are obtained by performing calculation according to a second sample set, wherein the second sample set is a sample set which is determined from a sample set of the second terminal and is similar to the first sample set; and then training the model according to gradient values of samples and aggregated gradient values of similar samples, and predicting a meteorological situation by means of the trained model. By means of sending aggregated gradient values of similar samples to a first terminal for model training, the problem of data leakage is avoided, and similar samples of other terminals are also used during a model training process, such that a trained model is more accurate, and the terminals can perform synchronous and parallel training, thereby improving the calculation efficiency of a model, and rationally using data and resources.

Description

Method, device and related equipment for predicting meteorological events

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on November 20, 2020, the application number is 202011312818.9, and the invention title is "a method, device and related equipment for predicting meteorological events", the entire content of which is incorporated by reference In this application.

Technical field

This application relates to the technical field of big data processing, and in particular to a method, device and related equipment for predicting meteorological events.

Background technique

With the development of big data and artificial intelligence, the gradual application of massive data deep learning, complex neural networks, etc., the use of big data and artificial intelligence technology to predict meteorological events has become a hot topic, such as rainfall forecasts, temperature forecasts, wind speed forecasts, etc. .

The inventor found that currently, weather forecasting methods mainly include traditional statistical methods such as regression models and autoregressive moving average models, and artificial neural networks, support vector machines, regression trees and other artificial intelligence models. However, the inventor realizes that the existing researches are all aimed at centralized training models, that is, the models are trained after uploading all the data of each meteorological station to the central server. However, because the meteorological stations are widely distributed, the number is large, and the time is long. Monitoring, the amount of data is very large, and the meteorological data of different provinces involves confidentiality issues. Only using the centralized training mode to train the model often fails to meet people's expectations. The training process is very weak and will inevitably lead to inefficient calculations. Too broad and insufficient performance.

Summary of the invention

The embodiments of the present application provide a method for predicting meteorological events, which can solve the problems of data privacy protection between meteorological data and the problems of low computational efficiency, over-extensiveness, and insufficient performance of models.

In the first aspect, this application provides a method for predicting meteorological events. The method for predicting meteorological items is applied to a meteorological forecasting system. The meteorological forecasting system includes a first terminal located at a first weather station and a second terminal located at a second weather station. A terminal, the method includes: the first terminal calculates the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set, wherein one gradient in the first step set The value is calculated based on a sample in the first sample set, and a gradient value in the second gradient set is calculated based on a sample in the first sample set, and the first sample set is the first weather The collection of samples collected by the station; the first terminal receives the aggregated first-order degree set and the aggregated second-order degree set calculated according to the second sample set sent by the second terminal, and the second sample set includes the same as the first Samples in the sample set that are similar to each sample; the first terminal according to the first step set, the aggregated first step set, the second step set, the aggregated second step set, and the first A sample set is used to train the model to be trained to obtain a trained model; the first terminal predicts the sample to be predicted based on the trained model, and determines the prediction result of the sample to be predicted.

In a second aspect, this application provides a method for predicting meteorological events. The method for predicting meteorological items is applied to a meteorological forecasting system. The meteorological forecasting system includes a first terminal at a first weather station and a second terminal at a second weather station. Terminal, the method includes: the second terminal sends a second hash table to the first terminal, the second hash table including an identifier corresponding to each sample in the third sample set and each sample Corresponding to the hash value, the third sample set is the sample collected by the second weather station; the second terminal receives the sample identification set sent by the first terminal, and each sample identification in the sample identification set Indicates a sample in the third sample set; the second terminal determines a second sample set in the third sample set according to the sample identification set, and calculates the second sample set according to each sample in the second sample set Aggregate the first-order degree set and the aggregated second-order degree set of the loss function of the model to be trained, and send the aggregated first-order degree set and the aggregated second-order degree set to the first terminal.

In a third aspect, the present application provides a meteorological event prediction device. The device includes: a processing unit for calculating the first step set and the second step set of the loss function of the model to be trained based on each sample in the first sample set Degree set, wherein a gradient value in the first step degree set is calculated based on a sample in the first sample set, and a gradient value in the second step degree set is calculated based on a sample in the first sample set Obtained, the first sample set is a set of samples collected by the first weather station; the receiving unit is configured to receive the aggregated first degree set and aggregated second degree set calculated from the second sample set sent by the second terminal The second sample set includes samples that are similar to each sample in the first sample set; the processing unit is further configured to: The two-level degree set, the aggregated two-level degree set, and the first sample set are trained on the model to be trained to obtain a trained model; based on the trained model, predict the sample to be predicted, Determine the prediction result of the sample to be predicted; or, the device includes: a sending unit, configured to send a second hash table to the first terminal, the second hash table including each sample corresponding to the third sample set The sample identifier of and the hash value corresponding to each sample, the third sample set is the sample collected by the second weather station; the receiving unit is configured to receive the sample identification set sent by the first terminal, the identifier Each sample identification in the set indicates a sample in the third sample set; the processing unit is configured to determine a second sample set in the third sample set according to the sample identification set, and according to each sample in the second sample set. Samples to calculate the aggregated first degree set and aggregated second degree set of the loss function of the model to be trained; the sending unit is further configured to send the aggregated first degree set and the aggregated second degree set To the first terminal.

In a fourth aspect, the present application provides a computer device that includes a processor and a memory, the memory is used to store instructions, the processor is used to execute the instructions, and when the processor executes the instructions , Execute the following method: according to each sample in the first sample set, calculate the first step set and the second step set of the loss function of the model to be trained, wherein a gradient value in the first step set is based on the first A sample in the sample set is calculated, a gradient value in the second gradient set is calculated based on a sample in the first sample set, and the first sample set is the sample collected by the first weather station A set; receiving an aggregated first-order degree set and an aggregated second-order degree set calculated according to a second sample set sent by the second terminal, the second sample set including samples similar to each sample in the first sample set; Training the model to be trained according to the first step degree set, the aggregate first step degree set, the second step degree set, the aggregate second step degree set, and the first sample set to obtain training Good model; based on the trained model, predict the sample to be predicted, and determine the prediction result of the sample to be predicted; or,

Send a second hash table to the first terminal, where the second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. The third sample set is The sample collected by the second weather station; receiving the sample identification set sent by the first terminal, each sample identification in the identification set indicates a sample in the third sample set; according to the sample identification set in the In the third sample set, a second sample set is determined, and according to each sample in the second sample set, the aggregated first degree set and aggregated second degree set of the loss function of the model to be trained are calculated, and the aggregated The first-order degree set and the aggregated second-order degree set are sent to the first terminal.

In a fifth aspect, the present application provides a computer-readable storage medium storing a computer program, and the computer program is executed by a processor to implement the following method: according to each sample in the first sample set Calculate the first step set and the second step set of the loss function of the model to be trained, wherein a gradient value in the first step set is calculated based on a sample in the first sample set, and the second step set A gradient value in the set is calculated based on a sample in the first sample set. The first sample set is a set of samples collected by the first weather station; receiving the second terminal sent and calculated based on the second sample set Aggregating a first degree set and an aggregate second degree set, the second sample set includes samples similar to each sample in the first sample set; according to the first degree set and the aggregate first degree set , The two-level degree set, the aggregated two-level degree set, and the first sample set are trained on the model to be trained to obtain a trained model; based on the trained model, a sample to be predicted Make a prediction to determine the prediction result of the sample to be predicted; or,

In the embodiment of this application, the first terminal of the first weather station uses the gradient set calculated by the first sample and the gradient set calculated by the second sample of the second terminal to train the model, where the second sample is the second weather The samples of the station and the first weather station are similar, it can be seen that the local data and the data similar to other weather stations are used as the training parameters of the model, and the data and resources are rationally used to make the prediction results of the model more accurate; and the first terminal What is received is the gradient value of the sample of the second terminal. This method does not involve the central server. The terminals of each weather station do not need to upload the data to the central server, which avoids the problem of data privacy leakage; this method can also be applied to multiple applications at the same time. For the weather forecast of a weather station, each weather station can cooperate and train the model for each weather station in parallel, which effectively improves the calculation efficiency of the model.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application or the background art, the following will describe the drawings that need to be used in the embodiments of the present application or the background art.

FIG. 1 is a schematic diagram of the overall flow of a method for predicting a meteorological event provided by an embodiment of this application;

Figure 2 is a sample data structure of a data terminal provided in an embodiment of the application;

FIG. 3 is a schematic flowchart of a model training process provided in an embodiment of this application;

FIG. 4 is a schematic diagram of a data structure obtained by encrypting sample data of a data terminal provided in an embodiment of the application; FIG.

FIG. 5 is a schematic diagram of a data structure of a sample identification set determined by a data terminal according to a hash table according to an embodiment of the application; FIG.

Fig. 6 is a schematic structural diagram of a meteorological event prediction device provided in an embodiment of the application;

FIG. 7 is a schematic structural diagram of a computer device provided in an embodiment of this application.

Detailed ways

The embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. The terms used in the implementation mode part of this application are only used to explain specific embodiments of this application, and are not intended to limit this application.

The technical solution of this application may involve the field of artificial intelligence and/or big data technology to realize event prediction and promote the construction of smart cities. Optionally, the data involved in this application, such as samples and/or prediction results, can be stored in a database, or can be stored in a blockchain, which is not limited in this application.

When multiple participants participate in the training of the model, the traditional method is often to upload the data of multiple participants to a central server, train the model through the central server, and then send the trained model to the participants for use Participants make predictions about related events. Although this method can use all participants' data for model training, so that the trained model is suitable for the data prediction of each participant, the method has low computational efficiency, the model is too broad, and there is a problem of privacy leakage.

In order to solve the above problems, this application provides a method for predicting meteorological events, which combines the characteristics of federated learning technology and uses the XGBOOST model to train the model. Each weather station does not need to upload data to the central server to share data. Each weather station locally trains a model for local samples, and also uses similar samples from other weather stations for local model training, where similar samples refer to collections from other weather stations Among the samples in the sample, the local model is trained by receiving the aggregate gradient value of similar samples from other weather stations, which avoids the problem of data privacy leakage. Each weather station synchronizes and parallel trains the model for each weather station, which effectively improves The calculation efficiency of the model and the accuracy of the model are improved.

First, the overall flow of the meteorological event prediction method involved in the embodiment of the present application is introduced.

Figure 1 shows a schematic diagram of the overall process of the meteorological event prediction method. The overall process of predicting a meteorological event includes the following steps:

S101: Obtain early data to train a weather forecast model.

Among them, the early data refers to the early meteorological data. For example, a certain meteorological data points out that when the temperature is 29°C, the humidity is 73%, the wind speed is 27 km/h, and the pressure is 1009 hPa, the meteorological data in a certain place The condition is light rain, then temperature, humidity, wind speed, and air pressure are the sample feature sets of the meteorological event, and light rain is the sample label of the meteorological event. The sample feature set and sample label of the meteorological event constitute a meteorological event sample. Use these early known data to train the model, that is, find the parameters of the model, and finally get a weather forecast model with known parameters.

Among them, in the embodiment of the present application, the model used for model training is the XGBOOST model. The process of model training is the process of constructing regression trees, that is, constantly adding regression trees, that is, learning a new function f(t) to fit the residuals predicted by the previously trained t-1 tree.

S102: Use the trained model to predict meteorological events.

The final trained model can input the sample feature set of the meteorological event into the trained model when the feature set of each meteorological sample (temperature, humidity, wind speed, air pressure, etc.) is known without knowing the meteorological event. The meteorological result of the meteorological event is predicted. Specifically, when predicting the meteorological condition of a meteorological event, the sample feature set of the meteorological event is input into the trained model, and a sample feature of the sample feature set will correspond to a leaf node in a tree. Finally, the sum of the weights of the leaf nodes obtained from each tree is used as the predicted value of the meteorological event.

In the specific embodiment of the present application, the sample feature set in the meteorological data may include other meteorological features except temperature, humidity, wind speed, air pressure, etc. The embodiment of the present application does not limit the sample feature set, and the meteorological conditions may be wind, cloud, etc. For one or more of other meteorological conditions such as, snow, etc., the number of sample labels corresponding to a certain meteorological condition is also not limited.

The meteorological event prediction method provided in the embodiments of the present application is applied to a meteorological prediction system, where the meteorological prediction system includes data terminals of multiple weather stations, and each weather station is trained in parallel and synchronized. The process of model training is the same, and the data is not shared. On the basis of joint training. When each weather station trains the local XGBOOST model, not only the local sample data is used, but also the data similar to the local sample data of other weather stations. This method can realize joint training without leaking the sample data of each weather station, and solves the problem of data privacy leakage between weather station data. Using local samples and similar samples from other weather stations to train the model can make the model more accurate. The data and resources are used rationally, and the weather stations synchronize and train the model, which effectively improves the calculation efficiency of the model.

The following takes a single weather station as an example to introduce the model training method provided in the application embodiment.

Since a large amount of sample data needs to be used in the process of model training, the structure of the sample data is described first.

Embodiments of the present application relates to the model training between the plurality of weather stations, each for its own weather station meteorological data exists, then, P _i represents the i Station, i∈ (1,2,3, ......, M), M is the number of weather stations,

Q represents the samples of data samples P _i of the i-th weather, q∈ (1,2,3, ......, N i), N i is the number of samples P _i of the i-th Station, sample data

Include sample feature set

And sample label

Sample feature set

Sample Station P q _i represents the i corresponding feature set (temperature, humidity, wind speed, pressure and other meteorological data), wherein the sample feature set

T is the number of sample features, sample label

Q represents sample _i of the i-th Station P samples corresponding tag (no rain, drizzle, rain, heavy rain, heavy rain), wherein

0 means no rain, 1 means light rain, 2 means moderate rain, 3 means heavy rain, and 4 means heavy rain. Station P of the i I ⁱ _i of the sample set can be expressed as

Can also be expressed as

And, each sample data corresponds to a sample identification (identity, ID), for example, the sample

The sample ID is 1, the sample

The sample identification is 2, and so on.

In the specific embodiment of the present application, the naming method of the sample identification corresponding to the sample data is not limited, and each weather station can confirm the identification of each sample by itself, or be uniformly determined by all weather stations participating in the model training.

Fig. 2 shows a sample data structure of a data terminal provided in an embodiment of the present application. Take _{the sample data on the first terminal of the first weather station P 1} as an example, the first weather station shown in Fig. 2 The sample data table of P ₁ ^{, I 1} represents the first sample set of the first weather station P _1,

sample

The sample ID is 1, the sample data

The sample ID is 2, the sample data

The samples are identified as 3,……,

The sample is identified as N ₁ , the sample

Sample feature set

Include sample characteristics

The sample label corresponding to this sample is

In the table shown in Fig. 2, the row corresponding to the sample ID of _{1 is the sample data of the first weather station P 1} sample 1, and the sample data includes sample characteristics

The value is 12, the sample feature

The value is 17, the sample feature

The value of is 10,..., the sample feature

The value is 54, the sample label

If the value is 0, the sample ID is 2 and the row corresponds to _{the sample data of the meteorological station P 1} sample 2, and so on.

Taking the training model of the first weather station P ₁ _{as an example, the training process of the first terminal of the first weather station P 1 and} the second terminal of the second weather station P ₂ in the model training is introduced. Fig. 3 shows a schematic flowchart of a model training process provided in an embodiment of the present application. Since each of the same weather model training process, so FIG. 3 shows only the first weather model training process P ₁ a first terminal, it should be understood that, when there are M meteorological stations simultaneously training model, the The second weather station P ₂ to the M _{-th weather station P M,} the M-1 weather station _{, is also training the model locally during the model training of the first weather station P 1} , and the model training process is the same as that of the first weather station P The model training process of _{1 is the same.}

S201: The first terminal converts each sample in the first sample set into a hash value to obtain a first hash table corresponding to the first sample set.

The first sample set I ¹ is a collection of samples collected by the first weather station P ₁ _{, which includes N 1} sample data, each sample includes a sample feature set and a sample label, and the sample feature set includes temperature, humidity, wind speed, and air pressure , The sample label indicates the weather conditions. Before training the model, the data needs to be encrypted. For each sample data of the _{first weather station P 1}

Generate L hash values according to L hash functions

Among them, δ _a,b (v)=CosSim(a,v)+b represents a hash function, a is a d-dimensional random vector, v is d-dimensional sample data, and b is a random value on [0,1] The random number is set by each weather station. Correspondingly, {δ _k } _{k=1, 2,..., L} represents L hash functions corresponding to different random vectors a and random numbers b. Thus, each sample data is mapped into a fixed-length character string through a hash function. FIG. 4 shows a schematic diagram of a data structure obtained after encryption of a sample set of a data terminal provided in an embodiment of the present application.

The data structure diagram of the encrypted sample set of _{the first weather station P 1} is shown in Fig. 4, _{and the sample data of the first weather station P 1}

After hash function processing, L hash functions will be generated

sample

After hash function processing, L hash functions will be generated

By analogy, the first weather station P _{1 has} a total of N ₁ samples, and the weather station P ₁ obtains a first hash table of _{N 1 *L.}

S202: The second terminal converts each sample in the third sample set into a hash value, obtains a second hash table corresponding to the third sample set, and sends the second hash table to the first terminal.

Among them, the third sample set I ² is the samples collected by the second weather station P _2. Each sample includes a sample feature set and a sample label. The sample feature set includes temperature, humidity, wind speed and air pressure. The sample label indicates the meteorological condition. The hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. Similar to the first terminal, the second P weather third sample set of the I ₂ ^2, the second terminal of the second weather P ₂ N ₂ * L to generate a second hash function hash table, and The second hash table is sent to the first weather station P ₁ .

It should be understood that when M terminals are involved in model training, the terminals of each weather station participating in model training will perform the operations of the second terminal, generate a hash table corresponding to each sample, and send it to the first terminal. weather stations generates a P _i N _i * L hash table transmits to the first terminal, and the same hash function used in each of the weather.

S203: The first terminal receives the second hash table sent by the second terminal, obtains the sample identification set according to the first hash table and the second hash table, and sends the sample identification set to the second terminal.

The second hash table includes the identifier corresponding to each sample in the third sample set and the hash value corresponding to each sample. The first terminal determines in the third sample set according to the first hash table and the second hash table. The sample identification corresponding to the most similar sample of each sample in the first sample set, thereby obtaining the sample identification set.

Specifically, for a certain hash value in the first hash table, the hash value corresponds to a certain sample in the first sample set, and the first terminal searches the second hash table for a hash value that is the closest to the hash value. The closest hash value, the sample ID corresponding to the closest hash value is determined. When each sample in the first hash table has a sample ID determined, the set represented by all samples is the sample ID set, then The sample corresponding to the closest hash value is the most similar sample of a certain sample in the first sample set.

When it comes to training the model M participating terminals, Sample ID Sample ID model training set comprises weather stations participating in the first sample _a local weather station P corresponding to the most similar samples, a first sample is a local weather station P ₁ Any sample in the first sample set. According to the received M-1 hash tables of other weather stations, the first weather station P ₁ compares the hash tables of other weather stations with the first hash table, and determines that the samples collected by other weather stations are the same as those of the first weather station. The most similar sample identification set of the first sample set of the station P ₁ _{, that is, N 1} sample identifications are determined from each weather station, where the sample data corresponding to each sample identification represents a sample from the first weather station P ₁ The sample data of is the most similar, and the sample identification set is sent to the terminals of other M-1 weather stations participating in the model training.

Since the first weather station P ₁ has N ₁ samples, the sample data of one of the samples q

The sample ID of the most similar sample can be found from other weather stations P _i (i=2,3,4,……., M), denoted as

When i = 1, is most similar to the first sample of the sample itself Station P sample identification _1, wherein the most similar samples by comparing L hash hash table is worth a single sample, since ha The value is a fixed-length string, and the most similar sample is found by comparing the size of the string, that is, if the size of the two compared strings is the closest, the two samples are the most similar.

5 exemplarily shows _a diagram of a data structure of a hash table to determine sample identification set a first weather P. In the table shown in Figure 5, each row represents _{the sample identification of a sample of the first weather station P 1 that is} most similar to other weather stations, and each column represents a weather station corresponding to the sample of the _{first weather station P 1} The sample ID of the most similar sample. Taking the second weather station P ₂ as an example, the second column in Fig. 5 is the sample identification of the most similar sample corresponding to the samples of _{the second weather station P 2} and the first weather station P _{1, for example, the second weather station P} ₂ and the sample data of the first weather station P ₁

The most similar sample is the sample data corresponding to the sample ID 45 in the _{second weather station P 2.}

Incidentally, since the first weather station is looking for similar samples P _1, P ₁ have a first weather the N ₁ samples, the weather station must find each other the N ₁ in the first sample identity Station N ₁ * M of a sample identity P ₁ set in correspondence sample, i.e., a first obtained Station P ₁ S ^1. For example, there are a first weather station 100 samples P _1, P ₂ of the second weather station 150 samples, for each sample of the first weather P _1, P ₁ from the first weather second weather P ₂ The sample ID of a most similar sample _{is determined in P 2} , where the sample IDs of the 100 most similar samples determined from the second weather station P 2 may be different, or may be partly the same. As can be seen, when looking for similar samples of the P _i when i Station, Station P _i is the i-th sample identity will get a set S ^_i ∈N ⁱ × M, and the set of sample identity broadcasted.

S204: The first terminal calculates the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set.

Wherein, a gradient value in the first step set is calculated based on a sample in the first sample set, and a gradient value in the second step set is calculated based on a sample in the first sample set. Use formula

with

_{Calculate the first step g 1q} and the second _{step h 1q of} a sample q of the first weather station P ₁ , where,

Represents the sample label of sample i of the first weather station P _1,

Represents the predicted value of sample i. Since there is no training model yet, when constructing the first tree, the predicted value is the given initial value. Function l() represents the loss function of the model to be trained, function l′() loss function represents the first derivative of the function l "() represents the second derivative of the loss function, a first sample of a weather station P Q ₁ is any one of _a first sample a first Station P concentration sample, That is, for each sample of the first weather station P ₁ , the first gradient and the second gradient must be calculated.

It should be understood that when the participating model training involves M terminals, the terminal of each weather station participating in the model training must perform the operation of the first weather station P ₁ described above: use the sample set to calculate the first step and the second step of the loss function In this application, the selection of the loss function is not limited. For example, the loss function can be logloss.

S205: The second terminal receives the sample identification set sent by the first terminal, determines the second sample set in the third sample set according to the sample identification set, and calculates the aggregation of the loss function of the model to be trained according to each sample in the second sample set. The level set and the aggregated second level set, and the aggregated first level set and the aggregated second level set are sent to the first terminal.

Wherein, the second sample set includes the samples that are most similar to each sample in the first sample set determined in the third sample set, and _{the second terminal of the second weather station P 2} receives the first terminal of the first weather station P ₁ The sent sample identification set, each sample identification in the identification set indicates a sample in the third sample set, and the second terminal finds the second sample set from the third sample set according to the sample identification set. Also use the formula

with

To calculate the aggregate first degree set and aggregate second degree set of the loss function of the second sample set, and send it to the first weather station P ₁ , where,

Represents the sample label of sample q of the second weather station P _2,

Indicates the predicted value of the sample q. Since there is no training model at this time, when the first tree is constructed, the predicted value is a given initial value. The function l() represents the loss function of the model to be trained, and the function l'() Represents the first derivative of the loss function, and the function l″() represents the second derivative of the loss function. Since the first weather station P ₁ has a total of N ₁ samples, the second terminal calculates a total of N ₁ aggregation steps and N ₁ polymerized second steps. It should be understood that these N ₁ polymerized first steps may be different, or may be partly the same, and similarly, the N ₁ polymerized second steps may be different or partly the same.

It should be understood that when M terminals are involved in model training, the terminal of each weather station participating in model training will perform the operation of the second terminal, that is, find similar sample sets based on the received sample identification set, and calculate similar sample sets. The aggregated first-order degree set and aggregated second-order degree set are sent to the first weather station.

For example, when the weather model training involved in further comprising a third sample Station P _3, the fourth weather P _4, ......, the first weather M P _M, the third terminal transmits a weather station in accordance with a first weather Identification set, find the sample similar to the first weather station in the third weather station, calculate the gradient value of the similar sample and send it to the first weather station, the terminal of the fourth weather station finds the sample identification set sent by the first weather station In the fourth weather station similar to the first weather station, the gradient value of the similar sample is calculated and sent to the first weather station, and so on, so that the first weather station will get the gradient value of the first sample set, The gradient values of samples similar to the first sample set in other weather stations participating in the model training will also be obtained.

S206: The first terminal receives the aggregated first degree set and aggregated second degree set calculated according to the second sample set sent by the second terminal, and aggregates the first degree set, aggregates first degree set, second degree set, and aggregates The second step set and the first sample set are trained on the model to be trained to obtain a trained model.

(1) The first terminal updates the first-order gradient set of the first sample set according to the aggregation of the first-order degree set and the first-order degree set to obtain the first-order sample gradient set, and updates the first-order sample gradient set according to the aggregated second-order degree set and the second-order degree set. The second-order gradient set of the sample set obtains the second-order sample gradient set.

When the weather stations participating in the model training only include the first weather station P ₁ and the second weather station P ₂ , for a certain sample data in the first sample set, the first terminal has a step of the loss function of the sample data The sum of the aggregate first-order gradient corresponding to the sample most similar to the sample data in the second weather station P ₂ is taken as the first-order sample gradient of the sample data, and the second-order gradient of the loss function of the sample data is compared with that of the second weather The sum of the aggregated second-order gradients corresponding to the samples most similar to the sample data in the station P _{2 is used as the second-order sample gradient of the sample data.} Whereby a first terminal to obtain a set of first-order gradient Sample Station P ₁ of a first set of samples and the second order gradient of the sample set.

It should be understood that when M terminals are involved in participating in model training, the first terminal of the first weather station P ₁ receives the aggregated first degree set and aggregated second degree set sent by terminals of other weather stations. For the first weather station, _1, a sample of any P q, P ₁ obtained weather first step a degree of polymerization

And the degree of aggregation

Then the first terminal of the first weather station P _{1 is based on}

Update the first-order sample gradient G _{1q of} sample q, according to

renew

The second-order sample gradient of H _1q , and each sample of the first weather station P ₁ must update the first step and the second step to prepare for the training of the model.

(2) The first terminal uses the first sample set, the first-order sample gradient set, and the second-order sample gradient set to train the XGBOOST model, so as to obtain the model to be predicted _{for the first weather station P 1.}

In the process of training _{the meteorological prediction model of the first weather station P 1} , the first-order sample gradient set and the second-order sample gradient set of the first sample set are used as the first-order gradient set and the second-order gradient set during the training of the first sample set. Degree set, the first gradient tree of the training model.

Since the process of training the model is the process of continuously building the regression tree, specifically, when building the first tree, it is necessary to split at the root node, and divide the first sample set into a left child node and a right child node at the node. Two sets, and use the sample gradient value of the sample to calculate the G _L , G _R , H _L , H _{R of the} two sets, and then use the formula:

Calculate the gain, and use the maximum value of the gain value as the criterion for judging the optimal split point.

Wherein the set of left leaf nodes of sample points after G _L Representative if splitting a first order sample gradients and, first order sample gradients and, H _L that represents a collection of G _R representative of the right leaf node if dividing sample points if split The sum of the second-order sample gradients of the set of sample points in the rear left leaf node, and H _R represents the sum of the second-order sample gradients of the set of sample points in the right leaf node after the split.

First, the division interval needs to be determined according to different split points. The first sample set is divided into two sets of left child nodes and right child nodes. Then, this split point is determined by the sample feature set of the sample data, and then Repeatedly calculate the gain under different division points.

For example, for sample features

If for sample characteristics

There 12,15,20,30,35} {weather data values in the sample data P _1, respectively, to the division point is calculated as 12,15,20,30,35 gain Gain. Then traverse the next sample feature in the same way, calculate the gain Gain, and so on.

Take the split point with the largest gain as the root node, and get the sample set of the left and right child nodes after the split, and then judge whether to continue the split according to the depth of the tree. If there is only one sample left in the child node after the split, the node is not Need to split again, the node becomes a leaf node, according to the formula:

Calculate the weight of the leaf node. Among them,

Is the sum of the first-order sample gradient statistics of all samples falling into leaf i,

It is the sum of the second-order sample gradient statistics of all samples falling into leaf i.

If the depth of the tree is not reached, the same split operation is continued on the left and right child nodes, that is, the child node is regarded as the root node and the above process is repeated.

If the depth of the tree is reached, the nodes of the tree can no longer be split, and the weight of the leaf nodes is calculated

Thus, the first gradient tree is trained.

When constructing the tree for lesson t (t>1), the training process is exactly the same as that of the previous t-1 tree, but the input parameters of the tree are no longer the initial input parameters G _1q and H _{1q used by the first tree.} , Since the t-th tree is fitted on the basis of the previous t-1 trees, the first step at this time is

And second degree

Need to use the predicted value of the i-th training sample by the model composed of the first t-1 trees

Therefore, the gain gain is calculated based on the split point, and the optimal split point and optimal weight required for the current round of gradient tree construction are finally determined.

When all the sample features in the sample feature set are used in the construction of the model, the XGBOOST model training is completed.

It should be understood, when it comes to training the model M participating terminals, a terminal model training weather stations have to perform a first operation of the first terminal Station P _1, model training weather stations participating terminal have to perform Operation of the second terminal of the second weather station P _{2 described above.} It can be seen that for a weather station, the weather station not only uses the sample data of other weather stations to train the model, but also obtains a prediction model for the weather station.

S207: The first terminal predicts the sample to be predicted based on the trained model, and determines the prediction result of the sample to be predicted.

After the first terminal has trained the meteorological event prediction model, it can use the sample to be predicted to predict the meteorological condition of a certain event. That is, the sample feature set in the sample to be predicted is substituted into the trained regression tree, and each sample feature will eventually fall on a leaf node of a regression tree. Add up the weight values of the leaf nodes obtained from all trees to obtain this The weather forecast value of the event, and then by comparing the result value to the value of which sample label is the closest, the meteorological condition (no rain, light rain, moderate rain, heavy rain, heavy rain) corresponding to the sample label is the forecast of the sample to be predicted result.

It can be seen that this method uses the idea of federated learning to find samples similar to the samples of the model training party from the sample data of the model training participants to expand the training sample set, thereby constructing a more accurate model. At the same time, this method After finding the similar samples, instead of sending the sample data directly to the model trainer, it sends the gradient value of the loss function of the similar sample, avoiding the problem of data leakage. This method also supports multiple terminals to train the model at the same time, which effectively improves The computational efficiency of the model.

6 is a schematic diagram of a meteorological event prediction device provided by an embodiment of the present application, which can perform the operations of the first terminal and the second terminal described above. The meteorological event prediction device 100 includes a receiving unit 101, a processing unit 102, and a sending unit 103. Wherein, when the weather event prediction apparatus 100 performs the operation of the first terminal:

The receiving unit 101 is configured to receive the aggregate first degree set and the aggregate second degree set calculated according to the second sample set sent by the second terminal, and the second sample set includes information similar to each sample in the first sample set. Sample; receiving a second hash table sent by the second terminal, the second hash table including a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample.

The processing unit 102 is configured to calculate the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set, wherein a gradient value in the first step set is based on the first step The same is calculated from a sample in this set, a gradient value in the second gradient set is calculated based on a sample in the first sample set, and the first sample set is a sample collected by the first weather station According to the first step set, the first step set, the second step set, the second step set, and the first sample set, the training model is trained to obtain a trained model; the first is the same Each sample in this set is converted into a hash value to obtain the first hash table corresponding to the first sample set; according to the first hash table and the second hash table, the third sample set is The sample ID corresponding to the most similar sample of each sample in the sample set is obtained.

The sending unit 103 is configured to send the sample identification set to the second terminal, so that the second terminal determines the second sample set according to the sample identification in the sample identification set.

When the weather event prediction device 100 performs the operation of the second terminal:

The receiving unit 101 is configured to receive a sample identification set sent by the first terminal, where each sample identification in the sample identification set indicates a sample in the third sample set.

The processing unit 102 is configured to convert each sample in the third sample set into a hash value to obtain a second hash table corresponding to the third sample set; determine the second sample set in the third sample set according to the identification set, According to each sample in the second sample set, calculate the aggregate first degree set and aggregate second degree set of the loss function of the model to be trained, and send the aggregate first degree set and aggregate second degree set to the first terminal.

The sending unit 103 is configured to send a second hash table to the first terminal. The second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. The third sample set is the sample collected by the second weather station.

Specifically, the meteorological event prediction apparatus 100 described above can refer to the related operations of the first terminal in the foregoing method embodiment for predicting meteorological events, which will not be described in detail here.

FIG. 7 is a schematic structural diagram of a computing device provided by an embodiment of the present application. The computing device 200 includes a processor 210, a communication interface 220, and a memory 230. The processor 210, the communication interface 220, and the memory 230 are connected to each other through a bus 240, The processor 210 is configured to execute instructions stored in the memory 230. The memory 230 stores program codes, and the processor 210 can call the program codes stored in the memory 230 to perform the following operations:

The meteorological event prediction device calculates the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set, wherein a gradient value in the first step set is based on the first step. Calculated from a sample in this set, a gradient value in the second gradient set is calculated based on a sample in the first sample set, and the first sample set is a set of samples collected by the first weather station Receiving the aggregated first-order degree set and aggregated second-order degree set calculated according to the second sample set sent by the second terminal, the second sample set including samples similar to each sample in the first sample set; The first step degree set, the aggregated first step degree set, the second step degree set, the aggregated second step degree set, and the first sample set are trained on the model to be trained, and the training is good The model; based on the trained model, predict the sample to be predicted, and determine the prediction result of the sample to be predicted.

In the embodiment of the present application, the processor 210 may have a variety of specific implementation forms. For example, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a tensor processing unit ( A tensor processing unit (TPU) or a neural network processing unit (NPU) or a combination of any one or more of the processors. The processor 210 may also be a single-core processor or a multi-core processor. The processor 210 may be a combination of a CPU (GPU, TPU, or NPU) and a hardware chip. The above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The above-mentioned PLD complex programmable logic device (CPLD), field-programmable gate array (FPGA), generic array logic (GAL) or any combination thereof. The processor 210 may also be implemented solely by a logic device with built-in processing logic, such as an FPGA or a digital signal processor (digital signal processor, DSP).

The communication interface 220 can be a wired interface or a wireless interface for communicating with other modules or devices. The wired interface can be an Ethernet interface, a controller area network (CAN) interface, or a local interconnect network (local interconnect network, LIN) interface. The wireless interface can be a cellular network interface or a wireless LAN interface.

The memory 230 may be a non-volatile memory, for example, read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), Electrically erasable programmable read-only memory (EPROM, EEPROM) or flash memory. The memory 230 may also be a volatile memory, and the volatile memory may be a random access memory (random access memory, RAM), which is used as an external cache.

The memory 230 may also be used to store instructions and data, so that the processor 210 can call the instructions stored in the memory 230 to implement the operations performed by the processing unit 103 described above or the operations performed by the meteorological event prediction apparatus in the method embodiment. In addition, the computing device 200 may include more or fewer components than those shown in FIG. 7, or may have different component configurations.

The bus 240 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.

Optionally, the computing device 200 may further include an input/output interface 250 to which an input/output device is connected to the input/output interface 250 for receiving input information and outputting operation results.

It should be understood that the computing device 200 in the embodiment of the present application may correspond to the data processing apparatus 100 in the above-mentioned embodiment, and can perform operations performed by the meteorological event prediction apparatus in the above-mentioned method embodiment, which will not be repeated here.

An embodiment of the present application also provides a computer (readable) storage medium, wherein the computer readable storage medium stores a computer program (or instruction), and the computer program is executed by a processor to implement the above method. Optionally, the storage medium involved in this application, such as a computer-readable storage medium, may be non-volatile (such as a non-transitory computer storage medium) or volatile. For example, this application provides a non-transitory computer storage medium. The computer storage medium stores instructions. When it runs on a processor, it can implement the method steps in the above method embodiments, and the processor of the computer storage medium is executing For the specific implementation of the steps of the foregoing method, reference may be made to the specific operations of the foregoing method embodiments, and details are not described herein again.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

The above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination. When implemented using software, the above-mentioned embodiments may be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center that includes one or more sets of available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium, or a semiconductor medium, and the semiconductor medium may be a solid state hard disk.

The above are only specific implementations of this application. Those skilled in the art can think of changes or substitutions according to the specific implementation manners provided by this application, and they should all be covered by the protection scope of this application.

Claims

A method for predicting meteorological events, wherein the method for predicting meteorological items is applied to a meteorological forecasting system, and the meteorological forecasting system includes a first terminal located at a first weather station and a second terminal located at a second weather station, the method include:

The first terminal calculates the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set, wherein a gradient value in the first step set is based on the first sample A sample in the set is calculated, a gradient value in the second gradient set is calculated based on a sample in a first sample set, and the first sample set is a set of samples collected by a first weather station;

The first terminal receives the aggregated first-stage degree set and aggregated second-stage degree set calculated according to a second sample set sent by the second terminal, and the second sample set includes each sample that is similar to each sample in the first sample set. Sample of

The first terminal performs an evaluation on the model to be trained according to the first degree set, the aggregated first degree set, the second degree set, the aggregated second degree set, and the first sample set Perform training and get a trained model;

The first terminal predicts the sample to be predicted based on the trained model, and determines the prediction result of the sample to be predicted.
The method according to claim 1, wherein each sample includes a sample feature set and a sample label, the sample feature set includes temperature, humidity, wind speed, and air pressure, and the sample label indicates meteorological conditions.
The method according to claim 1 or 2, wherein the second sample set includes a sample determined in a third sample set that is most similar to each sample in the first sample set, and the third sample set is The sample collected by the second weather station.
The method according to claim 3, wherein before the first terminal receives the aggregated first-order degree set and aggregated second-order degree set calculated according to the second sample set sent by the second terminal, the method further comprises:

The first terminal converts each sample in the first sample set into a hash value to obtain a first hash table corresponding to the first sample set;

Receiving, by the first terminal, a second hash table sent by the second terminal, where the second hash table includes an identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample;

According to the first hash table and the second hash table, the first terminal determines, in the third sample set, the sample identifier corresponding to the sample most similar to each sample in the first sample set , Get the sample identification set;

The first terminal sends the sample identification set to the second terminal, so that the second terminal determines the second sample set according to the sample identifications in the sample identification set.
A method for predicting meteorological events, wherein the method for predicting meteorological items is applied to a meteorological forecasting system, and the meteorological forecasting system includes a first terminal located at a first weather station and a second terminal located at a second weather station, the method include:

The second terminal sends a second hash table to the first terminal, where the second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample, The third sample set is samples collected by the second weather station;

Receiving, by the second terminal, a sample identification set sent by the first terminal, each sample identification in the identification set indicating a sample in the third sample set;

The second terminal determines a second sample set in the third sample set according to the sample identification set, and calculates the aggregation level of the loss function of the model to be trained according to each sample in the second sample set Collecting and aggregating a two-level degree set, and sending the aggregated first-level degree set and the aggregated two-level degree set to the first terminal.
The method according to claim 5, wherein before the second terminal sends the second hash table to the first terminal, the method further comprises: the second terminal compiling each sample in the third sample set It is converted into a hash value to obtain a second hash table corresponding to the third sample set.
The method according to claim 5 or 6, wherein each sample includes a sample feature set and a sample label, the sample feature set includes temperature, humidity, wind speed, and air pressure, and the sample label indicates meteorological conditions.
A meteorological event prediction device, wherein the device includes:

The processing unit is configured to calculate the first step set and the second step set of the loss function of the model to be trained according to each sample in the first sample set, wherein a gradient value in the first step set is based on the first A sample in the sample set is calculated, a gradient value in the second gradient set is calculated based on a sample in the first sample set, and the first sample set is the sample collected by the first weather station gather;

The receiving unit is configured to receive the aggregated first-order degree set and aggregated second-order degree set calculated according to a second sample set sent by the second terminal, the second sample set including each sample similar to the first sample set Sample of

The processing unit is further configured to perform processing on the first sample set according to the first degree set, the aggregated first degree set, the second degree set, the aggregated second degree set, and the first sample set. The model to be trained is trained to obtain a trained model; based on the trained model, the sample to be predicted is predicted, and the prediction result of the sample to be predicted is determined;

Or, the device includes:

The sending unit is configured to send a second hash table to the first terminal, where the second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. The third sample set is the samples collected by the second weather station;

A receiving unit, configured to receive a sample identification set sent by the first terminal, where each sample identification in the identification set indicates a sample in the third sample set;

The processing unit is configured to determine a second sample set in the third sample set according to the sample identification set, and calculate the aggregation level of the loss function of the model to be trained according to each sample in the second sample set Set and aggregate two-level set;

The sending unit is further configured to send the aggregated first-order degree set and the aggregated second-order degree set to the first terminal.
A computer device, wherein the computing device includes a processor and a memory, the memory is used to store instructions, the processor is used to execute the instructions, and when the processor executes the instructions, the following method is executed:

According to each sample in the first sample set, calculate the first step set and the second step set of the loss function of the model to be trained, wherein a gradient value in the first step set is based on a value in the first sample set Obtained by sample calculation, a gradient value in the second gradient set is calculated based on a sample in a first sample set, and the first sample set is a set of samples collected by a first weather station;

Receiving an aggregated first-order degree set and an aggregated second-order degree set calculated according to a second sample set sent by the second terminal, the second sample set including samples similar to each sample in the first sample set;

Training the model to be trained according to the first step degree set, the aggregate first step degree set, the second step degree set, the aggregate second step degree set, and the first sample set to obtain training Good model

Based on the trained model, the sample to be predicted is predicted, and the prediction result of the sample to be predicted is determined.
9. The computer device according to claim 9, wherein each sample includes a sample feature set and a sample label, the sample feature set includes temperature, humidity, wind speed, and air pressure, and the sample label indicates meteorological conditions.
The computer device according to claim 9 or 10, wherein the second sample set includes a sample determined in the third sample set that is most similar to each sample in the first sample set, and the third sample set It is a sample collected by the second weather station.
11. The computer device according to claim 11, wherein before the receiving the aggregated first-order degree set and the aggregated second-order degree set calculated according to the second sample set sent by the second terminal, the processor is further configured to execute:

Converting each sample in the first sample set into a hash value to obtain a first hash table corresponding to the first sample set;

Receiving a second hash table sent by the second terminal, where the second hash table includes an identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample;

According to the first hash table and the second hash table, the sample identifier corresponding to the sample most similar to each sample of the first sample set is determined in the third sample set to obtain a sample identifier set ；

The sample identification set is sent to the second terminal, so that the second terminal determines the second sample set according to the sample identifications in the sample identification set.
A computer device, wherein the computing device includes a processor and a memory, the memory is used to store instructions, the processor is used to execute the instructions, and when the processor executes the instructions, the following method is executed:

Send a second hash table to the first terminal, where the second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. The third sample set is Samples collected by the second weather station;

Receiving a sample identification set sent by the first terminal, where each sample identification in the identification set indicates a sample in the third sample set;

Determine a second sample set in the third sample set according to the sample identification set, and calculate the aggregate first step set and aggregate second step of the loss function of the model to be trained according to each sample in the second sample set And send the aggregated first-order degree set and the aggregated second-order degree set to the first terminal.
The computer device according to claim 13, wherein, before the sending the second hash table to the first terminal, the processor is further configured to execute:

Each sample in the third sample set is converted into a hash value to obtain a second hash table corresponding to the third sample set.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method:

According to each sample in the first sample set, calculate the first step set and the second step set of the loss function of the model to be trained, wherein a gradient value in the first step set is based on a value in the first sample set Obtained by sample calculation, a gradient value in the second gradient set is calculated based on a sample in a first sample set, and the first sample set is a set of samples collected by a first weather station;

Receiving an aggregated first-order degree set and an aggregated second-order degree set calculated according to a second sample set sent by the second terminal, the second sample set including samples similar to each sample in the first sample set;

Training the model to be trained according to the first step degree set, the aggregate first step degree set, the second step degree set, the aggregate second step degree set, and the first sample set to obtain training Good model

Based on the trained model, the sample to be predicted is predicted, and the prediction result of the sample to be predicted is determined.
15. The computer-readable storage medium according to claim 15, wherein each sample includes a sample feature set and a sample label, the sample feature set includes temperature, humidity, wind speed, and air pressure, and the sample label indicates meteorological conditions.
The computer-readable storage medium according to claim 15 or 16, wherein the second sample set includes a sample determined in the third sample set that is most similar to each sample in the first sample set, and the first sample set The three-sample set is the samples collected by the second weather station.
The computer-readable storage medium according to claim 17, wherein the computer program is processed before receiving the aggregated first-order degree set and aggregated second-order degree set calculated according to the second sample set sent by the second terminal It is also used to implement:

Converting each sample in the first sample set into a hash value to obtain a first hash table corresponding to the first sample set;

Receiving a second hash table sent by the second terminal, where the second hash table includes an identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample;

According to the first hash table and the second hash table, the sample identifier corresponding to the sample most similar to each sample of the first sample set is determined in the third sample set to obtain a sample identifier set ；

The sample identification set is sent to the second terminal, so that the second terminal determines the second sample set according to the sample identifications in the sample identification set.
A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the following method:

Send a second hash table to the first terminal, where the second hash table includes a sample identifier corresponding to each sample in the third sample set and a hash value corresponding to each sample. The third sample set is Samples collected by the second weather station;

Receiving a sample identification set sent by the first terminal, where each sample identification in the identification set indicates a sample in the third sample set;

Determine a second sample set in the third sample set according to the sample identification set, and calculate the aggregate first step set and aggregate second step of the loss function of the model to be trained according to each sample in the second sample set And send the aggregated first-order degree set and the aggregated second-order degree set to the first terminal.
The computer-readable storage medium according to claim 19, wherein, before the second hash table is sent to the first terminal, the computer program is further used to implement when the computer program is executed by the processor:

Each sample in the third sample set is converted into a hash value to obtain a second hash table corresponding to the third sample set.