WO2021169271A1 - Training method for thunderstorm weather prediction model, and thunderstorm weather prediction method - Google Patents

Publication number
WO2021169271A1
WO2021169271A1 PCT/CN2020/117578 CN2020117578W WO2021169271A1 WO 2021169271 A1 WO2021169271 A1 WO 2021169271A1 CN 2020117578 W CN2020117578 W CN 2020117578W WO 2021169271 A1 WO2021169271 A1 WO 2021169271A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
data
linear
features
thunderstorm weather
Prior art date
Application number
PCT/CN2020/117578
Other languages
English (en)
French (fr)
Inventor
段洪云
彭琛
汪伟
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021169271A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01W METEOROLOGY
    • G01W1/00 Meteorology
    • G01W1/10 Devices for predicting weather conditions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • This application relates to the fields of artificial intelligence and computer technology, and specifically to a training method of a thunderstorm weather prediction model, a thunderstorm weather prediction method, device, computer equipment, and computer-readable storage medium.
  • Forecasts can be made from weather data collected by large-scale equipment such as satellites and radars, for example by inputting the collected weather data into a pre-trained weather prediction model.
  • It is usually necessary to ensure the prediction accuracy of the weather prediction model, which places higher requirements on the training process of the model.
  • The purpose of this application is to provide a training method for a thunderstorm weather prediction model, a thunderstorm weather prediction method, corresponding devices, computer equipment, and a computer-readable storage medium, which can overcome the above-mentioned defects in the prior art.
  • One aspect of this application provides a method for training a thunderstorm weather prediction model, including: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • Another aspect provides a thunderstorm weather prediction method, including: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the model outputs a weather prediction result; and judging from the weather prediction result whether the future weather is thunderstorm weather, wherein the thunderstorm weather prediction model is obtained by: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • Another aspect provides a training device for a thunderstorm weather prediction model, including: a first acquisition module for acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; a screening module for selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; an elimination module for removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and a training module for training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • Another aspect provides a thunderstorm weather prediction device, including: a second acquisition module for acquiring target features of the current weather; an input module for inputting the target features into a pre-trained thunderstorm weather prediction model so that the model outputs a weather prediction result; and a determination module for judging from the weather prediction result whether the future weather is thunderstorm weather, wherein the thunderstorm weather prediction model is obtained by: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • The computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the computer program, the following steps of the training method of the thunderstorm weather prediction model are implemented: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • The computer device includes a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the computer program, the following steps of the thunderstorm weather prediction method are implemented: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the model outputs a weather prediction result; and judging from the weather prediction result whether the future weather is thunderstorm weather, wherein the thunderstorm weather prediction model is obtained by: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the following steps of the method for training a thunderstorm weather prediction model are implemented: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the following steps of the thunderstorm weather prediction method are implemented: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the model outputs a weather prediction result; and judging from the weather prediction result whether the future weather is thunderstorm weather, wherein the thunderstorm weather prediction model is obtained by: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features; selecting target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; removing, from each of the multiple sets of data, the features unrelated to the target features, to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
  • The training method provided by this application selects the target features whose first feature importance satisfies the first predetermined condition, removes the features unrelated to the target features to obtain multiple sets of training data, and then uses these training data to train the thunderstorm weather prediction model. Because the training data no longer include redundant features, and the number of features they contain is significantly reduced, the shortcomings of the prior art are overcome and the accuracy of the trained thunderstorm weather prediction model is improved.
  • This application considers two kinds of features, linear-type features and nonlinear-type features. It considers the independent effect of each kind and, on that basis, the synergy between multiple features, adding nonlinear effects to improve the expressive ability of the model.
  • For linear features, the N data sets obtained through N samplings are first input in turn into the linear feature screening model, which outputs N sets of preliminary linear features. The second-step linear features are then determined from the N sets of preliminary linear features. Finally, an improved predetermined regression model is applied: after the x with the greatest impact on the output y has been selected, new factors are added one at a time, ensuring that each new factor does not cause a significant change in the factors already selected, until the goodness of fit of the model no longer improves. The two layers of screening target different aspects, which improves the interpretability of the feature screening process and the effectiveness of the final linear target features.
  • For nonlinear features, pre-screening keeps the number of features under control, which makes it convenient to input them into the nonlinear feature screening model. Then, according to the fourth feature importance, the features whose fourth feature importance satisfies the fifth predetermined condition after each round of training are carried into the next round, and features of lower importance are gradually deleted. The number of features input into the nonlinear feature screening model therefore decreases from round to round, which improves the accuracy of the model and achieves the goal of nonlinear target feature screening.
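The round-by-round reduction of nonlinear features can be sketched as follows. This is only an illustration under stated assumptions, not the application's implementation: `iterative_elimination`, `importance_fn`, `keep_ratio`, and `min_features` are hypothetical names, the rule "keep the top share each round" merely stands in for the unspecified fifth predetermined condition, and the fixed scores replace the importance that a trained nonlinear model would report.

```python
def iterative_elimination(features, importance_fn, keep_ratio=0.5, min_features=2):
    """Repeatedly re-score the surviving features and keep only the top share."""
    if not 0 < keep_ratio < 1:
        raise ValueError("keep_ratio must lie strictly between 0 and 1")
    current = list(features)
    sizes = [len(current)]  # number of features entering each round
    while len(current) > min_features:
        scores = importance_fn(current)  # stand-in for one round of training
        ranked = sorted(current, key=lambda name: scores[name], reverse=True)
        n_keep = max(min_features, int(len(ranked) * keep_ratio))
        current = ranked[:n_keep]
        sizes.append(len(current))
    return current, sizes


# Hypothetical fixed scores; a real importance would come from a trained model.
TOY_SCORES = {"f1": 0.30, "f2": 0.05, "f3": 0.20, "f4": 0.01,
              "f5": 0.15, "f6": 0.02, "f7": 0.25, "f8": 0.02}


def toy_importance(names):
    """Return the fixed toy score of every feature still in play."""
    return {name: TOY_SCORES[name] for name in names}


survivors, sizes = iterative_elimination(TOY_SCORES, toy_importance)
```

With these toy scores the feature count shrinks from 8 to 4 to 2, so each round feeds fewer features into the screening model than the previous one.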
  • The expressive ability of the model depends not only on the existing individual features; the collaborative expression between features can also improve the fit of the model to a certain extent and improve the accuracy of the result.
  • Fig. 1 schematically shows a flowchart of a method for training a thunderstorm weather prediction model according to an embodiment of the present application;
  • Fig. 2 schematically shows a flowchart of a thunderstorm weather prediction method according to an embodiment of the present application;
  • Fig. 3 schematically shows a block diagram of a training device for a thunderstorm weather prediction model according to an embodiment of the present application;
  • Fig. 4 schematically shows a block diagram of a thunderstorm weather prediction device according to an embodiment of the present application;
  • Fig. 5 schematically shows a block diagram of a computer device suitable for implementing a training method for a thunderstorm weather prediction model and/or a thunderstorm weather prediction method according to an embodiment of the present application.
  • The prior art related to this application is introduced first.
  • Feature screening is also performed before model training in the prior art. Thanks to improvements in storage technology and computing power, a large number of feature indicators is available, which makes the constructed model more complete and helps guarantee the accuracy of its results. However, a large number of redundant features makes model training extremely time-consuming and prone to over-fitting.
  • Current feature screening methods mainly rely on statistical criteria such as null rate, variance, correlation, and collinearity. Such methods can distinguish features to a certain extent, but when the feature pool is very large it is difficult to reduce the number of features effectively by these methods alone.
  • In addition, such objective screening methods depend too heavily on statistical theory, which reduces the interpretability of the features during screening.
  • Selecting features from a single angle also limits the scalability of the model and ignores the influence of the interaction between multiple features on the dependent variable. Feature selection by statistical methods alone therefore cannot identify the core features, so an effective attribution model cannot be fitted.
  • Fig. 1 schematically shows a flowchart of a method for training a thunderstorm weather prediction model according to an embodiment of the present application.
  • The training method of the thunderstorm weather prediction model may include steps S1 to S4, wherein:
  • Step S1: acquire multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features.
  • Each set of data corresponds to a certain thunderstorm day in history and includes an output y and an input x: the thunderstorm weather is the output y, and the multiple features of the thunderstorm weather are the input x. There is an association relationship between y and x, that is, between the thunderstorm weather and the multiple features.
  • The multiple features of thunderstorm weather may be, for example, temperature, air pressure, rainfall, humidity, air density, air volume, and so on.
  • For example, the first set of data corresponds to March 15 and includes: thunderstorm weather, the multiple features of the thunderstorm weather on March 15, and the relationship between the two;
  • the second set of data corresponds to March 18 and includes: thunderstorm weather, the multiple features of the thunderstorm weather on March 18, and the relationship between the two;
  • the third set of data corresponds to May 7 and includes: thunderstorm weather, the multiple features of the thunderstorm weather on May 7, and the relationship between the two;
  • the fourth set of data corresponds to June 24 and includes: thunderstorm weather, the multiple features of the thunderstorm weather on June 24, and the relationship between the two.
  • Step S2: select the target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition.
  • The purpose of this embodiment is to train the thunderstorm weather prediction model with target features so as to overcome the defects of the prior art. It is therefore necessary to select, from the multiple features, the features whose first feature importance satisfies the first predetermined condition and use them as the target features.
  • Each feature corresponds to a first feature importance, which measures how closely the feature is correlated with thunderstorm weather.
  • For example, the first feature importance may be the correlation coefficient between the feature and thunderstorm weather, and the first predetermined condition may be that the first feature importance of the feature ranks before a predetermined position.
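The ranking described here can be illustrated with a short sketch. It assumes the first feature importance is the absolute Pearson correlation with y and that the first predetermined condition is "among the first `keep` positions"; taking the absolute value, the function names, and all data values are hypothetical choices made for the illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / sqrt(var_x * var_y)


def select_top_features(columns, y, keep):
    """Rank features by |correlation with y| and keep the first `keep`."""
    importance = {name: abs(pearson(values, y)) for name, values in columns.items()}
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:keep], importance


# Hypothetical toy data: 1 = thunderstorm observed, 0 = no thunderstorm.
y = [1, 0, 1, 1, 0, 0]
columns = {
    "humidity": [0.9, 0.4, 0.8, 0.85, 0.3, 0.5],
    "air_pressure": [1001, 1015, 1003, 1000, 1018, 1012],
    "air_volume": [3, 5, 4, 3, 4, 5],
}
targets, importance = select_top_features(columns, y, keep=2)
```

In this toy data humidity and air pressure are more strongly correlated with y than air volume, so they are the two features kept.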
  • Step S2 may include step S21 and/or step S22, where:
  • Step S21: use the multiple sets of data to select, from the multiple features, the linear target features belonging to the linear type;
  • Step S22: use the multiple sets of data to select, from the multiple features, the nonlinear target features belonging to the nonlinear type.
  • The multiple features may include linear-type features or nonlinear-type features, and a feature of the linear type may at the same time belong to the nonlinear type.
  • When the multiple features contain only features of the linear type, the linear target features are determined as the target features; when the multiple features contain only features of the nonlinear type, the nonlinear target features are determined as the target features; when the multiple features contain both linear and nonlinear features, the linear target features and the nonlinear target features together are determined as the target features.
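The three cases amount to merging the two screening results. A minimal sketch, assuming each result is a list of feature names (`combine_target_features` is a hypothetical helper, and dropping repeats is an assumption made because a feature may belong to both types):

```python
def combine_target_features(linear_targets, nonlinear_targets):
    """Merge the two screened lists, keeping order and dropping repeats."""
    merged = list(linear_targets) + list(nonlinear_targets)
    return list(dict.fromkeys(merged))


only_linear = combine_target_features(["temperature", "humidity"], [])
only_nonlinear = combine_target_features([], ["rainfall"])
both = combine_target_features(["temperature", "humidity"], ["humidity", "rainfall"])
```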
  • Step S2 may include steps S21 to S24, where the target features may include linear target features belonging to the linear type. Specifically:
  • Step S21: sample the multiple sets of data N times to obtain N data sets, where each data set includes one or more of the multiple sets of data.
  • The sampling method is not limited; for example, sampling may follow the idea of the Bootstrapping algorithm.
  • For example, the second data set includes the second set of data, the third set of data, and the fourth set of data;
  • the third data set includes the first set of data, the second set of data, and the fourth set of data.
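Step S21 can be sketched with the standard library. The sketch assumes classic bootstrap sampling, that is, drawing with replacement, so a data set may contain the same set of data more than once, unlike the example above; `draw_datasets` and its parameters are hypothetical names.

```python
import random


def draw_datasets(groups, n_datasets, size, seed=None):
    """Draw n_datasets samples of `size` groups each, with replacement."""
    rng = random.Random(seed)  # a seeded generator keeps the draw reproducible
    return [[rng.choice(groups) for _ in range(size)] for _ in range(n_datasets)]


# Placeholders for the four sets of data in the example.
groups = ["set 1", "set 2", "set 3", "set 4"]
datasets = draw_datasets(groups, n_datasets=3, size=3, seed=42)
```

If data sets without repeated entries are wanted, `rng.sample(groups, size)` could be used instead of repeated `rng.choice` calls; the source leaves the sampling method open.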
  • Step S22: input each of the N data sets into a linear feature screening model, where the linear feature screening model calculates, for the multiple features of the data set, the second feature importance of each feature and outputs the features that belong to the linear type and whose second feature importance satisfies a second predetermined condition; the output is called a set of preliminary linear features.
  • The linear feature screening model outputs only features of the linear type, and calculates the second feature importance for each such feature.
  • In the model output, each feature is preceded by its coefficient, which characterizes the importance of the feature: the larger the coefficient, the higher the importance. The second feature importance in this embodiment is therefore the coefficient in front of each feature. The model then outputs the features that belong to the linear type and whose second feature importance satisfies the second predetermined condition, for example the linear-type features whose second feature importance is not 0.
  • For example, the linear feature screening model is a Lasso model, that is, a model with an L1 regularization term. It outputs linear features, automatically calculates their second feature importance, and outputs that importance as the coefficient of the feature; for example, in "0.8 humidity", 0.8 is the second feature importance of humidity.
  • In this case the second predetermined condition is that the second feature importance is not 0, and for each data set the Lasso model outputs the linear-type features whose coefficients are not 0.
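How an L1 term drives some coefficients to exactly 0 can be seen in a small coordinate-descent sketch. The source names the Lasso model but not a solver, so the solver, the penalty value `alpha`, the function names, and the toy data (already centred and scaled) are all assumptions; in practice a library implementation would normally be used.

```python
def soft_threshold(value, threshold):
    """Shrink value towards 0 by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coefficients(columns, y, alpha, sweeps=50):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by coordinate descent."""
    n = len(y)
    weights = {name: 0.0 for name in columns}
    residual = list(y)  # equals y - Xw while every weight is 0
    for _ in range(sweeps):
        for name, col in columns.items():
            scale = sum(v * v for v in col) / n
            if scale == 0:
                continue  # an all-zero column cannot explain anything
            # Correlation of this column with the residual that excludes it.
            rho = sum(v * (r + weights[name] * v) for v, r in zip(col, residual)) / n
            new_weight = soft_threshold(rho, alpha) / scale
            delta = new_weight - weights[name]
            if delta:
                residual = [r - delta * v for r, v in zip(residual, col)]
                weights[name] = new_weight
    return weights


# Hypothetical toy data: centred columns and a centred response.
columns = {"humidity": [1, -1, 1, -1], "air_volume": [1, 1, -1, -1]}
y = [0.95, -0.85, 0.85, -0.95]
weights = lasso_coefficients(columns, y, alpha=0.1)
preliminary = [name for name, w in weights.items() if w != 0]
```

Here humidity ends with a coefficient of about 0.8 and air volume with exactly 0, so only humidity satisfies the "coefficient is not 0" condition.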
  • Step S23: obtain the N sets of preliminary linear features output by the linear feature screening model.
  • The linear feature screening model outputs the N sets of preliminary linear features in sequence, and the features included in each set may differ.
  • For example, the first set of preliminary linear features includes temperature, air pressure, and humidity;
  • the second set of preliminary linear features includes temperature, air pressure, rainfall, and air volume;
  • the third set of preliminary linear features includes temperature and humidity.
  • Step S24: use the N sets of preliminary linear features to select the linear target features.
  • Step S24 may include steps S241 to S243, where:
  • Step S241: compute statistics over all the features in the N sets of preliminary linear features to obtain the third feature importance of each feature;
  • Step S242: from the N sets of preliminary linear features, select the features whose third feature importance satisfies a third predetermined condition; these are called the second-step linear features;
  • Step S243: use the second-step linear features to select the linear target features.
  • The third feature importance may be the number of times each feature appears in the N sets of preliminary linear features, and the third predetermined condition may be that this number exceeds a predetermined threshold.
  • In the example above, temperature appears 3 times, air pressure 2 times, humidity 2 times, rainfall 1 time, and air volume 1 time. If the third predetermined condition is that the number of occurrences exceeds 1, the second-step linear features are temperature, air pressure, and humidity.
  • The linear target features can then be selected from the second-step linear features, for example by using the second-step linear features directly as the linear target features.
  • Step S243 may include steps A1 to A8, where:
  • Step A1: calculate the number M of features in the second-step linear features and the correlation coefficient between each feature and thunderstorm weather;
  • Step A2: take the feature with the largest correlation coefficient as one of the linear target features;
  • Step A3: input the feature with the largest correlation coefficient and the thunderstorm weather into the first predetermined regression model to obtain the first significance;
  • Step A4: judge whether i is greater than M; when i is not greater than M, perform step A5; when i is greater than M, perform step A8, where the initial value of i is 1;
  • Step A5: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance, where the (i+1)-th predetermined regression model is the model obtained after the first i features and the thunderstorm weather have been input into the i-th predetermined regression model;
  • Step A6: judge whether the relationship between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition; if yes, perform step A7; if not, perform step A4;
  • Step A7: determine the feature with the (i+1)-th largest correlation coefficient as one of the linear target features;
  • Step A8: take all the features determined from the second-step linear features as the linear target features.
  • This embodiment is a cyclic operation. Specifically, the feature with the largest correlation coefficient with y (called the feature with the first largest correlation coefficient) is selected from the second-step linear features as one of the linear target features. This feature and the output y are input into the predetermined regression model (at this point called the first predetermined regression model) to obtain a significance, called the first significance. The model obtained after this feature has been input into the first predetermined regression model is called the second predetermined regression model. Next, the feature with the second largest correlation coefficient with y is selected from the second-step linear features and input into the second predetermined regression model to obtain a significance, called the second significance. It is then judged whether the relationship between the first significance and the second significance satisfies the sixth predetermined condition (for example, whether the difference between the two is greater than 0.0001). If so, the feature with the second largest correlation coefficient has a significant influence on the feature with the first largest correlation coefficient, and the procedure goes on to judge the relationship between the significance of the feature with the third largest correlation coefficient and the first significance. If not, the feature with the second largest correlation coefficient is also taken as one of the linear target features, and the procedure goes on to judge the relationship between the significance of the feature with the third largest correlation coefficient and the second significance, and so on, until all the features in the second-step linear features have been judged.
  • the number of features included in the second-step linear feature is large, if the cyclic execution of the judgment logic of the significance of all features will seriously increase the workload of the processor, at this time, it can be determined by judging the fit of the predetermined regression model. Degree determines when to stop the judgment logic of significance. details as follows:
  • Step A3 may include: inputting the feature with the first largest correlation coefficient and thunderstorm weather into the first predetermined regression model to obtain the first significance and the first goodness of fit;
  • Step A5 may include: inputting the feature with the i+1-th largest correlation coefficient into the i+1-th predetermined regression model to obtain the i+1-th significance and the i+1-th first goodness of fit;
  • the training method of the thunderstorm weather prediction model may further include: judging whether the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition; if not, execute step A4; if yes, execute step A8.
  • That is, even if the significance judgment has not yet been executed for all features, once the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition, the significance judgment for the remaining features is not continued, and all features determined so far from the secondary linear features are taken as the linear target features.
  • the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfying the seventh predetermined condition may be: the difference between the i-th first goodness of fit and the (i+1)-th first goodness of fit is less than 0.0001.
  • the goodness of fit can be measured by R², which is also called the coefficient of determination.
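The selection loop and its stopping rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: `significance` and `goodness_of_fit` are hypothetical callables standing in for the predetermined regression model, each candidate model is assumed to contain the already accepted features plus the candidate, the two tolerances reuse the 0.0001 example values from the text, and a candidate that no longer changes the goodness of fit is assumed to end the loop without being added.

```python
from typing import Callable, List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def select_linear_targets(
    ranked: Sequence[str],
    significance: Callable[[List[str]], float],     # hypothetical helper
    goodness_of_fit: Callable[[List[str]], float],  # hypothetical helper
    sig_tol: float = 1e-4,
    fit_tol: float = 1e-4,
) -> List[str]:
    """Forward selection over features ranked by descending correlation with y."""
    if not ranked:
        return []
    selected = [ranked[0]]              # the top-ranked feature is always kept
    ref_sig = significance(selected)    # "first significance"
    prev_fit = goodness_of_fit(selected)
    for feature in ranked[1:]:
        candidate = selected + [feature]
        new_fit = goodness_of_fit(candidate)
        # Assumed reading of the seventh condition: the fit has stopped changing.
        if abs(new_fit - prev_fit) < fit_tol:
            break
        new_sig = significance(candidate)
        # Sixth condition: a difference above sig_tol means the candidate
        # disturbs the existing features, so it is skipped.
        if abs(new_sig - ref_sig) <= sig_tol:
            selected = candidate
            ref_sig = new_sig
            prev_fit = new_fit
    return selected
```

In practice the two callables would wrap whatever regression library is used; passing them in keeps the control flow independent of that choice.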
  • Optionally, step S2 may also include steps S21' to S24', in which case the target features may include nonlinear target features belonging to the nonlinear type. Specifically:
  • Step S21': input multiple sets of data into a nonlinear feature screening model, where the nonlinear feature screening model is used to calculate, from the multiple sets of data, the fourth feature importance of each of the multiple features, and to output the features whose fourth feature importance satisfies the fourth predetermined condition and which belong to the nonlinear type.
  • the nonlinear feature screening model only outputs features of the nonlinear type: for each feature of the nonlinear type, it calculates the fourth feature importance of the feature and then outputs the features that belong to the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition, for example, features of the nonlinear type whose fourth feature importance is not 0.
  • the nonlinear feature screening model is, for example, a machine learning model, such as a random forest (Random Forest, RF) or a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT).
  • For example, each tree built by the random forest algorithm can record, at each node, how much the Gini coefficient is reduced after splitting on the node's feature, that is, how much a given feature improves classification or regression purity. This value is the feature's degree of contribution, i.e. its fourth feature importance.
  • the fourth feature importance of a nonlinear-type feature can also be output as the coefficient of that feature; for example, if the output is 0.6 air density, then 0.6 is the fourth feature importance of air density.
  • If the fourth predetermined condition is that the fourth feature importance is not 0, then for each data set the nonlinear feature screening model outputs the features whose coefficients are not 0 and which belong to the nonlinear type.
  • step S22' the feature whose importance of the fourth feature satisfies the fifth predetermined condition is removed from the features output by the non-linear feature screening model to obtain a preliminary non-linear feature.
  • the fifth predetermined condition is: the feature whose fourth feature importance is the lowest.
  • the features output by the nonlinear feature screening model can be sorted by fourth feature importance in descending order, and the feature at the end of the ranking can then be removed to obtain the preliminary nonlinear features.
  • Step S23': for each set of data in the multiple sets of data, features unrelated to the preliminary nonlinear features are eliminated to obtain multiple sets of preliminary screening data.
  • Eliminating features unrelated to the preliminary nonlinear features means eliminating all features other than the preliminary nonlinear features.
  • Step S24' continue to input multiple sets of preliminary screening data into the non-linear feature screening model until the non-linear target feature is screened out.
  • the training method of the thunderstorm weather prediction model may further include: calculating the second goodness of fit of the non-linear feature screening model this time.
  • Step S24' may include step S241' to step S246', in which:
  • Step S241' continue to input multiple sets of preliminary screening data into the non-linear feature screening model to obtain the next-step non-linear feature
  • step S242' for each group of preliminary screening data of the multiple sets of preliminary screening data, features that have nothing to do with the non-linear characteristics of the sub-steps are eliminated, and multiple sets of sub-step screening data are obtained;
  • Step S243' calculating the third goodness of fit of this non-linear feature screening model
  • step S244' it is judged whether the relationship between the second goodness of fit and the third goodness of fit meets the eighth predetermined condition; if so, step S245' is executed; if not, step S246' is executed.
  • step S245' the non-linear feature of the next step is determined as the non-linear target feature.
  • Step S246' continue to input multiple sets of sub-step screening data into the non-linear feature screening model until the non-linear target feature is screened out.
  • This embodiment is also a cyclic operation. Specifically, multiple sets of preliminary screening data are first obtained and the second goodness of fit is calculated; multiple sets of sub-step screening data are then obtained and the third goodness of fit is calculated. If the relationship between the second goodness of fit and the third goodness of fit satisfies the eighth predetermined condition, the sub-step nonlinear features are determined as the nonlinear target features; otherwise, the multiple sets of sub-step screening data continue to be input into the nonlinear feature screening model until the relationship between the goodness-of-fit values satisfies the eighth predetermined condition.
  • the eighth predetermined condition is, for example, that the difference between the loss function corresponding to the second goodness of fit and the loss function corresponding to the third goodness of fit is less than 0.0001.
  • Optionally, this embodiment can also preprocess the multiple sets of data first and then input the preprocessed data into the nonlinear feature screening model. Details are as follows:
  • Step S21' may include step S211' and step S212', in which:
  • Step S211' for each group of data in the multiple sets of data, pre-screening multiple features using predetermined rules to obtain multiple sets of preprocessed data;
  • Step S212': input multiple sets of preprocessed data into a nonlinear feature screening model, where the nonlinear feature screening model is used to calculate, from the multiple sets of preprocessed data, the fourth feature importance of each of the pre-screened features, and to output the features whose fourth feature importance satisfies the fourth predetermined condition and which belong to the nonlinear type.
  • the preprocessing may be to calculate, for each set of data, the distance between every two features, such as the Euclidean distance. If the distance between two features is greater than a predetermined threshold, the correlation between the two features is considered to be very strong, and only one of them is needed. In this case, the distance between each of the two features and the output y (thunderstorm weather) can then be calculated, and the feature whose distance to the thunderstorm weather is smaller is eliminated.
  • multiple sets of preprocessed data can be obtained.
  • multiple sets of preprocessed data are input into the nonlinear feature screening model, where the processing logic here is consistent with the processing logic of directly inputting multiple sets of data into the nonlinear feature screening model, and will not be repeated here.
  • Step S3 in each of the multiple sets of data, features that are not related to the target feature are eliminated to form multiple sets of training data.
  • When there are only linear features among the multiple features, the target features include only linear target features; when there are only nonlinear features among the multiple features, the target features include only nonlinear target features; when there are both linear and nonlinear features among the multiple features, the target features include both the linear target features and the nonlinear target features.
  • the features included in the training data are those that contribute more to thunderstorm weather.
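Step S3 and the combination of linear and nonlinear target features can be illustrated with a small sketch. The record layout (a dict with keys `"x"` and `"y"`) and the function names are assumptions made for illustration only.

```python
from typing import Dict, Iterable, List


def combine_targets(linear: Iterable[str], nonlinear: Iterable[str]) -> List[str]:
    """Union of both target lists, order-preserving and without duplicates."""
    return list(dict.fromkeys(list(linear) + list(nonlinear)))


def cull_to_targets(groups: List[Dict], target_features: Iterable[str]) -> List[Dict]:
    """Drop every feature that is not a target feature from each data group."""
    keep = set(target_features)
    return [
        {"y": g["y"], "x": {name: v for name, v in g["x"].items() if name in keep}}
        for g in groups
    ]
```

A feature that appears in both lists is kept once, which covers the case where a linear-type feature also belongs to the nonlinear type.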
  • Step S4 Use multiple sets of training data to train a predetermined algorithm to obtain a thunderstorm weather prediction model.
  • a thunderstorm weather prediction model can be obtained.
  • the thunderstorm weather prediction model is used to predict whether the future weather will be a thunderstorm based on the characteristics of the current weather.
  • the predetermined algorithm is, for example, a Support Vector Machine (SVM) algorithm, an Adaptive Boosting (AdaBoost) algorithm, a Logistic Regression (LR) algorithm, or a Decision Tree algorithm.
  • Fig. 2 schematically shows a flowchart of a thunderstorm weather prediction method according to an embodiment of the present application.
  • the method for predicting thunderstorm weather may include steps M1 to M3, wherein:
  • Step M1 obtain the target feature of the current weather
  • Step M2 Input the target feature into the pre-trained thunderstorm weather forecast model, so that the thunderstorm weather forecast model outputs the weather forecast result.
  • the thunderstorm weather prediction model is obtained by the method in the first embodiment.
  • Step M3 judging whether the future weather is thunderstorm weather according to the weather prediction result.
  • In this embodiment, the target features of the current weather are input into the pre-trained thunderstorm weather prediction model. Since the training process of the thunderstorm weather prediction model is rigorous and the training results are accurate, the weather prediction results obtained are also more credible. The weather prediction result can be either thunderstorm weather or not thunderstorm weather. When the weather prediction result is thunderstorm weather, the predicted future weather is thunderstorm weather; when the weather prediction result is not thunderstorm weather, the predicted future weather is not thunderstorm weather.
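Steps M1 to M3 can be sketched as a single function. This is only an illustrative sketch: `model` is a hypothetical callable standing in for the trained prediction model, and the current weather is assumed to be available as a name-to-value mapping.

```python
from typing import Callable, Dict, List, Sequence


def predict_thunderstorm(
    current_weather: Dict[str, float],
    target_features: Sequence[str],
    model: Callable[[List[float]], object],  # hypothetical trained model
) -> bool:
    # M1: keep only the target features of the current weather, in a fixed order.
    x = [current_weather[name] for name in target_features]
    # M2: let the pre-trained model produce the weather prediction result.
    result = model(x)
    # M3: interpret the result as "thunderstorm" or "not thunderstorm".
    return bool(result)
```

The feature order must match the order used during training, which is why the target feature list is passed explicitly instead of iterating over the mapping.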
  • FIG. 3 schematically shows a block diagram of a training device for a thunderstorm weather prediction model according to an embodiment of the present application.
  • the training device 300 of the thunderstorm weather prediction model may include a first acquisition module 301, a screening module 302, a rejection module 303, and a training module 304, wherein:
  • the first acquisition module 301 is configured to acquire multiple sets of data, where each set of data includes thunderstorm weather, multiple characteristics of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple characteristics of the thunderstorm weather;
  • the screening module 302 is configured to screen out target features from multiple features of the multiple sets of data, where the target feature is a feature whose first feature importance degree meets a first predetermined condition;
  • the culling module 303 is used for culling features that are not related to the target feature in each of the multiple sets of data to form multiple sets of training data;
  • the training module 304 is configured to use the multiple sets of training data to train a predetermined algorithm to obtain a thunderstorm weather prediction model.
  • the screening module is further configured to: use the multiple sets of data to filter out linear target features belonging to the linear type from the multiple features; and/or use the multiple sets of data to select from the multiple The non-linear target features belonging to the non-linear type are filtered out of the features.
  • When the target features include linear target features belonging to the linear type and the screening module screens out the target features from the multiple features of the multiple sets of data, the screening module is further configured to: sample the multiple sets of data N times to obtain N data sets, where each data set includes one or more of the multiple sets of data; for each of the N data sets, input the data set into a linear feature screening model, where the linear feature screening model is used to calculate, for the multiple features of the data set, the second feature importance of each feature, and to output the features whose second feature importance satisfies the second predetermined condition and which belong to the linear type, called a set of preliminary linear features; obtain the N sets of preliminary linear features output by the linear feature screening model; and use the N sets of preliminary linear features to screen out the linear target features.
  • When the screening module uses the N sets of preliminary linear features to screen out the linear target features, it is further configured to: perform statistics on all features in the N sets of preliminary linear features to obtain the third feature importance of each feature; select, from the N sets of preliminary linear features, the features whose third feature importance satisfies the third predetermined condition, called the secondary linear features; and screen out the linear target features using the secondary linear features.
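The sampling and statistics stage can be sketched as follows. The section does not define the third feature importance or the third predetermined condition, so this sketch assumes the importance is the number of preliminary sets in which a feature appears and the condition is a minimum count; both function names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence


def sample_datasets(groups: Sequence, n: int, size: int, seed: int = 0) -> List[list]:
    """Draw N data sets, each containing `size` of the original data groups."""
    rng = random.Random(seed)
    return [rng.sample(list(groups), size) for _ in range(n)]


def secondary_linear_features(
    preliminary_sets: Iterable[Iterable[str]], min_count: int
) -> List[str]:
    """Keep features that appear in at least `min_count` preliminary sets."""
    counts = Counter(f for s in preliminary_sets for f in set(s))
    return [f for f, c in counts.most_common() if c >= min_count]
```

Counting across resampled data sets favors features that are selected consistently, so a feature picked up by only one sample is filtered out before the regression-based stage.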
  • Step A1: calculate the number M of features in the secondary linear features and the correlation coefficient between each feature and the thunderstorm weather;
  • Step A2: take the feature with the first largest correlation coefficient as one of the linear target features;
  • Step A3: input the feature with the first largest correlation coefficient and the thunderstorm weather into the first predetermined regression model to obtain the first significance;
  • Step A4: determine whether i is greater than M; when i is not greater than M, perform step A5; when i is greater than M, perform step A8, where the initial value of i is 1;
  • Step A5: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance, where the (i+1)-th predetermined regression model is obtained by inputting the first i features and the thunderstorm weather into the i-th predetermined regression model;
  • Step A6 Determine whether the relationship between the i-th sal
  • the screening module is further configured to: input the feature with the first largest correlation coefficient and the thunderstorm weather into the first predetermined regression model to obtain the first significance and the 1st first goodness of fit.
  • the screening module is further configured to: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and the (i+1)-th first goodness of fit.
  • the device further includes: a judging module for judging whether the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition; if not, the screening module is caused to perform step A4, and if so, the screening module is caused to perform step A8.
  • When the target features include nonlinear target features belonging to the nonlinear type and the screening module screens out the target features from the multiple features of the multiple sets of data, the screening module is further configured to: input the multiple sets of data into a nonlinear feature screening model, where the nonlinear feature screening model is used to calculate, from the multiple sets of data, the fourth feature importance of each of the multiple features, and to output the features whose fourth feature importance satisfies the fourth predetermined condition and which belong to the nonlinear type; remove, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies the fifth predetermined condition, to obtain preliminary nonlinear features; for each of the multiple sets of data, eliminate features unrelated to the preliminary nonlinear features to obtain multiple sets of preliminary screening data; and continue to input the multiple sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
  • the device further includes: a calculation module for calculating the second goodness of fit of the non-linear feature screening model this time;
  • When the screening module continues to input the multiple sets of preliminary screening data into the nonlinear feature screening model until the nonlinear target features are screened out, it is further configured to: continue to input the multiple sets of preliminary screening data into the nonlinear feature screening model to obtain sub-step nonlinear features; for each set of the multiple sets of preliminary screening data, eliminate features unrelated to the sub-step nonlinear features to obtain multiple sets of sub-step screening data; calculate the third goodness of fit of the nonlinear feature screening model for this run; determine whether the relationship between the second goodness of fit and the third goodness of fit satisfies the eighth predetermined condition; if so, determine the sub-step nonlinear features as the nonlinear target features; if not, continue to input the multiple sets of sub-step screening data into the nonlinear feature screening model until the nonlinear target features are screened out.
  • When inputting the multiple sets of data into the nonlinear feature screening model, the screening module is further configured to: for each set of data in the multiple sets of data, pre-screen the multiple features using a predetermined rule to obtain multiple sets of preprocessed data; and input the multiple sets of preprocessed data into the nonlinear feature screening model, where the nonlinear feature screening model is used to calculate, from the multiple sets of preprocessed data, the fourth feature importance of each of the pre-screened features, and to output the features whose fourth feature importance satisfies the fourth predetermined condition and which belong to the nonlinear type.
  • the embodiment of the present application also provides a thunderstorm weather forecasting device, which corresponds to the thunderstorm weather forecasting method provided in the above-mentioned embodiments.
  • the corresponding technical features and technical effects are not repeated in this embodiment; for details, reference may be made to the above-mentioned embodiments. Specifically:
  • Fig. 4 schematically shows a block diagram of a thunderstorm weather forecasting device according to an embodiment of the present application.
  • the thunderstorm weather forecasting device 400 may include a second acquisition module 401, an input module 402, and a determination module 403, where:
  • the second obtaining module 401 is used to obtain the target feature of the current weather
  • the input module 402 is configured to input the target feature into the pre-trained thunderstorm weather prediction model, so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the above-mentioned training method of the thunderstorm weather prediction model;
  • the determining module 403 is configured to determine whether the future weather is thunderstorm weather according to the weather prediction result.
  • Fig. 5 schematically shows a block diagram of a computer device suitable for implementing a training method for a thunderstorm weather prediction model and/or a thunderstorm weather prediction method according to an embodiment of the present application.
  • the computer device 500 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers), and so on.
  • the computer device 500 of this embodiment at least includes but is not limited to: a memory 501, a processor 502, and a network interface 503 that can be communicatively connected to each other through a system bus.
  • FIG. 5 only shows a computer device 500 with components 501-503, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the memory 501 includes at least one type of computer-readable storage medium.
  • the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 501 may be an internal storage unit of the computer device 500, such as a hard disk or memory of the computer device 500.
  • the memory 501 may also be an external storage device of the computer device 500, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) equipped on the computer device 500.
  • the memory 501 may also include both an internal storage unit of the computer device 500 and an external storage device thereof.
  • the memory 501 is generally used to store the operating system and various application software installed in the computer device 500, such as the program code of the training method of a thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method, etc.
  • the memory 501 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 502 is generally used to control the overall operation of the computer device 500. For example, control and processing related to data interaction or communication with the computer device 500 are performed.
  • the processor 502 is configured to run the program code of the training method of the thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method stored in the memory 501.
  • the program code of the training method of the thunderstorm weather prediction model and/or of the thunderstorm weather prediction method stored in the memory 501 may also be divided into one or more program modules and executed by one or more processors (the processor 502 in this embodiment) to complete the present application.
  • the network interface 503 may include a wireless network interface or a wired network interface, and the network interface 503 is generally used to establish a communication link between the computer device 500 and other computer devices.
  • the network interface 503 is used to connect the computer device 500 to an external terminal through a network, and to establish a data transmission channel and a communication link between the computer device 500 and the external terminal.
  • the network can be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • the computer-readable storage medium may be non-volatile or volatile, and includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, etc. A computer program is stored on it, and when the computer program is executed by a processor, the steps of the training method of the thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method are implemented.
  • modules or steps of the embodiments of the present application described above can be implemented by a general computing device, and they can be concentrated on a single computing device or distributed among multiple computing devices.
  • Optionally, they can be implemented by program code executable by the computing device, so that they can be stored in a storage device for execution by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or they can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Atmospheric Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Ecology (AREA)
  • Environmental Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

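The present application discloses a training method for a thunderstorm weather prediction model, including: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model. The present application also provides a thunderstorm weather prediction method, a training device for a thunderstorm weather prediction model, a thunderstorm weather prediction device, a computer device, and a computer-readable storage medium.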

Description

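Training method for a thunderstorm weather prediction model and thunderstorm weather prediction method
This application claims priority to Chinese patent application No. 202010116671.X, filed with the China Patent Office on February 25, 2020 and entitled "Training method for a thunderstorm weather prediction model and thunderstorm weather prediction method", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the fields of artificial intelligence and computer technology, and in particular to a training method for a thunderstorm weather prediction model, a thunderstorm weather prediction method, devices, a computer device, and a computer-readable storage medium.
Background
As meteorological technology has developed, so have the ways of predicting weather conditions. Typically, weather conditions can be predicted from weather data collected by large equipment such as satellites and radars, for example by inputting the collected weather data into a pre-trained weather prediction model. To ensure accurate weather prediction, the prediction accuracy of the weather prediction model must be ensured, which places high demands on the training process of the weather prediction model.
However, in the course of studying this application, the inventors found that the prior art has at least the following defect: when a weather model is trained, the weather factors in the weather factor pool are usually only screened in a simple way, so too many redundant factors are retained. For model training, because the core factors cannot be obtained, an effective weather prediction model cannot be trained.
Summary
The purpose of this application is to provide a training method for a thunderstorm weather prediction model, a thunderstorm weather prediction method, devices, a computer device, and a computer-readable storage medium that can overcome the above defects of the prior art.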
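One aspect of this application provides a training method for a thunderstorm weather prediction model, including: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
Another aspect of this application provides a thunderstorm weather prediction method, including: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result; and judging, according to the weather prediction result, whether the future weather is thunderstorm weather, where the thunderstorm weather prediction model is obtained by the following method: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
A further aspect of this application provides a training device for a thunderstorm weather prediction model, including: a first acquisition module, configured to acquire multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; a screening module, configured to screen out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; a culling module, configured to remove, in each of the multiple sets of data, the features unrelated to the target features to form multiple sets of training data; and a training module, configured to train a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
Yet another aspect of this application provides a thunderstorm weather prediction device, including: a second acquisition module, configured to acquire target features of the current weather; an input module, configured to input the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result; and a determination module, configured to judge, according to the weather prediction result, whether the future weather is thunderstorm weather, where the thunderstorm weather prediction model is obtained by the following method: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.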
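Yet another aspect of this application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the following training method for a thunderstorm weather prediction model: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
Yet another aspect of this application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the following thunderstorm weather prediction method: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result; and judging, according to the weather prediction result, whether the future weather is thunderstorm weather, where the thunderstorm weather prediction model is obtained by the following method: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
Yet another aspect of this application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the following training method for a thunderstorm weather prediction model: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.
Yet another aspect of this application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the following thunderstorm weather prediction method: acquiring target features of the current weather; inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result; and judging, according to the weather prediction result, whether the future weather is thunderstorm weather, where the thunderstorm weather prediction model is obtained by the following method: acquiring multiple sets of data, where each set of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association relationship between the thunderstorm weather and the multiple features of the thunderstorm weather; screening out target features from the multiple features of the multiple sets of data, where a target feature is a feature whose first feature importance satisfies a first predetermined condition; in each of the multiple sets of data, removing the features unrelated to the target features to form multiple sets of training data; and training a predetermined algorithm with the multiple sets of training data to obtain the thunderstorm weather prediction model.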
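The training method for a thunderstorm weather prediction model provided by this application screens out the target features whose first feature importance satisfies the first predetermined condition, removes the features unrelated to the target features to obtain multiple sets of training data, and then trains the thunderstorm weather prediction model with these training data. Because the training data no longer include redundant features, and the number of features they include is also significantly reduced, the defects of the prior art can be overcome and the accuracy of the trained thunderstorm weather prediction model improved.
Further, building on existing feature screening in feature engineering, this application considers two kinds of features, features of the linear type and features of the nonlinear type, and considers the independent effect of each kind. On this basis, it also considers the synergy among multiple features and adds nonlinear effects to improve the expressive power of the model.
For features of the linear type, N sets of preliminary linear features are first screened out through N samplings that are input in turn into the linear feature screening model; the secondary linear features are then obtained by statistics over the N sets of preliminary linear features. Next, using the improved predetermined regression model, after the x with the greatest influence on the output y is selected, new factors are added step by step while ensuring that a new factor does not change the significance of the existing factors, until the goodness of fit of the model no longer improves. With two layers of screening, each targeting a different aspect, the interpretability of the feature screening process and the effectiveness of the final linear target features are both improved.
For features of the nonlinear type, pre-screening keeps the number of features controllable so that they can conveniently be input into the nonlinear feature screening model. Then, according to the fourth feature importance of the features, the features whose fourth feature importance satisfies the fifth predetermined condition after each round of training are carried into the next round of training, and features of lower importance are deleted step by step. This ensures that the number of features entering the nonlinear feature screening model decreases round by round, improving model accuracy while also achieving the screening of nonlinear target features. The expressive power of the model depends on the existing individual features, and the joint expression of features can also fit the model's effect to some extent and improve the accuracy of the results.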
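Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of this application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 schematically shows a flowchart of a training method for a thunderstorm weather prediction model according to an embodiment of this application;
Fig. 2 schematically shows a flowchart of a thunderstorm weather prediction method according to an embodiment of this application;
Fig. 3 schematically shows a block diagram of a training device for a thunderstorm weather prediction model according to an embodiment of this application;
Fig. 4 schematically shows a block diagram of a thunderstorm weather prediction device according to an embodiment of this application;
Fig. 5 schematically shows a block diagram of a computer device suitable for implementing the training method for a thunderstorm weather prediction model and/or the thunderstorm weather prediction method according to an embodiment of this application.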
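Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
To better understand the beneficial technical effects achieved by this application, the prior art related to this application is introduced before its specific solutions. In the prior art, feature screening is also performed before model training. With improvements in storage technology and computing power, the large number of available feature indicators makes model construction more complete and helps ensure the accuracy of the results, but a large number of redundant features makes model training extremely time-consuming and prone to overfitting. Current feature screening mainly relies on statistics-based methods, for example methods based on null rate, variance, correlation, or collinearity. Such methods can discriminate between features to some extent, but when the feature pool is very large, they alone can hardly reduce the number of features effectively. On the one hand, this kind of objective screening depends too heavily on statistical theory, which reduces the interpretability of features during screening; on the other hand, screening features from a single perspective leaves the model without good scalability and ignores the effect of antagonism among multiple features on the dependent variable. Feature selection by statistical methods therefore still cannot obtain the core features, so an effective attribution model cannot be fitted.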
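By contrast, the training method for a thunderstorm weather prediction model provided by the present application screens out target features whose first feature importance satisfies the first predetermined condition and removes the features unrelated to the target features to obtain multiple groups of training data, which are then used to train the thunderstorm weather prediction model. Because the training data no longer contain redundant features and the number of features they contain is significantly reduced, the defects of the prior art can be overcome and the accuracy of the trained thunderstorm weather prediction model improved.
Further, building on existing feature-engineering screening, the present application considers two kinds of features, linear-type features and nonlinear-type features, and takes into account the independent effect of each kind. On this basis, it also considers the synergy among multiple features and introduces nonlinear effects to improve the expressive power of the model.
For linear-type features, N groups of preliminary linear features are first screened out by performing N samplings and feeding the resulting datasets in turn into the linear feature screening model, and secondary linear features are then obtained by counting over the N groups of preliminary linear features. Next, using an improved predetermined regression model, the x with the greatest influence on the output y is selected, after which new factors are added step by step while ensuring that a new factor does not cause a significant change in the existing factors, until the goodness of fit of the model no longer improves. With two layers of screening, each targeting a different aspect, the interpretability of the screening process and the validity of the final linear target features are both improved.
For nonlinear-type features, pre-screening keeps the number of features under control so that they can be fed into the nonlinear feature screening model. Then, according to the fourth feature importance of the features, the features whose fourth feature importance satisfies the fifth predetermined condition after each round of training are carried into the next round, and features of lower importance are removed step by step, so that the number of features fed into the nonlinear feature screening model decreases from round to round. This improves model accuracy while also accomplishing the screening of nonlinear target features. The expressive power of the model depends on the existing individual features, and the joint expression of features can also, to some extent, improve the fit of the model and the accuracy of the result.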
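FIG. 1 schematically shows a flowchart of a training method for a thunderstorm weather prediction model according to an embodiment of the present application.
As shown in FIG. 1, the training method for a thunderstorm weather prediction model may include steps S1 to S4, wherein:
Step S1: acquire multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and its multiple features.
In this embodiment, each group of data corresponds to a historical day on which thunderstorm weather occurred, and each group includes an output y and an input x: the thunderstorm weather is the output y and its multiple features are the input x. y and x are associated, that is, the thunderstorm weather is associated with the multiple features. The multiple features of thunderstorm weather may include temperature, air pressure, rainfall, humidity, air density, wind volume, and so on.
For example, suppose there are 4 groups of data. The first group corresponds to March 15 and includes the thunderstorm weather, the multiple features of the thunderstorm weather on March 15, and their association; the second group corresponds to March 18 and includes the thunderstorm weather, the multiple features of the thunderstorm weather on March 18, and their association; the third group corresponds to May 7 and includes the thunderstorm weather, the multiple features of the thunderstorm weather on May 7, and their association; the fourth group corresponds to June 24 and includes the thunderstorm weather, the multiple features of the thunderstorm weather on June 24, and their association.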
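Step S2: screen target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition.
The purpose of this embodiment is to train the thunderstorm weather prediction model with the target features, thereby overcoming the defects of the prior art. Therefore, the features whose first feature importance satisfies the first predetermined condition need to be screened out of the multiple features as target features. Each feature has a first feature importance, which measures how closely the feature is associated with thunderstorm weather. Optionally, the first feature importance may be the correlation coefficient between each feature and thunderstorm weather, and the first predetermined condition may be that the feature's first feature importance ranks before a predetermined position.
Optionally, step S2 may include step S21 and/or step S22, wherein:
Step S21: using the multiple groups of data, screen linear target features of the linear type out of the multiple features; and/or
Step S22: using the multiple groups of data, screen nonlinear target features of the nonlinear type out of the multiple features.
The multiple features may include linear-type features and may include nonlinear-type features, and a linear-type feature may also belong to the nonlinear type at the same time. In this embodiment, when only linear-type features exist among the multiple features, the linear target features are determined as the target features; when only nonlinear-type features exist among the multiple features, the nonlinear target features are determined as the target features; when both linear-type and nonlinear-type features exist among the multiple features, the linear target features and the nonlinear target features are together determined as the target features.
It should be noted that it is not known in advance which features are of the linear type and which are of the nonlinear type. Therefore, to ensure that linear target features can be accurately screened out when linear-type features exist, step S2 may include steps S21 to S24, where the target features may include linear target features of the linear type. Specifically: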
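Step S21: perform N samplings on the multiple groups of data to obtain N datasets, wherein each dataset includes one or more of the multiple groups of data.
The sampling method is not limited; for example, sampling may follow the idea of the bootstrapping algorithm. For example, with N=3, the first dataset includes the first, third, and fourth groups of data; the second dataset includes the second, third, and fourth groups of data; and the third dataset includes the first, second, and fourth groups of data.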
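The N samplings of step S21 can be sketched as follows. This is a minimal illustration that assumes classic bootstrap resampling with replacement, so a group may appear more than once in a dataset (unlike the worked example above); the function name `bootstrap_datasets` and the `seed` parameter are hypothetical and not part of the application.

```python
import random


def bootstrap_datasets(groups, n_datasets, seed=0):
    """Draw n_datasets resamples, each the same size as `groups`, with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(groups)
    return [[rng.choice(groups) for _ in range(size)] for _ in range(n_datasets)]


# The four groups of data from the example, identified here only by date.
groups = ["March 15", "March 18", "May 7", "June 24"]
datasets = bootstrap_datasets(groups, n_datasets=3)
for index, dataset in enumerate(datasets, start=1):
    print(f"dataset {index}: {dataset}")
```

Each resulting dataset would then be fed to the linear feature screening model in turn.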
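Step S22: for each of the N datasets, input the dataset into a linear feature screening model, wherein the linear feature screening model is configured to compute, for the multiple features of the dataset, a second feature importance of each feature and to output the features that are of the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features.
The linear feature screening model outputs only linear-type features and computes a second feature importance for each of them. When the model outputs features, each feature carries a coefficient that represents its importance: the larger the coefficient, the more important the feature. In this embodiment, the second feature importance is therefore the coefficient of each feature. The model then outputs the linear-type features whose second feature importance satisfies the second predetermined condition, for example the linear-type features whose second feature importance is not 0.
Optionally, the linear feature screening model is a Lasso model, that is, a model with an L1 regularization term. This model outputs linear-type features and automatically computes their second feature importance, which is output as the coefficient of the feature. For example, in "0.8 humidity", 0.8 is the second feature importance of humidity. As another example, if the second predetermined condition is that the second feature importance is not 0, then for each dataset the Lasso model outputs the linear-type features whose coefficients are not 0.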
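To illustrate how a Lasso model yields coefficients that are exactly 0 for unhelpful features, here is a minimal pure-Python sketch using coordinate descent with soft thresholding. It assumes centered data and no intercept; the function names, the toy data, and the value of `alpha` are assumptions made for illustration and are not taken from the application.

```python
def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold; values inside [-threshold, threshold] become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy, centered data: y depends only on the first column (hypothetical values).
names = ["humidity", "wind"]
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
# Second predetermined condition from the text: coefficient is not 0.
preliminary = [name for name, w in zip(names, weights) if w != 0.0]
print(weights, preliminary)
```

In practice a library implementation would normally be used; the point of the sketch is only that the L1 penalty drives the coefficient of the uninformative column to exactly 0, which is what makes "coefficient is not 0" a usable screening condition.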
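Step S23: acquire the N groups of preliminary linear features output by the linear feature screening model.
Because N datasets are fed into the linear feature screening model in turn and each dataset yields one group of preliminary linear features, the model outputs N groups of preliminary linear features in turn, and the features contained in each group may differ.
For example, continuing the example above, the first group of preliminary linear features includes temperature, air pressure, and humidity; the second group includes temperature, air pressure, rainfall, and wind volume; and the third group includes temperature and humidity.
Step S24: screen the linear target features out of the N groups of preliminary linear features.
Optionally, step S24 may include steps S241 to S243, wherein:
Step S241: count all features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
Step S242: from the N groups of preliminary linear features, screen out the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features;
Step S243: screen the linear target features out of the secondary linear features.
In this embodiment, the third feature importance may be the number of times each feature appears in the N groups of preliminary linear features, and the third predetermined condition may be that this number exceeds a predetermined count threshold.
For example, continuing the example above, temperature appears 3 times, air pressure 2 times, humidity 2 times, rainfall 1 time, and wind volume 1 time. If the third predetermined condition is that a feature appears more than once, the secondary linear features are temperature, air pressure, and humidity.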
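The counting in steps S241 and S242 can be reproduced directly with the example data. The sketch below uses `collections.Counter`; the variable names and the English feature labels are illustrative choices.

```python
from collections import Counter

# The three groups of preliminary linear features from the example.
preliminary_groups = [
    ["temperature", "air pressure", "humidity"],
    ["temperature", "air pressure", "rainfall", "wind volume"],
    ["temperature", "humidity"],
]

# Third feature importance: how many groups each feature appears in.
counts = Counter(feature for group in preliminary_groups for feature in group)

# Third predetermined condition: the count exceeds the threshold (here, more than once).
count_threshold = 1
secondary_linear = [name for name, count in counts.items() if count > count_threshold]
print(counts, secondary_linear)
```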
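Further, the linear target features may be screened out on the basis of the secondary linear features, for example by directly taking the secondary linear features as the linear target features.
However, because the loss function with an L1 regularization term is not differentiable wherever a coefficient is zero, determining the linear target features directly from the Lasso model involves some instability. To address this defect, this embodiment may input the secondary linear features into a predetermined regression model and determine the final linear target features through that model, thereby improving the accuracy with which the linear target features are determined. Specifically, step S243 may include steps A1 to A8, wherein:
Step A1: compute the number M of features among the secondary linear features and the correlation coefficient between each feature and thunderstorm weather;
Step A2: take the feature with the largest correlation coefficient as one of the linear target features;
Step A3: input the feature with the largest correlation coefficient and the thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance;
Step A4: determine whether i is greater than M; when i is not greater than M, perform step A5, and when i is greater than M, perform step A8, where the initial value of i is 1;
Step A5: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and the thunderstorm weather into the i-th predetermined regression model;
Step A6: determine whether the relationship between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition; if so, perform step A7, and if not, perform step A4;
Step A7: determine the feature with the (i+1)-th largest correlation coefficient as one of the linear target features;
Step A8: take all the features determined from the secondary linear features as the linear target features.
This embodiment is a loop. Specifically, the feature with the largest correlation coefficient with y is first selected from the secondary linear features (referred to as the feature with the largest correlation coefficient) and taken as one of the linear target features. This feature and the output y are input into the predetermined regression model (at this point called the 1st predetermined regression model) to obtain a significance, called the 1st significance, and the model obtained after inputting this feature into the 1st predetermined regression model is called the 2nd predetermined regression model. Next, the feature with the 2nd largest correlation coefficient with y is selected from the secondary linear features and input into the 2nd predetermined regression model to obtain a significance, called the 2nd significance. It is then determined whether the relationship between the 1st significance and the 2nd significance satisfies the sixth predetermined condition (for example, whether the difference between the two significances is greater than 0.0001). If so, the feature with the 2nd largest correlation coefficient has a significant influence on the feature with the largest correlation coefficient, and the procedure goes on to examine the relationship between the significance of the feature with the 3rd largest correlation coefficient and the 1st significance. If not, the feature with the 2nd largest correlation coefficient is also taken as one of the linear target features, and the procedure goes on to examine the relationship between the significance of the feature with the 3rd largest correlation coefficient and the 2nd significance, and so on, until all features among the secondary linear features have been examined.
It should be noted that the significance may be represented by the t statistic.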
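For reference, the t statistic of a coefficient in an ordinary least-squares regression is commonly defined as the estimate divided by its standard error. The symbols below are standard textbook notation and are not taken from the application: $\hat{\beta}_j$ is the fitted coefficient of feature $j$, $X$ is the design matrix, $n$ is the number of samples, and $p$ is the number of fitted coefficients including the intercept.

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}, \qquad
\operatorname{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[ (X^\top X)^{-1} \right]_{jj}}, \qquad
\hat{\sigma}^2 = \frac{\sum_{k=1}^{n} (y_k - \hat{y}_k)^2}{n - p}
```

Under this reading, a new factor "changing the significance" of an existing factor would show up as a shift in that factor's $t_j$ once the new column is added to $X$.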
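Optionally, when the secondary linear features include a large number of features, running the significance check for every feature would greatly increase the workload of the processor. In that case, the goodness of fit of the predetermined regression model may be used to decide when to stop the significance check, as follows:
Step A3 may include: inputting the feature with the largest correlation coefficient and the thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and the 1st first goodness of fit;
Step A5 may include: inputting the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and the (i+1)-th first goodness of fit;
After step A7 and before step A8, the training method for a thunderstorm weather prediction model may further include: determining whether the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition; if not, performing step A4, and if so, performing step A8.
In this embodiment, before the significance check has been run for all features, if the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies the seventh predetermined condition, the significance check is not continued for the remaining features, and all features determined so far from the secondary linear features are taken as the linear target features. For example, the seventh predetermined condition may be that the difference between the i-th first goodness of fit and the (i+1)-th first goodness of fit is less than 0.0001.
The goodness of fit may be measured by R², also known as the coefficient of determination.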
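The goodness-of-fit stopping rule can be sketched as a small forward-selection loop. This is a simplified illustration under stated assumptions, not the application's procedure: it ranks features by absolute Pearson correlation with y (the text ranks by correlation coefficient), fits ordinary least squares with an intercept, stops when the gain in R² falls below 0.0001, and omits the significance check of step A6. All function names and the toy data are hypothetical.

```python
import math


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [value] for row, value in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def r_squared(columns, y):
    """Fit OLS with an intercept on the given columns and return R^2 = 1 - SS_res / SS_tot."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(design[i][a] * design[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    y_mean = sum(y) / n
    ss_res = sum((y[i] - sum(beta[a] * design[i][a] for a in range(p))) ** 2 for i in range(n))
    ss_tot = sum((value - y_mean) ** 2 for value in y)
    return 1.0 - ss_res / ss_tot


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


def forward_select(features, y, min_gain=1e-4):
    """Add features in order of |correlation| until R^2 stops improving by min_gain."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], y)), reverse=True)
    selected = [ranked[0]]
    best = r_squared([features[ranked[0]]], y)
    for name in ranked[1:]:
        candidate = r_squared([features[n] for n in selected + [name]], y)
        if candidate - best < min_gain:  # seventh condition in this sketch: stop
            break
        selected.append(name)
        best = candidate
    return selected


# Toy data: y = 3 * temperature + 2 * humidity exactly; "air pressure" adds nothing.
features = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "humidity": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
    "air pressure": [1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
}
y = [3.0, 8.0, 9.0, 14.0, 17.0, 18.0]
selected = forward_select(features, y)
print(selected)
```

Stopping on the R² gain keeps the loop from fitting a regression for every remaining feature, which matches the motivation given above for large secondary feature sets.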
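Optionally, to ensure that nonlinear target features can be accurately screened out when nonlinear-type features exist, step S2 may further include steps S21' to S24', where the target features may include nonlinear target features of the nonlinear type. Specifically:
Step S21': input the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of data to compute a fourth feature importance of each of the multiple features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition.
The nonlinear feature screening model outputs only nonlinear-type features and computes a fourth feature importance for each of them. It then outputs the nonlinear-type features whose fourth feature importance satisfies the fourth predetermined condition, for example the nonlinear-type features whose fourth feature importance is not 0.
Optionally, the nonlinear feature screening model is, for example, a machine learning model such as a random forest (Random Forest, RF for short) or a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT for short). Taking the random forest as an example, for each tree built by the algorithm, the decrease in the Gini index after splitting on a feature can be recorded at every node. By randomly generating many trees and randomly selecting features, one obtains, over a large amount of data, the average improvement in classification or regression purity attributable to a given feature. This value is the feature's contribution, that is, its fourth feature importance. The fourth feature importance of a nonlinear-type feature may also be output as the coefficient of the feature. For example, in "0.6 air density", 0.6 is the fourth feature importance of air density. As another example, if the fourth predetermined condition is that the fourth feature importance is not 0, then for each dataset the nonlinear screening model outputs the nonlinear-type features whose coefficients are not 0.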
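The quantity a random forest accumulates per feature is the decrease in Gini impurity produced by a split. The sketch below computes that decrease for a single binary split with binary labels; a real forest averages such decreases over many nodes and trees. The function names, thresholds, and sample values are illustrative assumptions.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = no thunderstorm, 1 = thunderstorm)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the two children."""
    left = [label for value, label in zip(values, labels) if value <= threshold]
    right = [label for value, label in zip(values, labels) if value > threshold]
    n = len(labels)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted_children


labels = [0, 0, 1, 1]
humidity = [30.0, 40.0, 80.0, 90.0]  # separates the two classes perfectly
noise = [1.0, 2.0, 1.0, 2.0]         # carries no information about the label

humidity_gain = gini_decrease(humidity, labels, threshold=60.0)
noise_gain = gini_decrease(noise, labels, threshold=1.5)
print(humidity_gain, noise_gain)
```

A feature whose splits never reduce impurity ends up with an importance of 0, which is the case the fourth predetermined condition ("importance is not 0") filters out.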
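Step S22': from the features output by the nonlinear feature screening model, remove the features whose fourth feature importance satisfies a fifth predetermined condition to obtain preliminary nonlinear features.
For example, the fifth predetermined condition is that the feature has the lowest fourth feature importance. In this embodiment, the features output by the nonlinear feature screening model may then be sorted in descending order of fourth feature importance and the feature at the end of the list removed to obtain the preliminary nonlinear features.
Step S23': for each group of the multiple groups of data, remove the features unrelated to the preliminary nonlinear features to obtain multiple groups of preliminary screened data.
Removing the features unrelated to the preliminary nonlinear features means removing every feature other than the preliminary nonlinear features.
Step S24': continue to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after step S21', the training method for a thunderstorm weather prediction model may further include: computing the second goodness of fit of the nonlinear feature screening model for this round.
Step S24' may include steps S241' to S246', wherein:
Step S241': continue to input the multiple groups of preliminary screened data into the nonlinear feature screening model to obtain secondary nonlinear features;
Step S242': for each group of the multiple groups of preliminary screened data, remove the features unrelated to the secondary nonlinear features to obtain multiple groups of secondary screened data;
Step S243': compute the third goodness of fit of the nonlinear feature screening model for this round;
Step S244': determine whether the relationship between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, perform step S245'; if not, perform step S246'.
Step S245': determine the secondary nonlinear features as the nonlinear target features.
Step S246': continue to input the multiple groups of secondary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
This embodiment is also a loop. Specifically, the multiple groups of preliminary screened data are first obtained and the second goodness of fit is computed; the multiple groups of secondary screened data are then obtained and the third goodness of fit is computed. If the relationship between the second goodness of fit and the third goodness of fit satisfies the eighth predetermined condition, the secondary nonlinear features are determined as the nonlinear target features; otherwise, the multiple groups of secondary screened data continue to be input into the nonlinear feature screening model until the relationship between the goodness-of-fit values satisfies the eighth predetermined condition. The eighth predetermined condition is, for example, that the difference between the loss function corresponding to the second goodness of fit and the loss function corresponding to the third goodness of fit is less than 0.0001.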
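The loop of steps S21' to S246' can be sketched as follows. This is one possible reading, stated as an assumption: each round fits the screening model, keeps the features with non-zero importance, drops the least important one, and stops once the loss changes by less than 0.0001 between consecutive rounds. The `fit` callback stands in for a real random forest or GBDT; `toy_fit` and its importance table are entirely hypothetical.

```python
def prune(features, importances):
    """Keep features with non-zero importance, then drop the least important one."""
    kept = [name for name in features if importances[name] != 0]
    if len(kept) <= 1:
        return kept
    weakest = min(kept, key=lambda name: importances[name])
    return [name for name in kept if name != weakest]


def select_nonlinear(features, fit, tolerance=1e-4):
    """fit(features) must return (importance dict, loss) for a model trained on them."""
    importances, previous_loss = fit(features)
    current = prune(features, importances)       # preliminary nonlinear features
    while len(current) > 1:
        importances, loss = fit(current)
        pruned = prune(current, importances)     # secondary nonlinear features
        if abs(loss - previous_loss) < tolerance:  # eighth condition in this sketch
            return pruned
        current, previous_loss = pruned, loss
    return current


# Hypothetical stand-in for a trained model: fixed importances, loss = 1 - their sum.
TOY_IMPORTANCE = {
    "air density": 0.6,
    "humidity": 0.3,
    "wind volume": 0.09,
    "rainfall": 0.00001,
    "air pressure": 0.0,
}


def toy_fit(features):
    importances = {name: TOY_IMPORTANCE[name] for name in features}
    return importances, 1.0 - sum(importances.values())


nonlinear_targets = select_nonlinear(list(TOY_IMPORTANCE), toy_fit)
print(nonlinear_targets)
```

Because every round removes at least one feature, the number of inputs to the screening model shrinks monotonically, which is the behaviour the text describes.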
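Optionally, feeding the multiple groups of data directly into the nonlinear feature screening model may make the processing load at a given time too heavy and cause other problems, such as a machine crash. To avoid this, this embodiment may first preprocess the multiple groups of data and then input the preprocessed data into the nonlinear feature screening model, as follows:
Step S21' may include steps S211' and S212', wherein:
Step S211': for each group of the multiple groups of data, pre-screen the multiple features according to a predetermined rule to obtain multiple groups of preprocessed data;
Step S212': input the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of preprocessed data to compute the fourth feature importance of each of the pre-screened features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.
In this embodiment, the preprocessing may be as follows: for each group of data, compute the distance between every two features, for example the Euclidean distance. If the distance between two features is greater than a predetermined threshold, the two features are considered strongly correlated and only one of them needs to be kept. In that case, the distance between each of the two features and the output y (thunderstorm weather) is computed, and the feature with the smaller distance to thunderstorm weather is removed. This preprocessing yields multiple groups of preprocessed data, which are then input into the nonlinear feature screening model. The processing logic here is the same as when the multiple groups of data are input directly into the nonlinear feature screening model and is not repeated.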
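A pairwise pre-screen of this kind can be sketched as below. Note the substitution: the text measures relatedness with a distance such as the Euclidean distance, whereas this sketch uses absolute Pearson correlation as the relatedness measure, which is an assumption made here for illustration. Of any pair that is strongly related, the feature less related to y is dropped. The function names, the 0.95 threshold, and the sample values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


def prescreen(features, y, threshold=0.95):
    """For each strongly related pair, drop the feature less related to y."""
    names = list(features)
    dropped = set()
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if first in dropped or second in dropped:
                continue
            if abs(pearson(features[first], features[second])) > threshold:
                first_score = abs(pearson(features[first], y))
                second_score = abs(pearson(features[second], y))
                dropped.add(first if first_score < second_score else second)
    return [name for name in names if name not in dropped]


# Toy data: temperature and dew point move together; humidity is distinct.
features = {
    "temperature": [10.0, 20.0, 30.0, 40.0],
    "dew point": [11.0, 19.0, 31.0, 39.0],
    "humidity": [30.0, 80.0, 40.0, 90.0],
}
y = [0.0, 0.0, 1.0, 1.0]
kept = prescreen(features, y)
print(kept)
```

Reducing near-duplicate features before the nonlinear screening model keeps its input size under control, which is the stated purpose of the pre-screening step.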
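Step S3: in each group of the multiple groups of data, remove the features unrelated to the target features to form multiple groups of training data.
When only linear-type features exist among the multiple features, the target features include only linear target features; when only nonlinear-type features exist among the multiple features, the target features include only nonlinear target features; when both linear-type and nonlinear-type features exist among the multiple features, the target features include both linear target features and nonlinear target features.
In this embodiment, for each group of data, every feature other than the target features is removed from the multiple features of that group. After step S3, the features remaining in the data are those that contribute most to thunderstorm weather.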
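Step S3 amounts to a column filter. The sketch below assumes each group of data is a dictionary that maps feature names to values and stores the output y under a key named `thunderstorm`; that layout, the key name, and the sample values are assumptions made for illustration.

```python
def build_training_data(groups, target_features, label_key="thunderstorm"):
    """Keep only the target features (and the label) in every group of data."""
    keep = set(target_features) | {label_key}
    return [{name: value for name, value in group.items() if name in keep} for group in groups]


groups = [
    {"thunderstorm": 1, "temperature": 24.0, "humidity": 0.91, "rainfall": 12.0},
    {"thunderstorm": 1, "temperature": 27.5, "humidity": 0.88, "rainfall": 30.0},
]
training_data = build_training_data(groups, target_features=["temperature", "humidity"])
print(training_data)
```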
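Step S4: train a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.
The multiple groups of training data are used as the training set to train the predetermined algorithm, yielding the thunderstorm weather prediction model, which predicts from the features of the current weather whether the future weather will be thunderstorm weather. The predetermined algorithm is, for example, a support vector machine (Support Vector Machine, SVM for short) algorithm, an adaptive boosting (Adaptive Boosting, AdaBoost for short) algorithm, a logistic regression (Logistic Regression, LR for short) algorithm, or a decision tree (Decision Tree) algorithm.
FIG. 2 schematically shows a flowchart of a thunderstorm weather prediction method according to an embodiment of the present application.
As shown in FIG. 2, the thunderstorm weather prediction method may include steps M1 to M3, wherein:
Step M1: acquire target features of the current weather;
Step M2: input the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result.
The thunderstorm weather prediction model is obtained by the training method of the embodiment described above.
Step M3: determine, according to the weather prediction result, whether the future weather is thunderstorm weather.
In this embodiment, the target features of the current weather are input into the pre-trained thunderstorm weather prediction model. Because the training process of the model is rigorous and its training results accurate, the resulting weather prediction is also relatively reliable. The weather prediction result may be either "thunderstorm weather" or "not thunderstorm weather": the former indicates that the predicted future weather is thunderstorm weather, and the latter indicates that it is not.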
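An embodiment of the present application further provides a training apparatus for a thunderstorm weather prediction model. The apparatus corresponds to the training method provided in the above embodiment; the corresponding technical features and technical effects are not described again in this embodiment, and reference may be made to the above embodiment for related details. Specifically, FIG. 3 schematically shows a block diagram of a training apparatus for a thunderstorm weather prediction model according to an embodiment of the present application. As shown in FIG. 3, the training apparatus 300 may include a first acquisition module 301, a screening module 302, a removal module 303, and a training module 304, wherein:
the first acquisition module 301 is configured to acquire multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and its multiple features;
the screening module 302 is configured to screen target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
the removal module 303 is configured to remove, in each group of the multiple groups of data, the features unrelated to the target features to form multiple groups of training data;
the training module 304 is configured to train a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.
Optionally, the screening module is further configured to: use the multiple groups of data to screen linear target features of the linear type out of the multiple features; and/or use the multiple groups of data to screen nonlinear target features of the nonlinear type out of the multiple features.
Optionally, the target features include linear target features of the linear type, and the screening module, when screening target features out of the multiple features of the multiple groups of data, is further configured to: perform N samplings on the multiple groups of data to obtain N datasets, wherein each dataset includes one or more of the multiple groups of data; for each of the N datasets, input the dataset into a linear feature screening model, wherein the linear feature screening model is configured to compute, for the multiple features of the dataset, a second feature importance of each feature and to output the features that are of the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features; acquire the N groups of preliminary linear features output by the linear feature screening model; and screen the linear target features out of the N groups of preliminary linear features.
Optionally, the screening module, when screening the linear target features out of the N groups of preliminary linear features, is further configured to: count all features in the N groups of preliminary linear features to obtain a third feature importance of each feature; screen out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features; and screen the linear target features out of the secondary linear features.
Optionally, the screening module, when screening the linear target features out of the secondary linear features, is further configured to perform: step A1: computing the number M of features among the secondary linear features and the correlation coefficient between each feature and the thunderstorm weather; step A2: taking the feature with the largest correlation coefficient as one of the linear target features; step A3: inputting the feature with the largest correlation coefficient and the thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance; step A4: determining whether i is greater than M, performing step A5 when i is not greater than M, and performing step A8 when i is greater than M, where the initial value of i is 1; step A5: inputting the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance, wherein the (i+1)-th predetermined regression model is obtained by inputting the first i features and the thunderstorm weather into the i-th predetermined regression model; step A6: determining whether the relationship between the i-th significance and the (i+1)-th significance satisfies a sixth predetermined condition, performing step A7 if so, and performing step A4 if not; step A7: determining the feature with the (i+1)-th largest correlation coefficient as one of the linear target features; step A8: taking all the features determined from the secondary linear features as the linear target features.
Optionally, the screening module, when performing step A3, is further configured to: input the feature with the largest correlation coefficient and the thunderstorm weather into the 1st predetermined regression model to obtain the 1st significance and the 1st first goodness of fit; the screening module, when performing step A5, is further configured to: input the feature with the (i+1)-th largest correlation coefficient into the (i+1)-th predetermined regression model to obtain the (i+1)-th significance and the (i+1)-th first goodness of fit; after step A7 and before step A8, the apparatus further includes: a judgment module configured to determine whether the relationship between the i-th first goodness of fit and the (i+1)-th first goodness of fit satisfies a seventh predetermined condition, to cause the screening module to perform step A4 if not, and to cause the screening module to perform step A8 if so.
Optionally, the target features include nonlinear target features of the nonlinear type, and the screening module, when screening target features out of the multiple features of the multiple groups of data, is further configured to: input the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of data to compute a fourth feature importance of each of the multiple features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition; remove, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition to obtain preliminary nonlinear features; for each group of the multiple groups of data, remove the features unrelated to the preliminary nonlinear features to obtain multiple groups of preliminary screened data; and continue to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, after the multiple groups of data are input into the nonlinear feature screening model, the apparatus further includes: a computing module configured to compute the second goodness of fit of the nonlinear feature screening model for this round;
the screening module, when continuing to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out, is further configured to: continue to input the multiple groups of preliminary screened data into the nonlinear feature screening model to obtain secondary nonlinear features; for each group of the multiple groups of preliminary screened data, remove the features unrelated to the secondary nonlinear features to obtain multiple groups of secondary screened data; compute the third goodness of fit of the nonlinear feature screening model for this round; determine whether the relationship between the second goodness of fit and the third goodness of fit satisfies an eighth predetermined condition; if so, determine the secondary nonlinear features as the nonlinear target features; and if not, continue to input the multiple groups of secondary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
Optionally, the screening module, when inputting the multiple groups of data into the nonlinear feature screening model, is further configured to: for each group of the multiple groups of data, pre-screen the multiple features according to a predetermined rule to obtain multiple groups of preprocessed data; and input the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of preprocessed data to compute the fourth feature importance of each of the pre-screened features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.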
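An embodiment of the present application further provides a thunderstorm weather prediction apparatus. The apparatus corresponds to the thunderstorm weather prediction method provided in the above embodiment; the corresponding technical features and technical effects are not described again in this embodiment, and reference may be made to the above embodiment for related details. Specifically, FIG. 4 schematically shows a block diagram of a thunderstorm weather prediction apparatus according to an embodiment of the present application. As shown in FIG. 4, the thunderstorm weather prediction apparatus 400 may include a second acquisition module 401, an input module 402, and a determination module 403, wherein:
the second acquisition module 401 is configured to acquire target features of the current weather;
the input module 402 is configured to input the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result, wherein the thunderstorm weather prediction model is obtained by the above training method for a thunderstorm weather prediction model;
the determination module 403 is configured to determine, according to the weather prediction result, whether the future weather is thunderstorm weather.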
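FIG. 5 schematically shows a block diagram of a computer device suitable for implementing the training method for a thunderstorm weather prediction model and/or the thunderstorm weather prediction method according to an embodiment of the present application. In this embodiment, the computer device 500 may be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including a standalone server or a server cluster composed of multiple servers) that executes programs. As shown in FIG. 5, the computer device 500 of this embodiment includes at least, but is not limited to, a memory 501, a processor 502, and a network interface 503 that can communicate with one another via a system bus. It should be noted that FIG. 5 only shows the computer device 500 with components 501-503; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 501 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and so on. In some embodiments, the memory 501 may be an internal storage unit of the computer device 500, such as its hard disk or internal memory. In other embodiments, the memory 501 may be an external storage device of the computer device 500, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the computer device 500. The memory 501 may of course include both an internal storage unit of the computer device 500 and its external storage device. In this embodiment, the memory 501 is generally used to store the operating system and the various application software installed on the computer device 500, such as the program code of the training method for a thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method. In addition, the memory 501 may also be used to temporarily store various data that have been output or are to be output.
In some embodiments, the processor 502 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 502 is generally used to control the overall operation of the computer device 500, for example to perform control and processing related to data interaction or communication with the computer device 500. In this embodiment, the processor 502 is used to run the program code of the training method for a thunderstorm weather prediction model and/or the program code of the thunderstorm weather prediction method stored in the memory 501.
In this embodiment, the training method for a thunderstorm weather prediction model and/or the thunderstorm weather prediction method stored in the memory 501 may also be divided into one or more program modules and executed by one or more processors (the processor 502 in this embodiment) to carry out the present application.
The network interface 503 may include a wireless network interface or a wired network interface and is generally used to establish a communication link between the computer device 500 and other computer devices. For example, the network interface 503 is used to connect the computer device 500 to an external terminal through a network and to establish a data transmission channel and a communication link between the computer device 500 and the external terminal. The network may be a wireless or wired network such as an intranet (Intranet), the Internet (Internet), the Global System for Mobile Communications (Global System of Mobile communication, GSM for short), Wideband Code Division Multiple Access (WCDMA for short), a 4G network, a 5G network, Bluetooth (Bluetooth), or Wi-Fi.
This embodiment further provides a computer-readable storage medium, which may be non-volatile or volatile and includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, server, app store, and so on, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the training method for a thunderstorm weather prediction model and/or the steps of the thunderstorm weather prediction method.
Obviously, those skilled in the art should understand that the modules or steps of the embodiments of the present application described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from that given here, or they may be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.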

Claims (20)

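  1. A training method for a thunderstorm weather prediction model, comprising:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
  2. The training method for a thunderstorm weather prediction model according to claim 1, wherein the target features include linear target features of the linear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    performing N samplings on the multiple groups of data to obtain N datasets, wherein each dataset includes one or more of the multiple groups of data;
    for each of the N datasets, inputting the dataset into a linear feature screening model, wherein the linear feature screening model is configured to compute, for the multiple features of the dataset, a second feature importance of each feature and to output the features that are of the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features;
    acquiring the N groups of preliminary linear features output by the linear feature screening model;
    screening the linear target features out of the N groups of preliminary linear features.
  3. The training method for a thunderstorm weather prediction model according to claim 2, wherein screening the linear target features out of the N groups of preliminary linear features comprises:
    counting all features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
    screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features;
    screening the linear target features out of the secondary linear features.
  4. The training method for a thunderstorm weather prediction model according to claim 1, wherein the target features include nonlinear target features of the nonlinear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    inputting the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of data to compute a fourth feature importance of each of the multiple features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition;
    removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition to obtain preliminary nonlinear features;
    for each group of the multiple groups of data, removing the features unrelated to the preliminary nonlinear features to obtain multiple groups of preliminary screened data;
    continuing to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
  5. The training method for a thunderstorm weather prediction model according to claim 4, wherein inputting the multiple groups of data into the nonlinear feature screening model comprises:
    for each group of the multiple groups of data, pre-screening the multiple features according to a predetermined rule to obtain multiple groups of preprocessed data;
    inputting the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of preprocessed data to compute the fourth feature importance of each of the pre-screened features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.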
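  6. A thunderstorm weather prediction method, comprising:
    acquiring target features of the current weather;
    inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result;
    determining, according to the weather prediction result, whether the future weather is thunderstorm weather,
    wherein the thunderstorm weather prediction model is obtained by the following training method for a thunderstorm weather prediction model:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.
  7. The thunderstorm weather prediction method according to claim 6, wherein the target features include linear target features of the linear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    performing N samplings on the multiple groups of data to obtain N datasets, wherein each dataset includes one or more of the multiple groups of data;
    for each of the N datasets, inputting the dataset into a linear feature screening model, wherein the linear feature screening model is configured to compute, for the multiple features of the dataset, a second feature importance of each feature and to output the features that are of the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features;
    acquiring the N groups of preliminary linear features output by the linear feature screening model;
    screening the linear target features out of the N groups of preliminary linear features.
  8. The thunderstorm weather prediction method according to claim 7, wherein screening the linear target features out of the N groups of preliminary linear features comprises:
    counting all features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
    screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features;
    screening the linear target features out of the secondary linear features.
  9. The thunderstorm weather prediction method according to claim 6, wherein the target features include nonlinear target features of the nonlinear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    inputting the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of data to compute a fourth feature importance of each of the multiple features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition;
    removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition to obtain preliminary nonlinear features;
    for each group of the multiple groups of data, removing the features unrelated to the preliminary nonlinear features to obtain multiple groups of preliminary screened data;
    continuing to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
  10. The thunderstorm weather prediction method according to claim 9, wherein inputting the multiple groups of data into the nonlinear feature screening model comprises:
    for each group of the multiple groups of data, pre-screening the multiple features according to a predetermined rule to obtain multiple groups of preprocessed data;
    inputting the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of preprocessed data to compute the fourth feature importance of each of the pre-screened features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.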
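  11. A training apparatus for a thunderstorm weather prediction model, comprising:
    a first acquisition module configured to acquire multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    a screening module configured to screen target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    a removal module configured to remove, in each group of the multiple groups of data, the features unrelated to the target features to form multiple groups of training data;
    a training module configured to train a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
  12. A thunderstorm weather prediction apparatus, comprising:
    a second acquisition module configured to acquire target features of the current weather;
    an input module configured to input the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result;
    a judgment module configured to determine, according to the weather prediction result, whether the future weather is thunderstorm weather,
    wherein the thunderstorm weather prediction model is obtained by the following training method for a thunderstorm weather prediction model:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.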
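  13. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
  14. The computer device according to claim 13, wherein the target features include linear target features of the linear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    performing N samplings on the multiple groups of data to obtain N datasets, wherein each dataset includes one or more of the multiple groups of data;
    for each of the N datasets, inputting the dataset into a linear feature screening model, wherein the linear feature screening model is configured to compute, for the multiple features of the dataset, a second feature importance of each feature and to output the features that are of the linear type and whose second feature importance satisfies a second predetermined condition, referred to as a group of preliminary linear features;
    acquiring the N groups of preliminary linear features output by the linear feature screening model;
    screening the linear target features out of the N groups of preliminary linear features.
  15. The computer device according to claim 14, wherein screening the linear target features out of the N groups of preliminary linear features comprises:
    counting all features in the N groups of preliminary linear features to obtain a third feature importance of each feature;
    screening out, from the N groups of preliminary linear features, the features whose third feature importance satisfies a third predetermined condition, referred to as secondary linear features;
    screening the linear target features out of the secondary linear features.
  16. The computer device according to claim 13, wherein the target features include nonlinear target features of the nonlinear type, and screening target features out of the multiple features of the multiple groups of data comprises:
    inputting the multiple groups of data into a nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of data to compute a fourth feature importance of each of the multiple features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies a fourth predetermined condition;
    removing, from the features output by the nonlinear feature screening model, the features whose fourth feature importance satisfies a fifth predetermined condition to obtain preliminary nonlinear features;
    for each group of the multiple groups of data, removing the features unrelated to the preliminary nonlinear features to obtain multiple groups of preliminary screened data;
    continuing to input the multiple groups of preliminary screened data into the nonlinear feature screening model until the nonlinear target features are screened out.
  17. The computer device according to claim 16, wherein inputting the multiple groups of data into the nonlinear feature screening model comprises:
    for each group of the multiple groups of data, pre-screening the multiple features according to a predetermined rule to obtain multiple groups of preprocessed data;
    inputting the multiple groups of preprocessed data into the nonlinear feature screening model, wherein the nonlinear feature screening model is configured to use the multiple groups of preprocessed data to compute the fourth feature importance of each of the pre-screened features and to output the features that are of the nonlinear type and whose fourth feature importance satisfies the fourth predetermined condition.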
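  18. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    acquiring target features of the current weather;
    inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result;
    determining, according to the weather prediction result, whether the future weather is thunderstorm weather,
    wherein the thunderstorm weather prediction model is obtained by the following training method for a thunderstorm weather prediction model:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.
  19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain a thunderstorm weather prediction model.
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring target features of the current weather;
    inputting the target features into a pre-trained thunderstorm weather prediction model so that the thunderstorm weather prediction model outputs a weather prediction result;
    determining, according to the weather prediction result, whether the future weather is thunderstorm weather,
    wherein the thunderstorm weather prediction model is obtained by the following training method for a thunderstorm weather prediction model:
    acquiring multiple groups of data, wherein each group of data includes thunderstorm weather, multiple features of the thunderstorm weather, and the association between the thunderstorm weather and the multiple features of the thunderstorm weather;
    screening target features out of the multiple features of the multiple groups of data, wherein a target feature is a feature whose first feature importance satisfies a first predetermined condition;
    in each group of the multiple groups of data, removing the features unrelated to the target features to form multiple groups of training data;
    training a predetermined algorithm with the multiple groups of training data to obtain the thunderstorm weather prediction model.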
PCT/CN2020/117578 2020-02-25 2020-09-25 Training method for thunderstorm weather prediction model and thunderstorm weather prediction method WO2021169271A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010116671.XA CN111368887B (zh) 2020-02-25 2020-02-25 Training method for thunderstorm weather prediction model and thunderstorm weather prediction method
CN202010116671.X 2020-02-25

Publications (1)

Publication Number Publication Date
WO2021169271A1 true WO2021169271A1 (zh) 2021-09-02

Family

ID=71208274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117578 WO2021169271A1 (zh) 2020-02-25 2020-09-25 Training method for thunderstorm weather prediction model and thunderstorm weather prediction method

Country Status (2)

Country Link
CN (1) CN111368887B (zh)
WO (1) WO2021169271A1 (zh)

Also Published As

Publication number Publication date
CN111368887B (zh) 2024-05-03
CN111368887A (zh) 2020-07-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921800

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20921800

Country of ref document: EP

Kind code of ref document: A1