WO2019037260A1 - Prediction model establishing apparatus, method, and computer-readable storage medium - Google Patents

Prediction model establishing apparatus, method, and computer-readable storage medium

Info

Publication number
WO2019037260A1
WO2019037260A1 PCT/CN2017/108801 CN2017108801W WO2019037260A1 WO 2019037260 A1 WO2019037260 A1 WO 2019037260A1 CN 2017108801 W CN2017108801 W CN 2017108801W WO 2019037260 A1 WO2019037260 A1 WO 2019037260A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
prediction
search index
model
threshold
Prior art date
Application number
PCT/CN2017/108801
Other languages
English (en)
French (fr)
Inventor
徐亮
李弦
商瑾
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019037260A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements

Definitions

  • the present application relates to the field of terminal technologies, and in particular, to a predictive model establishing apparatus, method, and computer readable storage medium.
  • Machine-learning techniques for predicting data are applied in more and more fields, such as predicting the click rate of advertisements or the incidence of certain epidemic diseases.
  • In commonly used current methods, the historical data of the item to be predicted constitutes a time series, and an autoregressive integrated moving average (ARIMA) time series model predicts from the characteristics of the time series itself. Such a model, however, uses only the trend characteristics of the item to be predicted and cannot incorporate external features into the prediction.
  • the present application provides a prediction model establishing apparatus, method and computer readable storage medium, the main purpose of which is to establish a prediction model by combining the exogenous features and autoregressive time features of the item to be tested, and improve the prediction accuracy of the prediction model.
  • The present application provides a prediction model establishing apparatus, the apparatus comprising: a memory, a processor, and a prediction model establishing program stored on the memory and operable on the processor, the prediction model establishing program implementing the following steps when executed by the processor:
  • D. Use the actual observed value of the item to be tested in the target time unit as a prediction target, and use the plurality of prediction features and the prediction target as one prediction sample;
  • E. Acquire a plurality of prediction samples of a plurality of time units according to steps A to D, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and use the preset regression model with the determined model parameters as a prediction model of the item to be tested.
  • The step of preprocessing the exogenous features includes:
  • The step of preprocessing the exogenous features further includes:
  • supplementing missing values in the search index that has undergone the extreme value optimization processing.
  • The step of performing feature screening on the features in the feature set according to the preset rule to obtain the prediction features includes:
  • selecting, as a prediction feature, a feature whose Pearson correlation coefficient is less than or equal to the preset correlation coefficient.
  • the item to be tested is an influenza prediction item
  • the exogenous features include a search index, a weather feature, and an environmental feature
  • the preset regression model is a LASSO regression model.
  • the present application further provides a prediction model establishing method, the method comprising:
  • D. Use the actual observed value of the item to be tested in the target time unit as a prediction target, and use the plurality of prediction features and the prediction target as one prediction sample;
  • E. Acquire a plurality of prediction samples of a plurality of time units according to steps A to D, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and use the preset regression model with the determined model parameters as a prediction model of the item to be tested.
  • The step of preprocessing the exogenous features includes:
  • The step of preprocessing the exogenous features further includes:
  • supplementing missing values in the search index that has undergone the extreme value optimization processing.
  • The step of performing feature screening on the features in the feature set according to the preset rule to obtain the prediction features includes:
  • selecting, as a prediction feature, a feature whose Pearson correlation coefficient is less than or equal to the preset correlation coefficient.
  • The present application further provides a computer-readable storage medium having a prediction model establishing program stored thereon, the prediction model establishing program being executable by one or more processors to implement the following steps:
  • D. Use the actual observed value of the item to be tested in the target time unit as a prediction target, and use the plurality of prediction features and the prediction target as one prediction sample;
  • E. Acquire a plurality of prediction samples of a plurality of time units according to steps A to D, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and use the preset regression model with the determined model parameters as a prediction model of the item to be tested.
  • The prediction model establishing apparatus, method, and computer-readable storage medium acquire exogenous features of the item to be tested in one or more time units before the target time unit, together with autoregressive time features before the target time unit. The exogenous features and the autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the feature set are screened according to a preset rule to obtain prediction features. The prediction features and the actual observed value corresponding to the target time unit constitute one prediction sample.
  • Multiple prediction samples of a plurality of time units are acquired according to the above process and input into a preset regression model for training to determine the model parameters, and the regression model with the determined model parameters is used as the prediction model.
  • The scheme combines the exogenous features of the item to be tested with its autoregressive time features to form a feature set, and selects qualified features from the feature set as prediction features for training the regression model to generate the prediction model. This avoids relying on a single kind of sample feature and improves the prediction accuracy of the prediction model.
  • FIG. 1 is a schematic diagram of a preferred embodiment of a prediction model establishing apparatus of the present application
  • FIG. 2 is a schematic diagram of a program module of a prediction model establishing program in an embodiment of a prediction model establishing apparatus according to the present application;
  • FIG. 3 is a flowchart of a first embodiment of a method for establishing a prediction model according to the present application.
  • the application provides a prediction model establishing device.
  • Referring to FIG. 1, a schematic diagram of a preferred embodiment of the prediction model establishing apparatus of the present application is shown.
  • The prediction model establishing apparatus may be a PC (Personal Computer), or may be a portable terminal device with a display function, such as a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or a portable computer.
  • the predictive model building device includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
  • the memory 11 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (for example, an SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may in some embodiments be an internal storage unit of the predictive model building device, such as a hard disk of the predictive model building device.
  • In other embodiments, the memory 11 may also be an external storage device of the prediction model establishing apparatus, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the apparatus.
  • the memory 11 may also include both an internal storage unit of the predictive model building device and an external storage device.
  • The memory 11 can be used not only for storing application software installed in the prediction model establishing apparatus and various types of data, such as the code of the prediction model establishing program, but also for temporarily storing data that has been output or will be output.
  • The processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the prediction model establishing program.
  • Communication bus 13 is used to implement connection communication between these components.
  • The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the device and other electronic devices.
  • Figure 1 shows only the prediction model establishing apparatus with components 11-14 and the prediction model establishing program, but it should be understood that implementing all of the illustrated components is not required, and more or fewer components may be implemented instead.
  • The device may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch sensor, or the like.
  • The display may also be appropriately referred to as a display screen or display unit, and is used for displaying information processed in the prediction model establishing apparatus and for displaying a visualized user interface.
  • the device may also include a touch sensor.
  • the area where the user performs the touch operation is called a touch area.
  • the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor but also a proximity type touch sensor or the like.
  • the touch sensor may be a single sensor or a plurality of sensors arranged, for example, in an array.
  • the area of the display of the device may be the same as or different from the area of the touch sensor.
  • a display is stacked with the touch sensor to form a touch display. The device detects a user-triggered touch operation based on a touch screen display.
  • the device may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like.
  • The sensors include, for example, light sensors, motion sensors, and other sensors.
  • The light sensor may include an ambient light sensor and a proximity sensor. If the device is a mobile terminal, the ambient light sensor may adjust the brightness of the display screen according to the brightness of the ambient light, and the proximity sensor may turn off the display screen and/or the backlight when the mobile terminal is moved to the ear.
  • the mobile terminal can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, and details are not described herein again.
  • A prediction model establishing program is stored in the memory 11; when the processor 12 executes the prediction model establishing program stored in the memory 11, the following steps are implemented:
  • D. Use the actual observed value of the item to be tested in the target time unit as a prediction target, and use the plurality of prediction features and the prediction target as one prediction sample;
  • E. Acquire a plurality of prediction samples of a plurality of time units according to steps A to D, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and use the preset regression model with the determined model parameters as a prediction model of the item to be tested.
  • The scheme of the present embodiment is described with an influenza prediction item as the item to be tested. Because the model is trained on historical data, that is, data from a past time period, the feature values collected from the historical data as samples each belong to a time unit that is also in the past. In the present embodiment, one week is taken as the time unit. Assume that the exogenous features and the percentage of influenza-like cases over the past 100 weeks are the historical data, where the percentage of influenza-like cases is the total number of influenza-like cases in the sentinel hospitals of a given area divided by the total number of outpatient visits in that area.
  • A prediction sample is obtained from the above historical data as follows. One week is chosen from the historical data as the target time unit, for example the 100th of the past 100 weeks, that is, the week closest to the current time point. The actual observed value of the percentage of influenza-like cases in the 100th week is taken as the prediction target, and features collected from the data of multiple weeks before the 100th week are taken as the prediction features.
  • The exogenous features include a search index, weather features, and environmental features, all of which are external features that influence the percentage of influenza-like cases to some degree.
  • The search index is the frequency with which search engine users searched for the relevant keywords in a time unit before the target time unit, preferably the time unit immediately preceding the target time unit. Search engines generally record this frequency, for example in Baidu's Baidu Index.
  • The relevant keywords are keywords preset by the user as having a causal relationship with influenza prediction, such as "nasal congestion", "sneezing", "antiviral oral liquid", "sore throat", and similar words, which are not listed one by one here. Users can set multiple keywords according to their needs and extract the search index corresponding to each keyword from a search website to form a search index set.
  • Weather features include, but are not limited to, the following: the number of sunny days per week, the number of cloudy days per week, the number of cloudy days per week, the weekly rainfall, the weekly air volume, the average weekly intermediate temperature, the maximum difference in intermediate temperature within the week, and the middle of the week.
  • The average weekly intermediate temperature is the average of the intermediate temperatures of the seven days of the week, where a day's intermediate temperature is the middle value of that day's temperature range.
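The two temperature features defined above can be sketched as follows. This is only an illustration: the `(low, high)` input layout and the function names are hypothetical, the "middle value of the temperature range" is assumed to be the midpoint of the daily low and high, and the "maximum difference" is assumed to be the largest minus the smallest intermediate temperature of the week.

```python
def intermediate_temperature(low, high):
    """Midpoint of one day's temperature range (assumed reading of 'middle value')."""
    return (low + high) / 2.0


def weekly_temperature_features(daily_ranges):
    """daily_ranges: list of (low, high) pairs, one per day of the week."""
    mids = [intermediate_temperature(lo, hi) for lo, hi in daily_ranges]
    return {
        # Average of the daily intermediate temperatures.
        "avg_intermediate_temp": sum(mids) / len(mids),
        # Assumed reading: spread between the warmest and coldest midpoint.
        "max_intermediate_diff": max(mids) - min(mids),
    }


# Hypothetical week of (low, high) temperatures in degrees Celsius.
week = [(10, 20), (12, 22), (8, 18), (11, 21), (9, 19), (13, 23), (7, 17)]
features = weekly_temperature_features(week)
```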
  • For the 100th week, the weather features of the week immediately before it, or of several weeks before it, may be taken; that is, there may be more than one historical time unit.
  • the environmental characteristics are the mean values of SO2, NO2, PM10, O3, CO, and PM2.5 in the historical time unit, respectively.
  • For the prediction target of the 100th week, the autoregressive time features may include, but are not limited to, the following: the percentage of influenza-like cases in the previous week, the percentage of influenza-like cases in the last two weeks, the percentage of influenza-like cases in the last three weeks, the mean percentage of influenza-like cases over the past three weeks, the variance of the percentage of influenza-like cases over the past three weeks, and the percentage of influenza-like cases in the same period last year.
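A minimal sketch of how such autoregressive time features might be assembled from a weekly series. Several points are assumptions rather than statements of the source: the "last two weeks" and "last three weeks" items are read as single lags of two and three weeks, the variance is the population variance, "the same period last year" is taken as a lag of 52 weeks, and all names are hypothetical.

```python
from statistics import mean, pvariance


def autoregressive_features(series, t, season=52):
    """Build lag features for the target week at index t.

    series: weekly influenza-like-case percentages, oldest first.
    season: assumed number of weeks in a year.
    """
    if t < season:
        raise ValueError("need at least one full season of history before t")
    last3 = series[t - 3:t]  # the three weeks preceding the target week
    return {
        "lag_1": series[t - 1],
        "lag_2": series[t - 2],
        "lag_3": series[t - 3],
        "mean_last_3": mean(last3),
        "var_last_3": pvariance(last3),
        "same_week_last_year": series[t - season],
    }


# Synthetic 100-week history; index 99 plays the role of the 100th week.
history = [float(i % 10) for i in range(100)]
feats = autoregressive_features(history, t=99)
```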
  • These features are preprocessed and normalized to remove abnormal feature values, which avoids the adverse influence of such abnormal values on the training of the prediction model and thereby improves its prediction accuracy.
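  • The preprocessing may include extreme value optimization processing and/or missing value supplementation processing.
  • In the extreme value optimization processing, the first quartile Q1, the third quartile Q3, and the interquartile range IQR = Q3 - Q1 of the search index are computed. A search index value less than the first threshold is converted to the first threshold, and a search index value greater than the second threshold is converted to the second threshold.
  • The first threshold may be taken as Q1 - k*IQR, and the second threshold may be taken as Q3 + k*IQR, where k is a preset coefficient.
  • Preferably, k = 1.5.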
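The extreme value optimization described above amounts to clipping values to the range [Q1 - k*IQR, Q3 + k*IQR]. A short sketch using Python's standard library follows; the quartile interpolation method (`"inclusive"`) and the sample data are assumptions, since the source does not say how the quartiles are computed.

```python
from statistics import quantiles


def clip_extremes(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical weekly search index with one very high and one very low value.
search_index = [100, 102, 98, 101, 99, 500, 97, 103, 1]
clipped = clip_extremes(search_index)
```

With this data, Q1 = 98 and Q3 = 102, so the thresholds are 92 and 108: the value 500 is pulled down to 108 and the value 1 is raised to 92, while all other values are unchanged.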
  • Missing values may be supplemented in the search index that has undergone the extreme value optimization processing, according to the kNN (k-Nearest Neighbor) algorithm. Specifically, after the search index is obtained, it is checked whether any index value is missing. If so, the time units in which the search index has no missing value are identified among the time units adjacent to the target time unit, and the missing data is replaced with the mean, median, or mode of the search index of the corresponding keyword in those time units.
  • For example, suppose the search index of the keyword "antiviral oral liquid" is missing for the 99th week, while it is present for the 98th, 97th, and 95th weeks. The mean, median, or mode of the search index of "antiviral oral liquid" over those three weeks can then replace the missing value for the 99th week.
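The worked example above can be sketched as a neighbour-based fill. This is a simplified illustration, not the source's implementation: missing values are represented as `None`, only preceding weeks are inspected, and the `window` parameter and function name are hypothetical.

```python
from statistics import mean


def fill_missing(series, t, window=4, reducer=mean):
    """Return series[t], or a value derived from nearby non-missing weeks.

    window: assumed number of preceding weeks to inspect.
    reducer: mean by default; statistics.median or statistics.mode also fit.
    """
    if series[t] is not None:
        return series[t]
    neighbours = [v for v in series[max(0, t - window):t] if v is not None]
    if not neighbours:
        raise ValueError("no non-missing neighbouring weeks available")
    return reducer(neighbours)


# Weeks 95..99 for one keyword; weeks 96 and 99 are missing.
keyword_index = [30, None, 36, 33, None]
filled = fill_missing(keyword_index, t=4)
```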
  • Missing values may also be supplemented in other ways, such as the mean method: the average of the search indices of the other keywords, excluding the missing value, is computed, and the missing value in the target time unit is replaced by that average.
  • The autoregressive time features and the preprocessed exogenous features are normalized: feature values with different dimensions are converted into dimensionless feature values conforming to a normal distribution, so that severe distortion of the features does not lower the accuracy of the prediction results.
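The source does not name the normalization formula. One common choice that yields dimensionless values with zero mean and unit variance is z-score standardization, sketched below as an assumption; the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_score(column):
    """Standardize one feature column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


z = z_score([2.0, 4.0, 6.0])
```

Each feature column (search index, rainfall, lagged percentages, and so on) would be standardized separately, so that features measured in different units become comparable.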
  • The features in the normalized feature set are then screened.
  • The features are screened using the Pearson correlation coefficient: the Pearson correlation coefficient of each feature in the feature set is calculated, and the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient are selected as prediction features.
  • The preset correlation coefficient is preferably 0.2. It can be understood that, in other embodiments, other feature screening methods may also be used to screen the feature set.
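The text does not state what each feature is correlated against. The sketch below takes one possible reading, labeled here as an assumption: correlations are computed between pairs of features, and a feature is kept only if the absolute value of its Pearson coefficient with every already-kept feature is at most the threshold, which removes redundant features. Function and feature names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns, threshold=0.2):
    """Greedy pairwise screening (assumed reading of the preset rule)."""
    kept = []
    for name, col in columns.items():
        # Keep the feature only if it is weakly correlated with all kept ones.
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "lag_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "lag_1_copy": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with lag_1
    "rainfall": [1.0, -1.0, 0.0, -1.0, 1.0],   # uncorrelated with lag_1
}
selected = screen_features(columns)
```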
  • the remaining features after feature screening are used as prediction features, and the actual observation value of the percentage of influenza-like cases in the target time unit is used as a prediction target, and the above-mentioned prediction features and prediction targets constitute a prediction sample.
  • According to the above process, prediction samples corresponding to a plurality of target time units are acquired from the historical data. For example, the 99th week, the 98th week, the 97th week, and so on are each taken as the target time unit and their prediction samples are acquired. The prediction samples obtained are input into a preset regression model for training, yielding the model parameters.
  • The model may be trained with k-fold cross-validation, in which part of the prediction samples is used as the test set and the remaining samples are used as the training set.
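As an illustration of k-fold cross-validation over the prediction samples, the following sketch generates the train and test index sets for each fold. Contiguous, unshuffled folds are an assumption made here for simplicity; the source does not specify how the folds are formed.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


splits = list(k_fold_splits(10, 3))
```

Each fold serves once as the test set while the model is fitted on the remaining samples; the averaged test error can then guide the choice of hyperparameters such as the regularization strength.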
  • The preset regression model may be a LASSO regression model, a ridge regression model, or the like; in this embodiment, a LASSO regression model is preferred. The regression model with the determined model parameters is used as the prediction model of the item to be tested.
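To make the training step concrete, here is a small pure-Python LASSO fit by cyclic coordinate descent. It is a sketch under stated assumptions, not the patent's implementation: the objective is taken as (1/2n)*sum of squared residuals plus lam times the L1 norm of the weights, the intercept is unpenalized, and `lam`, `n_iter`, and all names are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (proximal step for the L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/2n)*sum((y - b - X.w)^2) + lam*sum(|w|) by coordinate descent.

    X: list of rows of normalized prediction features; y: prediction targets.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Unpenalized intercept: mean residual of the current linear part.
        b = sum(y[i] - sum(w[j] * X[i][j] for j in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # Prediction using every coefficient except feature j.
                partial = b + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w, b


def predict(row, w, b):
    return b + sum(wj * xj for wj, xj in zip(w, row))


# Toy data: y = 2*x0 + 1, and x1 is irrelevant to the target.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0, 5.0]
w, b = lasso_fit(X, y, lam=0.1)
```

On this toy data the L1 penalty sets the weight of the irrelevant feature exactly to zero and slightly shrinks the weight of the informative one, which is the behaviour that makes LASSO useful when many candidate exogenous features are available.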
  • The scheme of the present application is described by taking the prediction of the percentage of influenza-like cases as an example.
  • The scheme is not limited to this and can be applied to the prediction of other items, for example weather prediction or prediction of advertisement click rates. For each prediction item, data that can reflect changes in the item is collected as exogenous features, and the exogenous features are processed according to the above scheme. The regression model is then trained, in combination with the autoregressive time features of the item itself in the time dimension, to generate the prediction model.
  • The prediction model establishing apparatus proposed in this embodiment acquires exogenous features of the item to be tested in one or more time units before the target time unit, together with autoregressive time features before the target time unit. The exogenous features and the autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the feature set are screened according to preset rules to obtain prediction features. The prediction features and the actual observed value corresponding to the target time unit constitute one prediction sample.
  • Multiple prediction samples of a plurality of time units are acquired according to the above process and input into a preset regression model for training to determine the model parameters, and the preset regression model with the determined model parameters is used as the prediction model.
  • The scheme combines the exogenous features of the item to be tested with its autoregressive time features and preprocesses them to form a feature set. Qualified features are selected from the feature set as prediction features for training the regression model to generate the prediction model, which avoids relying on a single kind of sample feature and improves the prediction accuracy of the prediction model.
  • The prediction model establishing program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present application. A module, as referred to herein, is a series of computer program instructions capable of performing a particular function.
  • Referring to FIG. 2, a schematic diagram of the program modules of the prediction model establishing program in an embodiment of the prediction model establishing apparatus of the present application is shown.
  • The prediction model establishing program may be divided into an obtaining module 10, a processing module 20, a screening module 30, a forming module 40, and a training module 50, wherein:
  • the obtaining module 10 is configured to acquire an exogenous feature of the item to be tested in one or more time units before the target time unit, and an autoregressive time feature before the target time unit;
  • the processing module 20 is configured to preprocess the exogenous feature, and normalize the autoregressive time feature and the exogenous feature subjected to the preprocessing to obtain a normalized feature set;
  • the screening module 30 is configured to perform feature screening on features in the feature set according to a preset rule to obtain a predicted feature.
  • the forming module 40 is configured to use the actual observation value of the item to be tested in the target time unit as a prediction target, and use the plurality of prediction features and the prediction target as one prediction sample;
  • the training module 50 is configured to acquire a plurality of prediction samples of a plurality of time units, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and use the preset regression model with the determined model parameters as the prediction model of the item to be tested.
  • the present application also provides a prediction model establishment method.
  • Referring to FIG. 3, a flowchart of the first embodiment of the prediction model establishing method of the present application is shown. The method can be performed by a device, and the device can be implemented by software and/or hardware.
  • the prediction model establishing method includes:
  • Step S10 acquiring exogenous features of the item to be tested in one or more time units before the target time unit, and autoregressive time characteristics before the target time unit;
  • Step S20 preprocessing the exogenous feature, and normalizing the autoregressive time feature and the exogenous feature subjected to the preprocessing to obtain a normalized feature set;
  • Step S30 performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
  • Step S40 using the actual observation value of the item to be tested in the target time unit as a prediction target, and using the plurality of prediction features and the prediction target as one prediction sample;
  • Step S50 according to steps S10 to S40, acquiring a plurality of prediction samples of a plurality of time units, inputting the plurality of prediction samples into the preset regression model for training to determine the model parameters, and using the preset regression model with the determined model parameters as the prediction model of the item to be tested.
  • The scheme of the present embodiment is described with an influenza prediction item as the item to be tested. Because the model is trained on historical data, that is, data from a past time period, the feature values collected from the historical data as samples each belong to a time unit that is also in the past. In the present embodiment, one week is taken as the time unit. Assume that the exogenous features and the percentage of influenza-like cases over the past 100 weeks are the historical data, where the percentage of influenza-like cases is the total number of influenza-like cases in the sentinel hospitals of a given area divided by the total number of outpatient visits in that area.
  • One week is chosen from the historical data as the target time unit, for example the 100th of the past 100 weeks, that is, the week closest to the current time point. The actual observed value of the percentage of influenza-like cases in the 100th week is taken as the prediction target, and features collected from the data of multiple weeks before the 100th week are taken as the prediction features.
  • The exogenous features include a search index, weather features, and environmental features, all of which are external features that influence the percentage of influenza-like cases to some degree.
  • The search index is the frequency with which search engine users searched for the relevant keywords in a time unit before the target time unit, preferably the time unit immediately preceding the target time unit. Search engines generally record this frequency, for example in Baidu's Baidu Index.
  • The relevant keywords are keywords preset by the user as having a causal relationship with influenza prediction, such as "nasal congestion", "sneezing", "antiviral oral liquid", "sore throat", and similar words, which are not listed one by one here. Users can set multiple keywords according to their needs and extract the search index corresponding to each keyword from a search website to form a search index set.
  • Weather features include, but are not limited to, the following: the number of sunny days per week, the number of cloudy days per week, the number of cloudy days per week, the weekly rainfall, the weekly air volume, the average weekly intermediate temperature, the maximum difference in intermediate temperature within the week, and the middle of the week.
  • The average weekly intermediate temperature is the average of the intermediate temperatures of the seven days of the week, where a day's intermediate temperature is the middle value of that day's temperature range.
  • For the 100th week, the weather features of the week immediately before it, or of several weeks before it, may be taken; that is, there may be more than one historical time unit.
  • the environmental characteristics are the mean values of SO2, NO2, PM10, O3, CO, and PM2.5 in the historical time unit, respectively.
  • the predicted target for week 100 its autoregressive time characteristics may include, but are not limited to, the following characteristics: percentage of influenza-like cases in the previous week, and influenza-like illness in the last two weeks Percentage of cases, percentage of influenza-like cases in the last three weeks, mean percentage of influenza-like cases in the past three weeks, percentage variance of influenza-like cases in the past three weeks, and percentage of influenza-like cases in the same period last year.
  • these features are preprocessed and normalized to remove the anomalous features of the above features, and avoid some abnormalities caused by the training of these abnormal eigenvalues on the prediction model. Influence, and thus improve the prediction accuracy of the prediction model.
  • the pre-processing may include an extreme value optimization process and/or a missing value supplementation process.
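  • Because search indices are large in magnitude and can span a wide range, extreme-value optimization is preferably applied to them: the first quartile Q1, the third quartile Q3, and the interquartile range IQR of the search index set are obtained, and a first threshold and a second threshold are determined from them, the first threshold being smaller than the second.
  • Search indices greater than the second threshold are converted to the second threshold, and search indices smaller than the first threshold are converted to the first threshold.
  • The first threshold may be taken as Q1-k*IQR.
  • The second threshold may be taken as Q3+k*IQR, where k ≥ 0.
  • Preferably, in one embodiment, k = 1.5.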
  • For missing-value supplementation, missing values in the search indices that have undergone extreme-value optimization may be filled according to the kNN (k-nearest-neighbor) algorithm. Specifically, after the search indices are acquired, it is checked whether any keyword lacks a corresponding index value; if so, the time units adjacent to the target time unit in which that search index is not missing are identified, and the missing data is replaced by the mean, median, or mode of the corresponding keyword's search index in those time units.
  • For example, if the search index for the keyword "antiviral oral liquid" is missing in week 99 but present in weeks 98, 97, and 95, the mean, median, or mode of that keyword's search index over those three weeks can replace the missing week-99 value.
  • In other embodiments the missing values may be supplemented in other ways, for example the mean method: the average of the search indices of the other keywords, excluding the missing value, replaces the missing value in the target unit.
  • The autoregressive time features and the preprocessed exogenous features are then normalized: feature values with different dimensions are converted into dimensionless feature values conforming to a normal distribution.
  • This avoids the low prediction accuracy that severely skewed features would otherwise cause.
  • After the normalized feature set is obtained, the features in it are screened.
  • Optionally, in some embodiments, the features are filtered using the Pearson correlation coefficient: the Pearson correlation coefficient of each feature value in the feature set is calculated, and features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient are selected as prediction features.
  • In some embodiments the preset correlation coefficient is preferably 0.2. It can be understood that in other embodiments other feature screening methods may also be used to screen the feature set.
  • The features remaining after screening are used as prediction features, and the actual observed percentage of influenza-like cases in the target time unit is used as the prediction target; together, the prediction features and the prediction target constitute one prediction sample.
  • In the same way, prediction samples corresponding to multiple target time units are acquired from the historical data; for example, weeks 99, 98, 97, and so on are each taken as the target time unit and their prediction samples acquired. The samples are then input into a preset regression model for training to obtain the model parameters.
  • In some embodiments the model may be trained by k-fold cross-validation, with one of the prediction samples used as the test set and the remaining samples used as the training set.
  • The preset regression model may be a LASSO regression model, a ridge regression model, or the like. In one embodiment a LASSO regression model is preferred. The regression model with the determined model parameters is used as the prediction model for the item to be predicted.
  • In the above embodiment, the scheme of the present application is described using the prediction of the percentage of influenza-like cases as an example.
  • The scheme is not limited to this, however, and can be applied to the prediction of other items, such as weather or advertisement click-through rate. For each prediction item, data that reflects changes in that item must be collected.
  • That data serves as the exogenous features, which are processed according to the above scheme.
  • The regression model is then trained together with the item's own autoregressive time features along the time dimension to generate the prediction model.
  • The prediction model can be used to predict the item over a future period.
  • The prediction model establishing method of this embodiment acquires the exogenous features of the item to be predicted in one or more time units before the target time unit, together with the autoregressive time features before the target time unit.
  • The exogenous features and autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the set are screened according to a preset rule to obtain prediction features; the prediction features and the actual observed value of the target time unit constitute one prediction sample.
  • Following this process, multiple prediction samples are acquired for multiple time units and input into a preset regression model for training to determine the model parameters; the preset regression model with the determined parameters serves as the prediction model.
  • The scheme combines the exogenous features of the item with its autoregressive time features into one feature set and selects qualifying features from it as prediction features for training the regression model, which avoids relying on a single kind of sample feature and improves the prediction accuracy of the model.
  • An embodiment of the present application further provides a computer-readable storage medium storing a prediction model establishing program, which may be executed by one or more processors to implement the following steps:
  • the actual observed value of the item to be predicted in the target time unit is taken as the prediction target, and the plurality of prediction features and the prediction target are taken as one prediction sample;
  • missing values in the search indices that have undergone extreme-value optimization are supplemented;
  • features whose Pearson correlation coefficient is less than or equal to the preset correlation coefficient are selected as prediction features.
  • The technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied as a software product stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc), including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Mathematical Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Finance (AREA)
  • Human Resources & Organizations (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Accounting & Taxation (AREA)
  • Computational Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Tourism & Hospitality (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

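A prediction model establishing device, method, and computer-readable medium. The device includes a memory, a processor, and a prediction model establishing program stored in the memory and executable on the processor. When executed by the processor, the program implements the following steps: acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit; preprocessing and normalizing these features to obtain a normalized feature set; screening the features according to a preset rule to obtain prediction features; taking the actual observed value of the item to be predicted as the prediction target, and taking the prediction features and the prediction target as a prediction sample; and acquiring multiple prediction samples according to the above steps and inputting them into a preset regression model for training to generate a prediction model. The prediction accuracy of the prediction model is thereby improved.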

Description

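Prediction model establishing device, method, and computer-readable storage medium
Priority claim
This application claims, under the Paris Convention, priority to Chinese patent application No. 201710715445.1, filed on August 20, 2017 and entitled "Prediction model establishing device, method, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of terminal technologies, and in particular to a prediction model establishing device, method, and computer-readable storage medium.
Background
At present, machine learning techniques for predicting data are applied in more and more fields, such as predicting advertisement click-through rates or the incidence of an epidemic disease. The commonly adopted approach is to collect historical data of the item to be predicted to form a time series and, based on the characteristics of that time series itself, build an autoregressive time series model (ARIMA) for prediction. Such a model, however, uses only the trend of the item itself and cannot incorporate external features, so its prediction accuracy is limited.
Summary
The present application provides a prediction model establishing device, method, and computer-readable storage medium, whose main purpose is to build a prediction model by combining the exogenous features of the item to be predicted with its autoregressive time features, thereby improving the prediction accuracy of the model.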
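To achieve the above purpose, the present application provides a prediction model establishing device, the device including a memory, a processor, and a prediction model establishing program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the following steps:
A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.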
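Optionally, if the exogenous features include a search index set corresponding to the item to be predicted and the preprocessing is extreme-value optimization, the step of preprocessing the exogenous features includes:
obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
Optionally, the step of preprocessing the exogenous features further includes:
supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
Optionally, the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features includes:
calculating the Pearson correlation coefficient of each feature value in the feature set;
selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
Optionally, the item to be predicted is an influenza prediction item, the exogenous features include a search index, weather features, and environmental features, and the preset regression model is a LASSO regression model.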
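In addition, to achieve the above purpose, the present application further provides a prediction model establishing method, the method including:
A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.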
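Optionally, if the exogenous features include a search index set corresponding to the item to be predicted and the preprocessing is extreme-value optimization, the step of preprocessing the exogenous features includes:
obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
Optionally, the step of preprocessing the exogenous features further includes:
supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
Optionally, the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features includes:
calculating the Pearson correlation coefficient of each feature value in the feature set;
selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.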
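In addition, to achieve the above purpose, the present application further provides a computer-readable storage medium storing a prediction model establishing program, the program being executable by one or more processors to implement the following steps:
A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.
The prediction model establishing device, method, and computer-readable storage medium proposed in the present application acquire the exogenous features of the item to be predicted in one or more time units before the target time unit, together with the autoregressive time features before the target time unit. The exogenous features and autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the set are screened according to a preset rule to obtain prediction features; the prediction features and the actual observed value of the target time unit constitute one prediction sample. Following this process, multiple prediction samples are acquired for multiple time units and input into a preset regression model for training to determine the model parameters, and the preset regression model with the determined parameters serves as the prediction model. The scheme combines the exogenous features of the item with its autoregressive time features, preprocesses them into one feature set, and selects qualifying features from that set as prediction features for training the regression model, which avoids relying on a single kind of sample feature and improves the prediction accuracy of the model.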
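Brief description of the drawings
FIG. 1 is a schematic diagram of a preferred embodiment of the prediction model establishing device of the present application;
FIG. 2 is a schematic diagram of the program modules of the prediction model establishing program in an embodiment of the prediction model establishing device of the present application;
FIG. 3 is a flowchart of a first embodiment of the prediction model establishing method of the present application.
The realization of the purpose, the functional features, and the advantages of the present application are further described with reference to the embodiments and the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it.
The present application provides a prediction model establishing device. FIG. 1 is a schematic diagram of a preferred embodiment of the device.
In this embodiment, the prediction model establishing device may be a PC (personal computer), or a portable terminal device with a display function such as a smartphone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, or portable computer.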
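The prediction model establishing device includes a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments the memory 11 may be an internal storage unit of the prediction model establishing device, such as its hard disk. In other embodiments the memory 11 may be an external storage device of the prediction model establishing device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device. Further, the memory 11 may include both an internal storage unit and an external storage device of the prediction model establishing device. The memory 11 can be used not only to store application software installed on the device and various kinds of data, such as the code of the prediction model establishing program, but also to temporarily store data that has been output or is to be output.
In some embodiments the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the prediction model establishing program.
The communication bus 13 is used to implement connection and communication between these components.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the device and other electronic devices.
FIG. 1 shows only the prediction model establishing device with components 11-14 and the prediction model establishing program, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.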
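Optionally, the device may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display information processed in the prediction model establishing device and to display a visual user interface.
Optionally, the device may further include a touch sensor. The area provided by the touch sensor for the user's touch operations is called the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. It includes not only contact-type touch sensors but may also include proximity-type touch sensors. The touch sensor may be a single sensor or multiple sensors arranged, for example, in an array. The area of the device's display may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen, and the device detects user-triggered touch operations through the touch display screen.
Optionally, the device may further include a camera, an RF (radio frequency) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors include, for example, a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: if the device is a mobile terminal, the ambient light sensor can adjust the brightness of the display screen according to the ambient light, and the proximity sensor can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. The mobile terminal may of course also be equipped with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described further here.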
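In the device embodiment shown in FIG. 1, the memory 11 stores the prediction model establishing program; when the processor 12 executes the program stored in the memory 11, the following steps are implemented:
A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.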
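In this embodiment, the scheme is described with an influenza prediction item as the item to be predicted. Because model training uses historical data, that is, data from time periods that have already passed, when feature values are collected from the historical data as test samples, the time unit in each test sample is also a past time period. In this embodiment, one week is one time unit. Assume that the exogenous features and the percentage of influenza-like cases for the past 100 weeks serve as the historical data, where the percentage of influenza-like cases is the total number of influenza-like cases at the sentinel hospitals of a region divided by the total number of outpatient visits at those sentinel hospitals.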
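As a minimal sketch of the ratio just defined, the weekly value could be computed as shown below. The function name `ili_percentage`, the per-hospital list inputs, and the scaling to percent are illustrative assumptions and do not come from the application.

```python
def ili_percentage(ili_cases, outpatient_visits):
    """Percentage of influenza-like cases for one week in one region.

    ili_cases: influenza-like case counts, one entry per sentinel hospital.
    outpatient_visits: outpatient visit counts for the same hospitals.
    Multiplying by 100 to express the ratio in percent is an assumption.
    """
    total_visits = sum(outpatient_visits)
    if total_visits == 0:
        raise ValueError("no outpatient visits recorded for this week")
    return 100.0 * sum(ili_cases) / total_visits


# Hypothetical counts for two sentinel hospitals in one week.
weekly_value = ili_percentage([30, 20], [600, 400])
```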
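A prediction sample is obtained from the historical data as follows. One week is selected from the historical data as the target time unit; for example, if week 100 of the past 100 weeks, i.e. the week closest to the current time, is the target time unit, the actual observed percentage of influenza-like cases in week 100 is taken as the prediction target. Features are collected from the data of multiple weeks before week 100 to serve as prediction features.
Specifically, exogenous features corresponding to one or more time units before the target time unit are acquired. Preferably, the exogenous features include a search index, weather features, environmental features, and other external features that influence the percentage of influenza-like cases to some degree.
The search index is the frequency with which search engine users search for relevant keywords on a search engine within a time unit before the target time unit, preferably the time unit immediately preceding it. Search engines generally record this frequency; Baidu's Baidu Index is one example. The relevant keywords are keywords preset by the user that have some causal relationship with the flu prediction, such as "nasal congestion", "sneezing", "antiviral oral liquid", "sore throat", and other flu-related search terms, which are not listed exhaustively here. The user can set multiple keywords as needed and extract the corresponding search indices by keyword from search websites to form a search index set.
Weather features include, but are not limited to: the number of sunny days per week, the number of partly cloudy days per week, the number of overcast days per week, weekly rainfall, weekly wind, the weekly average intermediate temperature, the maximum difference in intermediate temperature within the week, and the difference between Sunday's and Monday's intermediate temperatures. The weekly average intermediate temperature is the mean of the intermediate temperatures of the seven days of the week, and the intermediate temperature is the midpoint of the temperature range within one day. When acquiring weather features, the weather features of week 99 (the week before week 100) may be taken, or those of several weeks before week 100; that is, there may be more than one historical time unit.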
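The temperature-derived weather features can be sketched as follows. This is an illustration only: the input layout (seven `(low, high)` pairs ordered Monday to Sunday), the dictionary keys, and the reading of "maximum difference" as the largest minus the smallest intermediate temperature of the week are all assumptions.

```python
def weekly_temperature_features(daily_ranges):
    """Derive temperature features for one week.

    daily_ranges: seven (low, high) tuples, Monday first (assumed layout).
    The intermediate temperature of a day is the midpoint of its range.
    """
    if len(daily_ranges) != 7:
        raise ValueError("expected one (low, high) pair per day of the week")
    mids = [(low + high) / 2 for low, high in daily_ranges]
    return {
        "mean_intermediate": sum(mids) / len(mids),
        # Assumed interpretation: spread of the intermediate temperatures.
        "max_intermediate_diff": max(mids) - min(mids),
        "sunday_minus_monday": mids[-1] - mids[0],
    }


# Hypothetical week of daily temperature ranges in degrees Celsius.
week = [(10, 20), (12, 22), (11, 21), (9, 19), (8, 18), (13, 23), (14, 24)]
temperature_features = weekly_temperature_features(week)
```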
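The environmental features are the mean concentrations of SO2, NO2, PM10, O3, CO, and PM2.5 within the historical time unit.
For the target time unit, i.e. the prediction target of week 100, the autoregressive time features may include, but are not limited to: the percentage of influenza-like cases one week earlier, two weeks earlier, and three weeks earlier; the mean and the variance of the percentage of influenza-like cases over the preceding three weeks; and the percentage of influenza-like cases in the same period of the previous year.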
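A rough sketch of how these autoregressive features might be assembled from a weekly series is given below. The function name, the 52-week offset used for "the same period of the previous year", and the use of the population variance are assumptions made for illustration.

```python
from statistics import mean, pvariance


def autoregressive_features(series, t, weeks_per_year=52):
    """Build autoregressive features for target index t of a weekly series.

    series: weekly percentages of influenza-like cases, oldest first.
    t: index of the target week; only values before t are used.
    """
    if t < weeks_per_year:
        raise ValueError("not enough history for the same-period feature")
    last_three = series[t - 3:t]
    return {
        "lag_1": series[t - 1],
        "lag_2": series[t - 2],
        "lag_3": series[t - 3],
        "mean_last_3": mean(last_three),
        # Population variance is an assumption; the text only says variance.
        "var_last_3": pvariance(last_three),
        "same_week_last_year": series[t - weeks_per_year],
    }


# Hypothetical series in which week i simply has the value i.
history = [float(i) for i in range(60)]
ar_features = autoregressive_features(history, t=55)
```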
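After the exogenous features and autoregressive time features are acquired, they are preprocessed and normalized to remove anomalous values, so that abnormal feature values do not distort the training of the prediction model, which in turn improves the model's prediction accuracy.
Optionally, the preprocessing may include extreme-value optimization and/or missing-value supplementation.
Because search indices are large in magnitude and can span a wide range, extreme-value optimization is preferably applied to them. Specifically, the first quartile Q1, the third quartile Q3, and the interquartile range IQR of the search index set are obtained; a first threshold and a second threshold for the search index are determined from the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold; search indices in the set that are greater than the second threshold are converted to the second threshold, and search indices that are smaller than the first threshold are converted to the first threshold. The first threshold may be taken as Q1-k*IQR and the second threshold as Q3+k*IQR, where k≥0. Preferably, in one embodiment, k=1.5.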
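The thresholding described above can be sketched in a few lines. The function name `clip_by_iqr` is hypothetical, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the application does not say how the quartiles are interpolated.

```python
from statistics import quantiles


def clip_by_iqr(values, k=1.5):
    """Clamp values to the interval [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    first_threshold = q1 - k * iqr
    second_threshold = q3 + k * iqr
    # Values outside the interval are replaced by the nearer threshold.
    return [min(max(v, first_threshold), second_threshold) for v in values]


# Hypothetical search indices for one keyword, with one extreme week.
search_indices = [10, 20, 30, 40, 50, 60, 70, 80, 1000]
clipped = clip_by_iqr(search_indices)
```

With this sample, Q1 is 30 and Q3 is 70 under the inclusive convention, so the thresholds are -30 and 130 and only the extreme value is changed.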
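For missing-value supplementation, missing values in the search indices that have undergone extreme-value optimization may be filled according to the kNN (k-nearest-neighbor) algorithm. Specifically, after the search indices are acquired, it is checked whether any keyword lacks a corresponding index value; if so, the time units adjacent to the target time unit in which that search index is not missing are identified, and the missing data is replaced by the mean, median, or mode of the corresponding keyword's search index in those time units. For example, if the search index for the keyword "antiviral oral liquid" is missing in week 99 but present in weeks 98, 97, and 95, the mean, median, or mode of that keyword's search index over those three weeks can replace the missing week-99 value. In other embodiments the missing values may be supplemented in other ways, for example the mean method: the average of the search indices of the other keywords, excluding the missing value, replaces the missing value in the target unit.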
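The week-based example above can be sketched as follows. This is a simplified reading in which the "neighbors" are the nearest weeks in time that have a value for the keyword; it is not a general-purpose kNN imputer, and the function name, the dictionary layout, and the default of three neighbors are assumptions.

```python
from statistics import mean


def impute_from_neighbors(weekly_index, missing_week, n_neighbors=3, agg=mean):
    """Fill one missing weekly search index from the nearest weeks with data.

    weekly_index: dict mapping week number to a value, or None if missing.
    agg: aggregation over the neighbors, e.g. statistics.mean or median.
    """
    available = [
        week for week, value in weekly_index.items()
        if value is not None and week != missing_week
    ]
    # Closest weeks first; ties are broken in favour of the earlier week.
    available.sort(key=lambda week: (abs(week - missing_week), week))
    nearest = available[:n_neighbors]
    if not nearest:
        raise ValueError("no neighboring week has a value for this keyword")
    return agg([weekly_index[week] for week in nearest])


# Hypothetical index values for one keyword; weeks 96 and 99 are missing.
keyword_index = {95: 300, 96: None, 97: 330, 98: 360, 99: None}
filled_week_99 = impute_from_neighbors(keyword_index, missing_week=99)
```

Passing `agg=statistics.median` or `agg=statistics.mode` would give the other two replacement rules mentioned in the text.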
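The autoregressive time features and the preprocessed exogenous features are then normalized: feature values with different dimensions are converted into dimensionless feature values conforming to a normal distribution, which avoids the low prediction accuracy that severely skewed features would otherwise cause.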
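The application does not name a normalization formula. One common choice, assumed here purely for illustration, is z-score standardization, which rescales each feature column to zero mean and unit variance; note that it removes the units but does not by itself change the shape of the distribution.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return the z-scores of one feature column (assumed normalization)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical feature column, e.g. weekly rainfall in millimetres.
z_scores = standardize([2.0, 4.0, 6.0])
```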
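After the normalized feature set is obtained, the features in it are screened. Optionally, in some embodiments, the features are filtered using the Pearson correlation coefficient: the Pearson correlation coefficient of each feature value in the feature set is calculated, and the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient are selected from the feature set as prediction features. In some embodiments the preset correlation coefficient is preferably 0.2. It can be understood that in other embodiments other feature screening methods may also be used to screen the feature set. The features remaining after screening are used as prediction features, and the actual observed percentage of influenza-like cases in the target time unit is used as the prediction target; together, the prediction features and the prediction target constitute one prediction sample.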
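A hedged sketch of this screening step follows. The text does not say which pair of variables the coefficient is computed on; the sketch assumes each feature column is correlated with the prediction target across samples, and it applies the "less than or equal to" comparison exactly as stated. The function names and the dictionary layout are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # treat a constant column as uncorrelated
    return cov / (spread_x * spread_y)


def screen_features(features, target, preset_coefficient=0.2):
    """Keep features whose coefficient with the target is <= the preset value.

    Correlating each feature with the target is an assumption; the rule
    itself follows the wording of the text.
    """
    return {
        name: column for name, column in features.items()
        if pearson(column, target) <= preset_coefficient
    }


# Hypothetical feature columns over four samples.
target = [1.0, 2.0, 3.0, 4.0]
features = {
    "a": [2.0, 4.0, 6.0, 8.0],
    "b": [4.0, 3.0, 2.0, 1.0],
    "c": [1.0, -1.0, -1.0, 1.0],
}
selected = screen_features(features, target)
```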
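In the same way, prediction samples corresponding to multiple target time units are acquired from the historical data; for example, weeks 99, 98, 97, and so on are each taken as the target time unit and their prediction samples acquired. The samples are then input into a preset regression model for training to obtain the model parameters. In some embodiments the model may be trained by k-fold cross-validation, with one of the prediction samples used as the test set and the remaining samples used as the training set. In other embodiments, the preset regression model may be a LASSO regression model, a ridge regression model, or the like. In one embodiment a LASSO regression model is preferred. The regression model with the determined model parameters is used as the prediction model for the item to be predicted.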
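To make the training step concrete, here is a small self-contained sketch of LASSO regression fitted by coordinate descent, together with the one-sample-held-out validation that the paragraph describes (read literally, this is leave-one-out). It is an illustration under assumptions, not the application's implementation: the objective scaling, `alpha`, the iteration count, and all function names are chosen for the example.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X, y, alpha=0.1, n_iter=200):
    """Minimise (1/2n)*||y - Xw - b||^2 + alpha*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction that ignores feature j, to isolate its effect.
                partial = b + sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        b = sum(y[i] - sum(w[k] * X[i][k] for k in range(p)) for i in range(n)) / n
    return w, b


def predict(w, b, row):
    return b + sum(wk * xk for wk, xk in zip(w, row))


def leave_one_out_mse(X, y, alpha=0.1):
    """Hold out each sample once, train on the rest, average squared error."""
    errors = []
    for i in range(len(X)):
        w, b = fit_lasso(X[:i] + X[i + 1:], y[:i] + y[i + 1:], alpha)
        errors.append((predict(w, b, X[i]) - y[i]) ** 2)
    return sum(errors) / len(errors)


# Hypothetical normalized samples: y depends on the first feature only.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [1.0, 1.0]]
y = [1.0, 1.0, 5.0, 5.0]
w, b = fit_lasso(X, y, alpha=0.1)
validation_error = leave_one_out_mse(X, y, alpha=0.1)
```

On this toy data the L1 penalty shrinks the first weight slightly below its least-squares value of 2 and sets the weight of the irrelevant second feature to zero, which is the sparsity that makes LASSO a natural fit for a screened feature set.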
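It can be understood that in the above embodiment the scheme of the present application is described using the prediction of the percentage of influenza-like cases as an example. The scheme is not limited to this, however, and can be applied to the prediction of other items, such as weather or advertisement click-through rate. For each prediction item, data that reflects changes in that item must be collected as the exogenous features and processed according to the above scheme, after which the regression model is trained together with the item's own autoregressive time features along the time dimension to generate the prediction model.
The prediction model establishing device of this embodiment acquires the exogenous features of the item to be predicted in one or more time units before the target time unit, together with the autoregressive time features before the target time unit. The exogenous features and autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the set are screened according to a preset rule to obtain prediction features; the prediction features and the actual observed value of the target time unit constitute one prediction sample. Following this process, multiple prediction samples are acquired for multiple time units and input into a preset regression model for training to determine the model parameters, and the preset regression model with the determined parameters serves as the prediction model. The scheme combines the exogenous features of the item with its autoregressive time features, preprocesses them into one feature set, and selects qualifying features from that set as prediction features for training the regression model, which avoids relying on a single kind of sample feature and improves the prediction accuracy of the model.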
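Optionally, in other embodiments, the prediction model establishing program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present application. A module as referred to in the present application is a series of computer program instruction segments capable of performing a specific function.
FIG. 2 is a schematic diagram of the program modules of the prediction model establishing program in an embodiment of the prediction model establishing device of the present application. In this embodiment, the program may be divided into an acquisition module 10, a processing module 20, a screening module 30, a forming module 40, and a training module 50, wherein:
the acquisition module 10 is used to acquire exogenous features of the item to be predicted in one or more time units before the target time unit, and autoregressive time features before the target time unit;
the processing module 20 is used to preprocess the exogenous features, and to normalize the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
the screening module 30 is used to perform feature screening on the features in the feature set according to a preset rule to obtain prediction features;
the forming module 40 is used to take the actual observed value of the item to be predicted in the target time unit as the prediction target, and to take the plurality of prediction features and the prediction target as one prediction sample;
the training module 50 is used to acquire a plurality of prediction samples for a plurality of time units, input the plurality of prediction samples into a preset regression model for training to determine model parameters, and take the preset regression model with the determined model parameters as the prediction model of the item to be predicted.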
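In addition, the present application provides a prediction model establishing method. FIG. 3 is a flowchart of a first embodiment of the method. The method may be performed by a device, which may be implemented in software and/or hardware.
In this embodiment, the prediction model establishing method includes:
Step S10: acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
Step S20: preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
Step S30: performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
Step S40: taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
Step S50: acquiring a plurality of prediction samples for a plurality of time units according to steps S10 to S40, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.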
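In this embodiment, the scheme is described with an influenza prediction item as the item to be predicted. Because model training uses historical data, that is, data from time periods that have already passed, when feature values are collected from the historical data as test samples, the time unit in each test sample is also a past time period. In this embodiment, one week is one time unit. Assume that the exogenous features and the percentage of influenza-like cases for the past 100 weeks serve as the historical data, where the percentage of influenza-like cases is the total number of influenza-like cases at the sentinel hospitals of a region divided by the total number of outpatient visits at those sentinel hospitals.
A prediction sample is obtained from the historical data as follows. One week is selected from the historical data as the target time unit; for example, if week 100 of the past 100 weeks, i.e. the week closest to the current time, is the target time unit, the actual observed percentage of influenza-like cases in week 100 is taken as the prediction target. Features are collected from the data of multiple weeks before week 100 to serve as prediction features.
Specifically, exogenous features corresponding to one or more time units before the target time unit are acquired. Preferably, the exogenous features include a search index, weather features, environmental features, and other external features that influence the percentage of influenza-like cases to some degree.
The search index is the frequency with which search engine users search for relevant keywords on a search engine within a time unit before the target time unit, preferably the time unit immediately preceding it. Search engines generally record this frequency; Baidu's Baidu Index is one example. The relevant keywords are keywords preset by the user that have some causal relationship with the flu prediction, such as "nasal congestion", "sneezing", "antiviral oral liquid", "sore throat", and other flu-related search terms, which are not listed exhaustively here. The user can set multiple keywords as needed and extract the corresponding search indices by keyword from search websites to form a search index set.
Weather features include, but are not limited to: the number of sunny days per week, the number of partly cloudy days per week, the number of overcast days per week, weekly rainfall, weekly wind, the weekly average intermediate temperature, the maximum difference in intermediate temperature within the week, and the difference between Sunday's and Monday's intermediate temperatures. The weekly average intermediate temperature is the mean of the intermediate temperatures of the seven days of the week, and the intermediate temperature is the midpoint of the temperature range within one day. When acquiring weather features, the weather features of week 99 (the week before week 100) may be taken, or those of several weeks before week 100; that is, there may be more than one historical time unit.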
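The environmental features are the mean concentrations of SO2, NO2, PM10, O3, CO, and PM2.5 within the historical time unit.
For the target time unit, i.e. the prediction target of week 100, the autoregressive time features may include, but are not limited to: the percentage of influenza-like cases one week earlier, two weeks earlier, and three weeks earlier; the mean and the variance of the percentage of influenza-like cases over the preceding three weeks; and the percentage of influenza-like cases in the same period of the previous year.
After the exogenous features and autoregressive time features are acquired, they are preprocessed and normalized to remove anomalous values, so that abnormal feature values do not distort the training of the prediction model, which in turn improves the model's prediction accuracy.
Optionally, the preprocessing may include extreme-value optimization and/or missing-value supplementation.
Because search indices are large in magnitude and can span a wide range, extreme-value optimization is preferably applied to them. Specifically, the first quartile Q1, the third quartile Q3, and the interquartile range IQR of the search index set are obtained; a first threshold and a second threshold for the search index are determined from the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold; search indices in the set that are greater than the second threshold are converted to the second threshold, and search indices that are smaller than the first threshold are converted to the first threshold. The first threshold may be taken as Q1-k*IQR and the second threshold as Q3+k*IQR, where k≥0. Preferably, in one embodiment, k=1.5.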
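For missing-value supplementation, missing values in the search indices that have undergone extreme-value optimization may be filled according to the k-nearest-neighbor (kNN) algorithm. Specifically, after the search indices are acquired, it is checked whether any keyword lacks a corresponding index value; if so, the time units adjacent to the target time unit in which that search index is not missing are identified, and the missing data is replaced by the mean, median, or mode of the corresponding keyword's search index in those time units. For example, if the search index for the keyword "antiviral oral liquid" is missing in week 99 but present in weeks 98, 97, and 95, the mean, median, or mode of that keyword's search index over those three weeks can replace the missing week-99 value. In other embodiments the missing values may be supplemented in other ways, for example the mean method: the average of the search indices of the other keywords, excluding the missing value, replaces the missing value in the target unit.
The autoregressive time features and the preprocessed exogenous features are then normalized: feature values with different dimensions are converted into dimensionless feature values conforming to a normal distribution, which avoids the low prediction accuracy that severely skewed features would otherwise cause.
After the normalized feature set is obtained, the features in it are screened. Optionally, in some embodiments, the features are filtered using the Pearson correlation coefficient: the Pearson correlation coefficient of each feature value in the feature set is calculated, and the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient are selected from the feature set as prediction features. In some embodiments the preset correlation coefficient is preferably 0.2. It can be understood that in other embodiments other feature screening methods may also be used to screen the feature set. The features remaining after screening are used as prediction features, and the actual observed percentage of influenza-like cases in the target time unit is used as the prediction target; together, the prediction features and the prediction target constitute one prediction sample.
In the same way, prediction samples corresponding to multiple target time units are acquired from the historical data; for example, weeks 99, 98, 97, and so on are each taken as the target time unit and their prediction samples acquired. The samples are then input into a preset regression model for training to obtain the model parameters. In some embodiments the model may be trained by k-fold cross-validation, with one of the prediction samples used as the test set and the remaining samples used as the training set. In other embodiments, the preset regression model may be a LASSO regression model, a ridge regression model, or the like. In one embodiment a LASSO regression model is preferred. The regression model with the determined model parameters is used as the prediction model for the item to be predicted.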
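It can be understood that in the above embodiment the scheme of the present application is described using the prediction of the percentage of influenza-like cases as an example. The scheme is not limited to this, however, and can be applied to the prediction of other items, such as weather or advertisement click-through rate. For each prediction item, data that reflects changes in that item must be collected as the exogenous features and processed according to the above scheme, after which the regression model is trained together with the item's own autoregressive time features along the time dimension to generate the prediction model. The prediction model can be used to predict the item over a future period.
The prediction model establishing method of this embodiment acquires the exogenous features of the item to be predicted in one or more time units before the target time unit, together with the autoregressive time features before the target time unit. The exogenous features and autoregressive time features are preprocessed and normalized to obtain a normalized feature set, and the features in the set are screened according to a preset rule to obtain prediction features; the prediction features and the actual observed value of the target time unit constitute one prediction sample. Following this process, multiple prediction samples are acquired for multiple time units and input into a preset regression model for training to determine the model parameters, and the preset regression model with the determined parameters serves as the prediction model. The scheme combines the exogenous features of the item with its autoregressive time features, preprocesses them into one feature set, and selects qualifying features from that set as prediction features for training the regression model, which avoids relying on a single kind of sample feature and improves the prediction accuracy of the model.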
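In addition, an embodiment of the present application further provides a computer-readable storage medium storing a prediction model establishing program, the program being executable by one or more processors to implement the following steps:
A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.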
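Further, when executed by a processor, the prediction model establishing program also implements the following operations:
obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
Further, when executed by a processor, the prediction model establishing program also implements the following operation:
supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
Further, when executed by a processor, the prediction model establishing program also implements the following operations:
calculating the Pearson correlation coefficient of each feature value in the feature set;
selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.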
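It should be noted that the serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. The terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, device, article, or method that includes that element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.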

Claims (20)

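  1. A prediction model establishing device, characterized in that the device comprises a memory, a processor, and a prediction model establishing program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the following steps:
    A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
    B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
    C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
    D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
    E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.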
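  2. The prediction model establishing device according to claim 1, characterized in that, if the exogenous features include a search index set corresponding to the item to be predicted and the preprocessing is extreme-value optimization, the step of preprocessing the exogenous features comprises:
    obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
    determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
    converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
  3. The prediction model establishing device according to claim 2, characterized in that the step of preprocessing the exogenous features further comprises:
    supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
  4. The prediction model establishing device according to claim 1, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  5. The prediction model establishing device according to claim 2, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  6. The prediction model establishing device according to claim 3, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  7. The prediction model establishing device according to claim 1, characterized in that the item to be predicted is an influenza prediction item, the exogenous features include a search index, weather features, and environmental features, and the preset regression model is a LASSO regression model.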
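  8. A prediction model establishing method, characterized in that the method comprises:
    A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
    B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
    C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
    D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
    E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.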
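  9. The prediction model establishing method according to claim 8, characterized in that, if the exogenous features include a search index set corresponding to the item to be predicted and the preprocessing is extreme-value optimization, the step of preprocessing the exogenous features comprises:
    obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
    determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
    converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
  10. The prediction model establishing method according to claim 9, characterized in that the step of preprocessing the exogenous features further comprises:
    supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
  11. The prediction model establishing method according to claim 8, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  12. The prediction model establishing method according to claim 9, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  13. The prediction model establishing method according to claim 10, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  14. The prediction model establishing method according to claim 8, characterized in that the item to be predicted is an influenza prediction item, the exogenous features include a search index, weather features, and environmental features, and the preset regression model is a LASSO regression model.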
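  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a prediction model establishing program, the program being executable by one or more processors to implement the following steps:
    A. acquiring exogenous features of an item to be predicted in one or more time units before a target time unit, and autoregressive time features before the target time unit;
    B. preprocessing the exogenous features, and normalizing the autoregressive time features and the preprocessed exogenous features to obtain a normalized feature set;
    C. performing feature screening on the features in the feature set according to a preset rule to obtain prediction features;
    D. taking the actual observed value of the item to be predicted in the target time unit as a prediction target, and taking the plurality of prediction features and the prediction target as one prediction sample;
    E. acquiring a plurality of prediction samples for a plurality of time units according to steps A to D, inputting the plurality of prediction samples into a preset regression model for training to determine model parameters, and taking the preset regression model with the determined model parameters as the prediction model of the item to be predicted.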
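  16. The computer-readable storage medium according to claim 15, characterized in that, if the exogenous features include a search index set corresponding to the item to be predicted and the preprocessing is extreme-value optimization, the step of preprocessing the exogenous features comprises:
    obtaining the first quartile, the third quartile, and the interquartile range of the search index set;
    determining a first threshold and a second threshold for the search index according to the first quartile, the third quartile, and the interquartile range, the first threshold being smaller than the second threshold;
    converting search indices in the search index set that are greater than the second threshold to the second threshold, and converting search indices that are smaller than the first threshold to the first threshold.
  17. The computer-readable storage medium according to claim 16, characterized in that the step of preprocessing the exogenous features further comprises:
    supplementing missing values in the search indices that have undergone extreme-value optimization according to the k-nearest-neighbor (kNN) algorithm.
  18. The computer-readable storage medium according to claim 15, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  19. The computer-readable storage medium according to claim 16, characterized in that the step of performing feature screening on the features in the feature set according to a preset rule to obtain prediction features comprises:
    calculating the Pearson correlation coefficient of each feature value in the feature set;
    selecting from the feature set, as prediction features, the features whose Pearson correlation coefficient is less than or equal to a preset correlation coefficient.
  20. The computer-readable storage medium according to claim 15, characterized in that the item to be predicted is an influenza prediction item, the exogenous features include a search index, weather features, and environmental features, and the preset regression model is a LASSO regression model.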
PCT/CN2017/108801 2017-08-20 2017-10-31 Prediction model establishing device, method, and computer-readable storage medium WO2019037260A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710715445.1 2017-08-20
CN201710715445.1A CN107688872A (zh) 2017-08-20 2017-08-20 Prediction model establishing device, method, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2019037260A1 true WO2019037260A1 (zh) 2019-02-28

Family

ID=61153475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108801 WO2019037260A1 (zh) 2017-08-20 2017-10-31 Prediction model establishing device, method, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN107688872A (zh)
WO (1) WO2019037260A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308278A (zh) * 2019-08-02 2021-02-02 中移信息技术有限公司 预测模型的优化方法、装置、设备和介质
CN112545461A (zh) * 2020-12-05 2021-03-26 深圳市美的连医疗电子股份有限公司 一种无创血红蛋白浓度值的检测方法、装置、系统及计算机可读存储介质

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108492138B (zh) * 2018-03-19 2020-03-24 平安科技(深圳)有限公司 产品购买预测方法、服务器及存储介质
CN108630321B (zh) * 2018-04-11 2020-02-21 平安科技(深圳)有限公司 流行病预测方法、计算机装置及计算机可读存储介质
CN108597617B (zh) * 2018-04-11 2022-05-20 平安科技(深圳)有限公司 流行病分级预测方法及装置、计算机装置和可读存储介质
CN108597618B (zh) * 2018-04-20 2021-09-03 杭州恒生数字设备科技有限公司 一种具有自动学习功能的流感预测摄像机
CN109905271B (zh) * 2018-05-18 2021-01-12 华为技术有限公司 一种预测方法、训练方法、装置及计算机存储介质
CN108766585A (zh) * 2018-05-31 2018-11-06 平安科技(深圳)有限公司 流感预测模型的生成方法、装置及计算机可读存储介质
CN108831561A (zh) * 2018-05-31 2018-11-16 平安科技(深圳)有限公司 流感预测模型的生成方法、装置及计算机可读存储介质
CN109243619B (zh) * 2018-07-13 2023-03-31 平安科技(深圳)有限公司 预测模型的生成方法、装置及计算机可读存储介质
CN109192306A (zh) * 2018-09-21 2019-01-11 广东工业大学 一种糖尿病的判断装置、设备及计算机可读存储介质
CN109409596B (zh) * 2018-10-22 2021-04-13 东软集团股份有限公司 预测风速的处理方法、装置、设备和计算机可读存储介质
CN109493979A (zh) * 2018-10-23 2019-03-19 平安科技(深圳)有限公司 一种基于智能决策的疾病预测方法和装置
CN109754175B (zh) * 2018-12-28 2023-04-07 广州明动软件股份有限公司 用于对行政审批事项的办结时限进行压缩预测的计算模型及其应用
CN110136841B (zh) * 2019-03-27 2022-07-08 平安科技(深圳)有限公司 疾病发病预测方法、装置及计算机可读存储介质
CN110111902B (zh) * 2019-04-04 2022-05-27 平安科技(深圳)有限公司 急性传染病的发病周期预测方法、装置及存储介质
CN110979589B (zh) * 2019-12-16 2020-10-30 上海船舶研究设计院(中国船舶工业集团公司第六0四研究院) 基于波浪增阻预测的船舶控制方法、装置及智能终端
CN111415752B (zh) * 2020-03-01 2023-05-12 集美大学 一种融合气象因素和搜索指数的手足口病预测方法
CN111524599A (zh) * 2020-04-24 2020-08-11 中国地质大学(武汉) 一种基于机器学习的新冠肺炎数据处理方法及预测系统
CN111524600A (zh) * 2020-04-24 2020-08-11 中国地质大学(武汉) 基于neighbor2vec的肝癌术后复发风险预测系统
CN112802603A (zh) * 2021-02-04 2021-05-14 北京深演智能科技股份有限公司 预测流感程度的方法和装置
CN113436751A (zh) * 2021-06-29 2021-09-24 山东健康医疗大数据有限公司 一种周ili占比趋势预测系统及方法
CN113707337B (zh) * 2021-08-30 2024-05-10 平安科技(深圳)有限公司 基于多源数据的疾病预警方法、装置、设备及存储介质
CN117274798B (zh) * 2023-09-06 2024-03-29 中国农业科学院农业信息研究所 基于正则化的时序变分模型的遥感水稻识别方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809335A (zh) * 2015-04-10 2015-07-29 上海卫生信息工程技术研究中心有限公司 一种环境变化对疾病发病影响的分析预测模型
CN106709588A (zh) * 2015-11-13 2017-05-24 日本电气株式会社 预测模型构建方法和设备以及实时预测方法和设备
CN106777874A (zh) * 2016-11-18 2017-05-31 中国科学院自动化研究所 基于循环神经网络构建预测模型的方法
CN106777891A (zh) * 2016-11-21 2017-05-31 中国科学院自动化研究所 一种数据特征选择和预测方法及装置
CN106980914A (zh) * 2017-06-05 2017-07-25 厦门美柚信息科技有限公司 女性生理周期的预测方法及装置、终端

Also Published As

Publication number Publication date
CN107688872A (zh) 2018-02-13

Similar Documents

Publication Publication Date Title
WO2019037260A1 (zh) 预测模型建立装置、方法及计算机可读存储介质
WO2019041773A1 (zh) 预测模型的更新装置、方法及计算机可读存储介质
US10402031B2 (en) Method and system for thermal drift correction
WO2019153604A1 (zh) 人机识别模型的建立装置、方法及计算机可读存储介质
US9197511B2 (en) Anomaly detection in network-site metrics using predictive modeling
US20150127595A1 (en) Modeling and detection of anomaly based on prediction
WO2022213465A1 (zh) 基于神经网络的图像识别方法、装置、电子设备及介质
WO2019041519A1 (zh) 目标跟踪装置、方法及计算机可读存储介质
CN106856015B (zh) 一种考勤方法及装置
CN113687897A (zh) 用于向计算设备的用户前摄性地提供推荐的系统和方法
JP6815708B2 (ja) インフルエンザ予測モデルの生成方法、装置及びコンピュータ可読記憶媒体
US20140195977A1 (en) User interface content personalization system
WO2018120425A1 (zh) 个人财产状态评估方法、装置、设备和存储介质
WO2022016556A1 (zh) 一种神经网络蒸馏方法以及装置
WO2019056793A1 (zh) 简历识别装置、方法及计算机可读存储介质
CN109684302B (zh) 数据预测方法、装置、设备及计算机可读存储介质
WO2019227711A1 (zh) 流感预测模型的生成方法、装置及计算机可读存储介质
WO2019071890A1 (zh) 产品推荐装置、方法及计算机可读存储介质
US11023495B2 (en) Automatically generating meaningful user segments
US20150169072A1 (en) Method, apparatus and computer readable medium for polygon gesture detection and interaction
US20210081844A1 (en) System and method for categorical time-series clustering
US20140012593A1 (en) Apparatuds and method for lifestyle management based on model
US20190156955A1 (en) Identifying program member data records for targeted operations
CN113765873A (zh) 用于检测异常访问流量的方法和装置
US10229212B2 (en) Identifying Abandonment Using Gesture Movement

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17922528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17922528

Country of ref document: EP

Kind code of ref document: A1