CN108831561A - Method, device, and computer-readable storage medium for generating an influenza prediction model - Google Patents

Method, device, and computer-readable storage medium for generating an influenza prediction model

Info

Publication number
CN108831561A
Authority
CN
China
Prior art keywords
candidate feature
candidate
prediction model
model
influenza
Prior art date
Legal status
Pending
Application number
CN201810543750.1A
Other languages
Chinese (zh)
Inventor
李弦
徐亮
阮晓雯
肖京
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810543750.1A
Priority to PCT/CN2018/102119 (WO2019227711A1)
Publication of CN108831561A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Landscapes

  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for generating an influenza prediction model. The method includes: determining a target area to be predicted and public opinion keywords matching influenza prediction; obtaining, according to the public opinion keywords, a public opinion data sequence of the target area over multiple consecutive time units, and constructing a candidate feature set with the public opinion data in the sequence as candidate features; applying wavelet denoising to the candidate features; detrending the candidate features in the wavelet-denoised candidate feature set; determining a preset number of features and selecting that number of features from the candidate feature set to form a prediction feature set; and training a prediction model built on the xgboost algorithm, with the prediction feature set as training samples, to determine the model parameters. The invention also proposes a device for generating an influenza prediction model and a computer-readable storage medium. The invention improves the prediction accuracy of the influenza prediction model.

Description

Method, device, and computer-readable storage medium for generating an influenza prediction model
Technical field
The present invention relates to the field of computer technology, and in particular to a method, a device, and a computer-readable storage medium for generating an influenza prediction model.
Background
With the development of artificial intelligence technology, research on detecting public health events by monitoring online data sources has increased. Companies that hold large amounts of user behavior data, search service providers in particular, have made many attempts in this direction, and academia has followed with related research. The accuracy of influenza prediction models nevertheless remains limited. A main reason is that public opinion data fluctuates strongly and its trend is easily affected by the usage volume of the hosting platform. For example, the number of Weibo users has fallen sharply in recent years, so Weibo-derived factors show an overall downward trend, and this trend change degrades prediction accuracy. Existing public-opinion-based influenza prediction does not address the large fluctuation and trend change of the public opinion factors; instead it models them with methods such as simple linear regression or random forests, which results in low prediction accuracy.
Summary of the invention
The present invention provides a method, a device, and a computer-readable storage medium for generating an influenza prediction model, with the main purpose of improving the prediction accuracy of the influenza prediction model.
To achieve the above object, the present invention provides a method for generating an influenza prediction model, the method including:
Determining a target area to be predicted and public opinion keywords matching influenza prediction, obtaining, according to the public opinion keywords, a public opinion data sequence of the target area over multiple consecutive time units, and constructing a candidate feature set with the public opinion data in the sequence as candidate features;
Applying wavelet denoising to the candidate features in the candidate feature set;
Detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
Determining a preset number of features, and selecting that number of candidate features from the candidate feature set to form a prediction feature set;
Obtaining the actual observed values of the influenza-like illness percentage (ILI%) over multiple consecutive time units, training a prediction model built on the xgboost algorithm with the prediction feature set and the actual observed values as training samples to determine the model parameters, and taking the prediction model with the determined parameters as the influenza prediction model.
Optionally, the step of applying wavelet denoising to the candidate features in the candidate feature set includes:
Determining a wavelet basis function, performing wavelet decomposition with it on the sequence formed by each feature in the candidate feature set, and determining the number of decomposition levels;
Determining a wavelet denoising threshold, and adjusting the coefficients at each level of the decomposed candidate features according to that threshold;
Applying the inverse transform to the adjusted wavelet coefficients to reconstruct the denoised candidate features.
Optionally, the step of detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set includes:
For the candidate feature of each time unit in the wavelet-denoised candidate feature set, performing linear regression on the data of multiple consecutive time units preceding that time unit to build a trend prediction model, and obtaining the baseline prediction for that time unit from the trend prediction model;
Subtracting the baseline prediction from the actual value of the candidate feature for that time unit to obtain the detrended candidate feature.
Optionally, the step of determining the preset number of features includes:
Using a model built on the xgboost algorithm as the learner, inputting the candidate features in the candidate feature set into the learner, and using recursive feature elimination with cross-validation to select, as the preset number, the number of features at which model performance meets a preset condition.
Optionally, the step of selecting the preset number of candidate features from the candidate feature set to form the prediction feature set includes:
Using a model built on the xgboost algorithm as the learner, inputting the candidate features in the candidate feature set into the learner, and iterating according to the recursive feature elimination algorithm;
Obtaining the model coefficients returned by the learner, and determining from them the importance of each candidate feature in the candidate feature set;
Removing the K least important candidate features from the current candidate feature set according to the importance of each candidate feature;
Repeating the above steps until the number of remaining candidate features reaches the preset number;
Forming the prediction feature set from the preset number of candidate features.
In addition, to achieve the above object, the present invention also provides a device for generating an influenza prediction model. The device includes a memory and a processor; the memory stores a model generation program that can run on the processor, and the following steps are implemented when the processor executes the model generation program:
Determining a target area to be predicted and public opinion keywords matching influenza prediction, obtaining, according to the public opinion keywords, a public opinion data sequence of the target area over multiple consecutive time units, and constructing a candidate feature set with the public opinion data in the sequence as candidate features;
Applying wavelet denoising to the candidate features in the candidate feature set;
Detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
Determining a preset number of features, and selecting that number of candidate features from the candidate feature set to form a prediction feature set;
Obtaining the actual observed values of the influenza-like illness percentage (ILI%) over multiple consecutive time units, training a prediction model built on the xgboost algorithm with the prediction feature set and the actual observed values as training samples to determine the model parameters, and taking the prediction model with the determined parameters as the influenza prediction model.
Optionally, the step of applying wavelet denoising to the candidate features in the candidate feature set includes:
Determining a wavelet basis function, performing wavelet decomposition with it on the sequence formed by each feature in the candidate feature set, and determining the number of decomposition levels;
Determining a wavelet denoising threshold, and adjusting the coefficients at each level of the decomposed candidate features according to that threshold;
Applying the inverse transform to the adjusted wavelet coefficients to reconstruct the denoised candidate features.
Optionally, the step of detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set includes:
For the candidate feature of each time unit in the wavelet-denoised candidate feature set, performing linear regression on the data of multiple consecutive time units preceding that time unit to build a trend prediction model, and obtaining the baseline prediction for that time unit from the trend prediction model;
Subtracting the baseline prediction from the actual value of the candidate feature for that time unit to obtain the detrended candidate feature.
Optionally, the step of selecting the preset number of candidate features from the candidate feature set to form the prediction feature set includes:
Using a model built on the xgboost algorithm as the learner, inputting the candidate features in the candidate feature set into the learner, and iterating according to the recursive feature elimination algorithm;
Obtaining the model coefficients returned by the learner, and determining from them the importance of each candidate feature in the candidate feature set;
Removing the K least important candidate features from the current candidate feature set according to the importance of each candidate feature;
Repeating the above steps until the number of remaining candidate features reaches the preset number;
Forming the prediction feature set from the preset number of candidate features.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a model generation program that can be executed by one or more processors to implement the steps of the method for generating an influenza prediction model described above.
In the method, device, and computer-readable storage medium proposed by the present invention, a target area to be predicted and public opinion keywords matching influenza prediction are determined; a public opinion data sequence of the target area over multiple consecutive time units is obtained according to the keywords, and a candidate feature set is constructed with the public opinion data in the sequence as candidate features; the candidate features are wavelet-denoised and detrended to obtain a processed candidate feature set; a preset number of candidate features are selected from the candidate feature set to form a prediction feature set; and the actual observed values of the ILI% over multiple consecutive time units are obtained, and a prediction model built on the xgboost algorithm is trained with the prediction feature set and the actual observed values as training samples to determine the model parameters, the prediction model with the determined parameters serving as the influenza prediction model. By applying wavelet denoising and detrending, the solution suppresses the interference that the large fluctuations in public opinion data introduce into the modeling. Feature selection and an xgboost-based prediction model built on top of this preprocessing reflect the correlation between the public opinion factors and the ILI% more accurately, which improves the accuracy of influenza prediction based on public opinion data.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for generating an influenza prediction model provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the internal structure of the device for generating an influenza prediction model provided by an embodiment of the invention;
Fig. 3 is a module diagram of the model generation program in the device for generating an influenza prediction model provided by an embodiment of the invention.
The realization of the object, the functional characteristics, and the advantages of the invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein only illustrate the present invention and are not intended to limit it.
The present invention provides a method for generating an influenza prediction model. Fig. 1 is a flow diagram of the method provided by an embodiment of the invention. The method can be executed by a device, and the device can be implemented in software and/or hardware.
In this embodiment, the method for generating an influenza prediction model includes:
Step S10: determine a target area to be predicted and public opinion keywords matching influenza prediction, obtain, according to the public opinion keywords, a public opinion data sequence of the target area over multiple consecutive time units, and construct a candidate feature set with the public opinion data in the sequence as candidate features.
In this embodiment, the influenza-related public opinion keywords mainly include influenza virus, high fever, cough, nasal congestion, Kuaike (快克), Tylenol, upper respiratory tract infection, cough relief, influenza A, and similar keywords. The public opinion data of the target area is obtained from preset channels according to these keywords. The preset channels include Baidu search and social networks such as Weibo, and the public opinion data mainly consists of the Baidu search index of each keyword and the number of Weibo posts containing it. If a particular area is the object of analysis, that area is taken as the target area, and the Baidu search index and Weibo post counts of the keywords are obtained for that area.
In addition, this embodiment uses one week as the time unit. For each week of the past 5 years, the Baidu search index and the Weibo post count of each keyword are obtained as public opinion data. For each keyword, its public opinion data on one preset channel therefore forms a sequence of 260 data points. Each data point in a sequence is a candidate feature, and all candidate features together constitute the candidate feature set.
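One way to picture the candidate feature set from step S10 is as a mapping from each (keyword, channel) pair to its 260 weekly values. The sketch below is only an illustration under that assumption: `build_candidate_features`, `fake_fetch`, and the channel names `baidu_index` and `weibo_posts` are hypothetical and do not come from the patent, and no real Baidu or Weibo API is called.

```python
from itertools import product


def build_candidate_features(fetch_weekly_counts, keywords, channels, weeks=260):
    """Collect one weekly series per (keyword, channel) pair.

    fetch_weekly_counts is a hypothetical callable supplied by the caller; it
    must return `weeks` numbers for the given keyword and channel.
    """
    features = {}
    for keyword, channel in product(keywords, channels):
        series = list(fetch_weekly_counts(keyword, channel, weeks))
        if len(series) != weeks:
            raise ValueError(
                f"expected {weeks} weekly values for {keyword!r} on {channel!r}"
            )
        features[(keyword, channel)] = series
    return features


# Stand-in data source for illustration only: a constant series per pair.
def fake_fetch(keyword, channel, weeks):
    return [float(len(keyword) + len(channel))] * weeks


candidates = build_candidate_features(
    fake_fetch, ["high fever", "cough"], ["baidu_index", "weibo_posts"]
)
```

Keying the series by keyword and channel keeps each sequence separate, which is convenient because the later denoising and detrending steps operate on one sequence at a time.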
Step S20: apply wavelet denoising to the candidate features in the candidate feature set.
After the candidate feature set is obtained, wavelet denoising is applied to its candidate features to improve the relevance of the features. Specifically, step S20 may include the following sub-steps:
Determine a wavelet basis function, perform wavelet decomposition with it on the sequence formed by each feature in the candidate feature set, and determine the number of decomposition levels. For example, wavelet decomposition is applied to the sequence formed by the weekly Baidu index of the keyword "high fever"; following the principle that the basis should resemble the waveform of the measured signal, db4 is selected as the wavelet basis function for decomposing the public opinion data. To choose the decomposition scale, different scales within a range determined by the length of the public opinion data are tested, and the number of levels that denoises well with little signal distortion is selected. Then determine the wavelet denoising threshold and adjust the coefficients at each level of the decomposed candidate features according to it. Specifically, the threshold thr is determined from the length N of each feature's sequence; for example, if the past 52 weeks of historical data are used, the length of each feature sequence is N = 52.
The soft-threshold algorithm is used to adjust the coefficients at each level of the decomposed candidate features: smaller wavelet coefficients are set to zero, and larger wavelet coefficients are shrunk toward zero. With w the coefficient before adjustment and d the coefficient after adjustment, soft thresholding takes the form:
$$d = \begin{cases} \operatorname{sgn}(w)\,(|w| - thr), & |w| \ge thr \\ 0, & |w| < thr \end{cases}$$
Apply the inverse transform to the adjusted wavelet coefficients to reconstruct the denoised candidate features.
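The decompose, threshold, and reconstruct sequence of step S20 can be sketched in a few lines. This is a minimal illustration and not the patent's implementation: to stay within the Python standard library it uses a single-level Haar transform in place of db4 and of the multi-level decomposition described above, and `universal_threshold` assumes the common form sigma * sqrt(2 ln N), which this text does not confirm as the patent's threshold formula. All function names are hypothetical.

```python
import math


def universal_threshold(n, sigma=1.0):
    """Assumed threshold form sigma * sqrt(2 ln N); not taken from the patent."""
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_threshold(w, thr):
    """Zero coefficients below thr in magnitude, shrink the rest toward zero."""
    if abs(w) < thr:
        return 0.0
    return math.copysign(abs(w) - thr, w)


def haar_decompose(signal):
    """One level of the orthonormal Haar transform (stand-in for db4)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def haar_denoise(signal, thr):
    """Decompose, soft-threshold the detail coefficients, then reconstruct."""
    approx, detail = haar_decompose(signal)
    detail = [soft_threshold(d, thr) for d in detail]
    return haar_reconstruct(approx, detail)


# A 52-week toy series: a slow ramp with an alternating disturbance on top.
weekly = [0.1 * i + (0.5 if i % 2 else -0.5) for i in range(52)]
denoised = haar_denoise(weekly, universal_threshold(len(weekly)))
```

Only the detail coefficients are thresholded here; the approximation coefficients carry the coarse shape of the series and are left untouched, which is the usual choice in wavelet shrinkage.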
Step S30: detrend the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set.
For the candidate feature of each time unit in the wavelet-denoised candidate feature set, linear regression is performed on the data of multiple consecutive time units preceding that time unit to build a trend prediction model, and the baseline prediction for that time unit is obtained from the trend prediction model. The baseline prediction is then subtracted from the actual value of the candidate feature for that time unit to obtain the detrended candidate feature.
For example, for each data point of a wavelet-denoised candidate feature (that is, the candidate feature of one time unit), the data of the preceding 52 weeks is used in a linear regression to build the trend prediction model. If a data point has less than 52 weeks of history, all of its available history is used instead. The trend prediction model gives the baseline prediction for the current data point, and subtracting this baseline from the actual value of the candidate feature at the current point yields the detrended candidate feature.
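The rolling-baseline detrending of step S30 can be sketched as follows. This is a minimal illustration, assuming the regression uses the week index as its only regressor; the names `fit_line` and `detrend` are hypothetical. The patent does not say what happens when fewer than two earlier points exist, so leaving those values unchanged is an assumption made here.

```python
def fit_line(ys):
    """Least-squares fit of y = intercept + slope * x for x = 0 .. len(ys) - 1."""
    n = len(ys)
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def detrend(series, window=52):
    """Subtract a rolling linear baseline fitted on the preceding `window` points."""
    out = []
    for t, value in enumerate(series):
        history = series[max(0, t - window):t]
        if len(history) < 2:
            # Assumption: no line can be fitted yet, so keep the value as is.
            out.append(value)
            continue
        intercept, slope = fit_line(history)
        # Extrapolate the fitted line one step past the end of the history.
        baseline = intercept + slope * len(history)
        out.append(value - baseline)
    return out


# A purely linear series has no residual once a baseline can be fitted.
linear = [2.0 * i + 1.0 for i in range(10)]
residual = detrend(linear, window=52)
```

Because the baseline for week t is fitted only on weeks before t, the detrended value never uses information from the week being predicted or later.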
Step S40: determine a preset number of features, and select that number of candidate features from the candidate feature set to form a prediction feature set.
In this embodiment, the importance of each candidate feature is assessed, and the more important features are selected from the candidate feature set as prediction features. A model built on the xgboost (extreme gradient boosting) algorithm serves as the learner; the candidate features in the candidate feature set are input to the learner, and iterations are run according to the recursive feature elimination algorithm. The model coefficients returned by the learner are obtained, and the importance of each candidate feature in the set is determined from them. The K least important candidate features are then removed from the current candidate feature set. These steps are repeated until the number of remaining candidate features reaches the preset number, and the remaining candidate features form the prediction feature set. The preset number itself is determined as follows: a model built on the xgboost algorithm serves as the learner, the candidate features are input to it, and recursive feature elimination with cross-validation selects, as the preset number, the number of features at which model performance meets a preset condition.
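The elimination loop of step S40 can be sketched independently of the learner. In the sketch below, `importance_fn` is a hypothetical stand-in for refitting the xgboost learner on the current features and reading back its importance scores; here it is replaced by a fixed lookup table so the example stays self-contained. The function name and the feature names are illustrative only.

```python
def recursive_feature_elimination(features, importance_fn, n_keep, k=1):
    """Repeatedly drop the k least important features until n_keep remain.

    importance_fn(current) must return a mapping from feature name to an
    importance score for the features currently in play.
    """
    if n_keep < 1 or k < 1:
        raise ValueError("n_keep and k must be positive")
    current = list(features)
    while len(current) > n_keep:
        scores = importance_fn(current)
        # Never remove more than needed to land exactly on n_keep.
        n_drop = min(k, len(current) - n_keep)
        weakest = set(sorted(current, key=lambda name: scores[name])[:n_drop])
        current = [name for name in current if name not in weakest]
    return current


# Fixed scores for illustration only; a real learner would be refitted on
# each round and could rank the remaining features differently.
STATIC_IMPORTANCE = {
    "cough": 0.5,
    "high fever": 0.9,
    "nasal congestion": 0.1,
    "cough relief": 0.3,
}


def static_importance(current):
    return {name: STATIC_IMPORTANCE[name] for name in current}


selected = recursive_feature_elimination(
    list(STATIC_IMPORTANCE), static_importance, n_keep=2
)
```

Recomputing the scores on every round is what distinguishes recursive elimination from a one-shot ranking: a feature's importance can change once correlated features have been removed.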
Step S50: obtain the actual observed values of the influenza-like illness percentage (ILI%) over multiple consecutive time units, train a prediction model built on the xgboost algorithm with the prediction feature set and the actual observed values as training samples to determine the model parameters, and take the prediction model with the determined parameters as the influenza prediction model.
Specifically, the actual observed values of the ILI% over multiple consecutive time units are obtained. The prediction features obtained in one week, together with the ILI% of the following week, form one training sample. Data from multiple consecutive weeks before the current prediction week, which reflect the most recent influenza trend, are selected as the training set for rolling prediction, for example the 52 weeks preceding the current prediction week. The prediction model is built on the xgboost algorithm with gbtree (the tree-based booster) as the booster and is trained with a squared-error loss function; minimizing this loss yields the final xgboost prediction model and determines the model parameters. In addition, a forward stagewise algorithm is used: each new regression tree is fitted to the residual, or an approximation of the residual, of the current model, while an optimized regularization term suppresses overfitting and parallel processing improves the algorithm's performance.
By using wavelet denoising and baseline-fitting detrending as data preprocessing, the method of this embodiment suppresses the interference that large fluctuations in public opinion data, and trend changes driven by the usage volume of the hosting platform, introduce into prediction modeling. On top of this preprocessing, combining recursive feature elimination for feature selection with xgboost-based modeling reflects the correlation between the public opinion factors and the ILI% more accurately, which improves the accuracy of influenza prediction based on public opinion data.
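The pairing of one week's features with the next week's ILI%, restricted to a rolling window before the prediction week, can be sketched as below. This is one reading of the text and not a confirmed detail of the patent: the sketch assumes that only samples whose target week lies strictly before the prediction week may be used, so that nothing from the predicted week leaks into training. `build_training_samples` and its parameters are hypothetical names.

```python
def build_training_samples(features_by_week, ili_by_week, predict_week, window=52):
    """Pair the features of week t with the ILI% of week t + 1.

    predict_week is the index of the week whose ILI% is to be predicted.
    Only pairs whose target week precedes predict_week are returned, and at
    most `window` of them.
    """
    if len(features_by_week) != len(ili_by_week):
        raise ValueError("features and ILI% series must have the same length")
    if not 0 <= predict_week <= len(ili_by_week):
        raise ValueError("predict_week is out of range")
    # The features of week predict_week - 1 are reserved for the prediction
    # itself, so the last usable training pair starts at predict_week - 2.
    last_t = predict_week - 1
    first_t = max(0, last_t - window)
    return [
        (features_by_week[t], ili_by_week[t + 1]) for t in range(first_t, last_t)
    ]


# Toy data: one feature per week and an ILI% that is ten times the week index.
features = [[float(i)] for i in range(10)]
ili = [10.0 * i for i in range(10)]
samples = build_training_samples(features, ili, predict_week=5, window=3)
```

For rolling prediction, the same function would be called again with `predict_week` advanced by one, so that the window slides forward and the model is retrained on the most recent weeks.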
The present invention also provides a device for generating an influenza prediction model. Fig. 2 is a schematic diagram of the internal structure of the device provided by an embodiment of the invention.
In this embodiment, the device 1 for generating an influenza prediction model can be a PC (personal computer) or a terminal device such as a smartphone, tablet computer, or portable computer. The device 1 includes at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic storage, a magnetic disk, or an optical disc. In some embodiments the memory 11 can be an internal storage unit of the device 1, such as its hard disk. In other embodiments the memory 11 can be an external storage device of the device 1, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the device 1. The memory 11 can also include both an internal storage unit and an external storage device of the device 1. The memory 11 can be used to store the application software installed on the device 1 and various types of data, such as the code of the model generation program 01, and can also be used to temporarily store data that has been output or is about to be output.
In some embodiments the processor 12 can be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the model generation program 01.
The communication bus 13 provides the connections and communication between these components.
The network interface 14 can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is typically used to establish a communication connection between the device 1 and other electronic equipment.
Optionally, the device 1 can also include a user interface. The user interface can include a display and an input unit such as a keyboard, and can optionally also include a standard wired interface and a wireless interface. In some embodiments the display can be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display, which can also be called a display screen or display unit, shows the information processed in the device 1 and presents a visual user interface.
Fig. 2 shows only the device 1 with the components 11-14 and the model generation program 01. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the device 1, which may include fewer or more components than illustrated, combine certain components, or arrange the components differently.
In the embodiment of the device 1 shown in Fig. 2, the model generation program 01 is stored in the memory 11, and the following steps are implemented when the processor 12 executes the model generation program 01 stored in the memory 11:
Determining a target area to be predicted and public sentiment keywords matching influenza prediction, obtaining, according to the public sentiment keywords, a public sentiment data sequence of the target area over a plurality of consecutive time units, and constructing a candidate feature set by taking the public sentiment data in the public sentiment data sequence as candidate features.
In this embodiment of the present invention, the influenza-related public sentiment keywords mainly include keywords such as influenza virus, high fever, cough, nasal congestion, Kuaike, Tylenol, upper respiratory tract infection, cough relief, and influenza A. Public sentiment data of the target area to be predicted is obtained from preset channels according to these keywords. The preset channels include Baidu search and social networks such as Weibo, and the public sentiment data mainly includes the Baidu search index of each keyword and the number of Weibo posts containing it. For example, if a certain region is the object of analysis, that region is taken as the target area, and the Baidu search index and the Weibo post count of the public sentiment keywords are obtained for that region.
In addition, in this embodiment a week is used as the time unit. For each week of the past 5 years, the Baidu search index of each public sentiment keyword and its number of Weibo posts are obtained as public sentiment data. For each public sentiment keyword, its public sentiment data on one preset channel forms a sequence of 260 data points (5 years of 52 weeks each). Each data point in a sequence is a candidate feature, and all candidate features together constitute the candidate feature set.
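The organization of the candidate feature set described above can be sketched as follows. This is a minimal illustration only: the function name `build_candidate_features`, the channel labels, and the synthetic weekly values are assumptions, and the way the search index and post counts are actually retrieved is not specified in the text.

```python
from typing import Dict, List, Tuple

WEEKS = 5 * 52  # five years of weekly time units -> 260 points per sequence

SeriesKey = Tuple[str, str]  # hypothetical (keyword, channel) pair


def build_candidate_features(
    raw: Dict[SeriesKey, List[float]],
    weeks: int = WEEKS,
) -> Dict[SeriesKey, List[float]]:
    """Keep the most recent `weeks` values of every (keyword, channel) series.

    `raw` maps each pair to its weekly values, oldest first.
    Series shorter than `weeks` are rejected rather than padded.
    """
    features: Dict[SeriesKey, List[float]] = {}
    for key, series in raw.items():
        if len(series) < weeks:
            raise ValueError(f"series {key} has only {len(series)} weeks")
        features[key] = list(series[-weeks:])
    return features


# Synthetic stand-in data; real values would come from the preset channels.
raw_data = {
    ("high fever", "search_index"): [float(i % 52) for i in range(300)],
    ("high fever", "post_count"): [float((i * 7) % 40) for i in range(260)],
}
candidate_features = build_candidate_features(raw_data)
```

Keying each sequence by keyword and channel keeps one 260-point sequence per keyword per channel, which matches the description of one sequence per keyword on each preset channel.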
Performing wavelet denoising on the candidate features in the candidate feature set.
After the candidate feature set is obtained, wavelet denoising is performed on the candidate features in it to improve the relevance of the features. Specifically, this step may include the following sub-steps:
Determining a wavelet basis function, performing wavelet decomposition, according to the wavelet basis function, on the sequence formed by each feature in the candidate feature set, and determining the number of decomposition levels. For example, wavelet decomposition is performed on the sequence formed by the weekly Baidu index of the public sentiment keyword "high fever". Based on the principle of choosing a basis whose waveform is close to that of the measured signal, db4 is selected as the wavelet basis function for decomposing the public sentiment data. As for the decomposition scale, different decomposition scales within a certain range are tested according to the length of the public sentiment data, and the number of decomposition levels that gives good denoising with low signal distortion is selected. Determining a threshold for wavelet denoising, and adjusting the coefficients at each level of the wavelet-decomposed candidate features according to the determined threshold. Specifically, the threshold thr for wavelet denoising is determined according to the length N of each feature sequence. Assuming that the historical data of the past 52 weeks is used, the length of each feature sequence is N = 52:
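The threshold formula itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the widely used universal threshold, which depends on the sequence length N as the description requires. The function name `universal_threshold` and the `sigma` noise-scale parameter are assumptions and may differ from the formula the embodiment actually uses.

```python
import math


def universal_threshold(n: int, sigma: float = 1.0) -> float:
    """Universal threshold sigma * sqrt(2 * ln(n)) for a sequence of length n.

    Assumption: this is one common length-dependent rule, not necessarily
    the rule of the embodiment. `sigma` is a noise-level estimate and
    defaults to 1.0 (no rescaling).
    """
    if n < 2:
        raise ValueError("sequence length must be at least 2")
    return sigma * math.sqrt(2.0 * math.log(n))


thr = universal_threshold(52)  # one year of weekly data, N = 52
```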
Using the soft-threshold algorithm, smaller wavelet coefficients are set to zero and larger wavelet coefficients are shrunk toward zero, thereby adjusting the coefficients at each level of the decomposed candidate features. The formula is as follows, where w is the coefficient before adjustment and d is the coefficient after adjustment:
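The formula is not reproduced in this text. The description (small coefficients set to zero, large coefficients shrunk toward zero) matches the standard soft-threshold rule, written here with the same symbols, where thr is the denoising threshold. Treat it as a reconstruction from the description rather than the original formula:

```latex
d =
\begin{cases}
\operatorname{sgn}(w)\,\bigl(|w| - thr\bigr), & |w| \ge thr \\
0, & |w| < thr
\end{cases}
```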
Performing inverse-transform reconstruction on the adjusted wavelet coefficients to obtain the denoised candidate features.
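The three sub-steps above (decompose, threshold, reconstruct) can be sketched end to end. To stay dependency-free, this sketch swaps in the Haar wavelet instead of db4 and uses a fixed number of levels, so it illustrates the procedure rather than reproducing the embodiment. All function names are assumptions.

```python
import math
from typing import List, Tuple


def haar_decompose(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_reconstruct(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse of haar_decompose."""
    s = math.sqrt(2.0)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(w: float, thr: float) -> float:
    """Zero small coefficients, shrink large ones toward zero by thr."""
    if abs(w) < thr:
        return 0.0
    return math.copysign(abs(w) - thr, w)


def wavelet_denoise(x: List[float], thr: float, levels: int = 2) -> List[float]:
    """Decompose, soft-threshold the detail coefficients, reconstruct."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, detail = haar_decompose(approx)
        details.append([soft_threshold(w, thr) for w in detail])
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_reconstruct(approx, detail)
    return approx


# A constant level of 5.0 with small alternating noise, 52 weekly points.
noisy = [5.0 + (0.1 if i % 2 == 0 else -0.1) for i in range(52)]
denoised = wavelet_denoise(noisy, thr=0.5, levels=2)
```

With a wavelet library such as PyWavelets, the same three steps would typically map to `pywt.wavedec`, `pywt.threshold` with `mode="soft"`, and `pywt.waverec`, using `"db4"` as the wavelet name.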
Detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set.
For the candidate feature corresponding to each time unit in the wavelet-denoised candidate feature set, the data of a plurality of consecutive time units preceding that time unit is obtained and linear regression is performed on it to construct a trend prediction model, and the baseline predicted value corresponding to that time unit is obtained from the trend prediction model. The baseline predicted value is subtracted from the actual value of the candidate feature of that time unit to obtain the detrended candidate feature.
For example, for each data point of a wavelet-denoised candidate feature (i.e., the candidate feature corresponding to one time unit), the data of the preceding 52 weeks is used for linear regression to construct the trend prediction model. It will be understood that if a data point has fewer than 52 weeks of historical data, all of its historical data is used for the linear regression. The baseline predicted value of the current data point is obtained from the trend prediction model, and subtracting it from the actual value of the feature at the current point gives the detrended feature.
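The detrending step above can be sketched as a rolling least-squares fit. The function names are assumptions, and so is the handling of points with fewer than two history values (left as `None` here), which the text does not specify.

```python
from typing import List, Optional, Tuple


def fit_line(ys: List[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    mean_x = (n - 1) / 2.0
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def detrend(series: List[float], window: int = 52) -> List[Optional[float]]:
    """Subtract a baseline extrapolated from up to `window` preceding points."""
    out: List[Optional[float]] = []
    for t, actual in enumerate(series):
        history = series[max(0, t - window):t]
        if len(history) < 2:
            # Assumption: a line needs two points, so leave these undefined.
            out.append(None)
            continue
        slope, intercept = fit_line(history)
        # History occupies x = 0..len-1, so the current point is x = len.
        baseline = slope * len(history) + intercept
        out.append(actual - baseline)
    return out


# A pure linear trend should be removed completely.
trend = [3.0 + 2.0 * t for t in range(60)]
residuals = detrend(trend)
```

Because each baseline is fitted only on earlier weeks, the detrended value at a given week never uses information from that week or later ones.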
Determining a preset number of features, and screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set.
In this embodiment, the importance of each candidate feature is evaluated, and the features of higher importance are then screened out of the candidate feature set as prediction features. A model is constructed based on the xgboost (extreme gradient boosting) algorithm as the learner, the candidate features in the candidate feature set are input into the learner, and iterative computation is performed according to the recursive feature elimination algorithm. The model coefficients returned by the learner after computation are obtained, and the importance of each candidate feature in the candidate feature set is determined from the model coefficients. According to the importance of each candidate feature, the K least important candidate features are removed from the current candidate feature set. The above steps are repeated until the number of remaining candidate features reaches the preset number, and these candidate features constitute the prediction feature set. As for setting the number of features, a model is constructed based on the xgboost algorithm as the learner, the candidate features in the candidate feature set are input into the learner, and recursive feature elimination with cross-validation is used to select, as the preset number, the number of features at which model performance meets a preset condition.
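The elimination loop described above can be sketched with a pluggable importance function. In the embodiment the importances come from an xgboost learner. This sketch instead uses the absolute Pearson correlation with the target as a dependency-free stand-in, so it shows the loop structure only. All names are assumptions.

```python
import math
from typing import Callable, Dict, List

Features = Dict[str, List[float]]
ImportanceFn = Callable[[Features, List[float]], Dict[str, float]]


def abs_correlation_importance(features: Features,
                               target: List[float]) -> Dict[str, float]:
    """Stand-in scorer: |Pearson correlation| of each feature with the target."""
    n = len(target)
    mean_t = sum(target) / n
    var_t = sum((t - mean_t) ** 2 for t in target)
    scores: Dict[str, float] = {}
    for name, values in features.items():
        mean_v = sum(values) / n
        var_v = sum((v - mean_v) ** 2 for v in values)
        cov = sum((v - mean_v) * (t - mean_t) for v, t in zip(values, target))
        denom = math.sqrt(var_v * var_t)
        scores[name] = abs(cov / denom) if denom > 0 else 0.0
    return scores


def recursive_feature_elimination(features: Features, target: List[float],
                                  importance_fn: ImportanceFn,
                                  n_keep: int, k: int = 1) -> List[str]:
    """Repeatedly re-score the remaining features and drop the k weakest."""
    remaining = dict(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining, target)
        drop = min(k, len(remaining) - n_keep)  # never go below n_keep
        for name in sorted(remaining, key=lambda f: scores[f])[:drop]:
            del remaining[name]
    return sorted(remaining)


target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
candidates = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # rises with the target
    "b": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # mostly noise
    "c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # falls as the target rises
    "d": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],     # constant, no information
}
selected = recursive_feature_elimination(
    candidates, target, abs_correlation_importance, n_keep=2)
```

With scikit-learn available, the `RFE` and `RFECV` classes in `sklearn.feature_selection`, wrapped around an XGBoost regressor, would be a natural way to realize both the elimination and the cross-validated choice of the preset number.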
Obtaining actual observed values of the influenza-like case percentage over a plurality of consecutive time units, taking the prediction feature set and the actual observed values as training samples, training a prediction model constructed based on the xgboost algorithm to determine model parameters, and taking the prediction model with the determined model parameters as the influenza prediction model.
Specifically, actual observed values of the influenza-like case percentage over a plurality of consecutive time units are obtained. The prediction features obtained in one week, together with the influenza-like case percentage of the following week, form one training sample. The data of a number of consecutive weeks immediately preceding the current prediction week, which can reflect the latest trend of influenza, is selected as the training set for rolling prediction, for example the data of the 52 weeks preceding the current prediction week. A prediction model is constructed based on the xgboost algorithm with gbtree (tree-based models) as the booster, and is trained with a squared-error loss function so as to minimize that loss, which yields the final xgboost prediction model and determines the model parameters. In addition, a forward stagewise algorithm is used: new regression trees are constructed to fit the residuals, or an approximation of the residuals, of the current model, and algorithm performance is improved by optimizing the regularization term to suppress overfitting and by parallel processing.
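The sample construction and the residual-fitting idea described above can be sketched as follows. This is not xgboost: it is a plain gradient-boosting loop with depth-1 regression trees under squared error, without regularization or parallelism, written only to show one-week-lagged samples, a rolling 52-week window, and forward stagewise fitting of residuals. All names and the synthetic data are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]
Stump = Tuple[int, float, float, float]  # feature index, split, left, right


def make_training_set(features_by_week: List[List[float]],
                      ili_by_week: List[float],
                      predict_week: int, window: int = 52) -> List[Sample]:
    """Pair week t's features with week t+1's observed percentage,
    using only the `window` weeks before `predict_week`."""
    start = max(0, predict_week - window - 1)
    return [(features_by_week[t], ili_by_week[t + 1])
            for t in range(start, predict_week - 1)]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Best single-split regression stump under squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            split = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= split]
            right = [r for row, r in zip(X, residuals) if row[j] > split]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, split, lm, rm)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump: Stump, row: List[float]) -> float:
    j, split, left_value, right_value = stump
    return left_value if row[j] <= split else right_value


def fit_boosted(samples: List[Sample], rounds: int = 50, lr: float = 0.3):
    """Forward stagewise fitting: each new stump fits the current residuals."""
    X = [x for x, _ in samples]
    y = [t for _, t in samples]
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def predict(model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, row) for s in stumps)


# Synthetic data: next week's percentage is 1.0 + 0.5 * this week's feature.
features_by_week = [[float(t % 10)] for t in range(80)]
ili_by_week = [1.0] + [1.0 + 0.5 * (t % 10) for t in range(79)]
samples = make_training_set(features_by_week, ili_by_week, predict_week=60)
model = fit_boosted(samples)
forecast = predict(model, features_by_week[59])  # forecast for week 60
```

With the xgboost library itself, the corresponding settings would typically be `booster="gbtree"` and `objective="reg:squarederror"`. The rolling window is refitted for each new prediction week so that the model tracks the most recent 52 weeks.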
The influenza prediction model generating device proposed in this embodiment uses a data preprocessing method that combines wavelet denoising with baseline-fitting detrending. This effectively suppresses the interference that problems in public sentiment data bring to predictive modeling, such as large data fluctuations and the influence of trends in the usage of the hosting platform. On this basis, feature screening by recursive feature elimination is combined with modeling and prediction based on the xgboost algorithm. This prediction approach reflects the correlation between public sentiment factors and the influenza-like case percentage more accurately and can effectively improve the accuracy of influenza prediction based on public sentiment data.
Optionally, in other embodiments, the model generation program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution process of the model generation program in the influenza prediction model generating device.
For example, Fig. 3 is a schematic diagram of the program modules of the model generation program in an embodiment of the influenza prediction model generating device of the present invention. In this embodiment, the model generation program may be divided into a feature acquisition module 10, a first preprocessing module 20, a second preprocessing module 30, a feature screening module 40, and a model training module 50. By way of example:
The feature acquisition module 10 is configured to: determine a target area to be predicted and public sentiment keywords matching influenza prediction, obtain, according to the public sentiment keywords, a public sentiment data sequence of the target area over a plurality of consecutive time units, and construct a candidate feature set by taking the public sentiment data in the public sentiment data sequence as candidate features;
The first preprocessing module 20 is configured to: perform wavelet denoising on the candidate features in the candidate feature set;
The second preprocessing module 30 is configured to: detrend the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
The feature screening module 40 is configured to: determine a preset number of features, and screen out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set;
The model training module 50 is configured to: obtain actual observed values of the influenza-like case percentage over a plurality of consecutive time units, take the prediction feature set and the actual observed values as training samples, train a prediction model constructed based on the xgboost algorithm to determine model parameters, and take the prediction model with the determined model parameters as the influenza prediction model.
The functions or operation steps implemented when program modules such as the feature acquisition module 10, the first preprocessing module 20, the second preprocessing module 30, the feature screening module 40, and the model training module 50 are executed are substantially the same as those of the above embodiment and are not repeated here.
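The division into five modules can be pictured as a simple pipeline of callables. This is a hypothetical arrangement for illustration: the class name, field names, and the stand-in stages below are assumptions, and the text does not prescribe how the modules are wired together.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ModelGenerationProgram:
    acquire_features: Callable[[], Any]    # feature acquisition module 10
    denoise: Callable[[Any], Any]          # first preprocessing module 20
    detrend: Callable[[Any], Any]          # second preprocessing module 30
    select_features: Callable[[Any], Any]  # feature screening module 40
    train_model: Callable[[Any], Any]      # model training module 50

    def run(self) -> Any:
        """Run the modules in the order given in the description."""
        candidates = self.acquire_features()
        candidates = self.denoise(candidates)
        candidates = self.detrend(candidates)
        selected = self.select_features(candidates)
        return self.train_model(selected)


calls: List[str] = []


def stage(name: str) -> Callable[..., List[str]]:
    """Stand-in module that records its name and passes the data along."""
    def run(data=None):
        calls.append(name)
        return (data or []) + [name]
    return run


program = ModelGenerationProgram(
    stage("acquire"), stage("denoise"), stage("detrend"),
    stage("select"), stage("train"))
result = program.run()
```

In a real arrangement the training stage would also need the observed influenza-like case percentages, for example captured in a closure or passed alongside the selected features.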
In addition, an embodiment of the present invention also proposes a computer-readable storage medium on which a model generation program is stored. The model generation program can be executed by one or more processors to implement the following operations:
Determining a target area to be predicted and public sentiment keywords matching influenza prediction, obtaining, according to the public sentiment keywords, a public sentiment data sequence of the target area over a plurality of consecutive time units, and constructing a candidate feature set by taking the public sentiment data in the public sentiment data sequence as candidate features;
Performing wavelet denoising on the candidate features in the candidate feature set;
Detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
Determining a preset number of features, and screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set;
Obtaining actual observed values of the influenza-like case percentage over a plurality of consecutive time units, taking the prediction feature set and the actual observed values as training samples, training a prediction model constructed based on the xgboost algorithm to determine model parameters, and taking the prediction model with the determined model parameters as the influenza prediction model.
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as the embodiments of the influenza prediction model generating device and method described above and are not repeated here.
It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article, or method that includes the element.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A method for generating an influenza prediction model, characterized in that the method comprises:
determining a target area to be predicted and public sentiment keywords matching influenza prediction, obtaining, according to the public sentiment keywords, a public sentiment data sequence of the target area over a plurality of consecutive time units, and constructing a candidate feature set by taking the public sentiment data in the public sentiment data sequence as candidate features;
performing wavelet denoising on the candidate features in the candidate feature set;
detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
determining a preset number of features, and screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set;
obtaining actual observed values of the influenza-like case percentage over a plurality of consecutive time units, taking the prediction feature set and the actual observed values as training samples, training a prediction model constructed based on the xgboost algorithm to determine model parameters, and taking the prediction model with the determined model parameters as the influenza prediction model.
2. The method for generating an influenza prediction model according to claim 1, characterized in that the step of performing wavelet denoising on the candidate features in the candidate feature set comprises:
determining a wavelet basis function, performing wavelet decomposition, according to the wavelet basis function, on the sequence formed by each feature in the candidate feature set, and determining the number of decomposition levels;
determining a threshold for wavelet denoising, and adjusting the coefficients at each level of the wavelet-decomposed prediction features according to the determined threshold;
performing inverse-transform reconstruction on the adjusted wavelet coefficients to obtain the denoised candidate features.
3. The method for generating an influenza prediction model according to claim 1, characterized in that the step of detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set comprises:
for the candidate feature corresponding to each time unit in the wavelet-denoised candidate feature set, obtaining the data of a plurality of consecutive time units preceding that time unit and performing linear regression on it to construct a trend prediction model, and obtaining the baseline predicted value corresponding to that time unit from the trend prediction model;
subtracting the baseline predicted value from the actual value of the candidate feature of that time unit to obtain the detrended candidate feature.
4. The method for generating an influenza prediction model according to any one of claims 1 to 3, characterized in that the step of determining a preset number of features comprises:
constructing a model based on the xgboost algorithm as a learner, inputting the candidate features in the candidate feature set into the learner, and using a recursive feature elimination cross-validation algorithm to select, as the preset number, the number of features at which model performance meets a preset condition.
5. The method for generating an influenza prediction model according to claim 4, characterized in that the step of screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set comprises:
constructing a model based on the xgboost algorithm as a learner, inputting the candidate features in the candidate feature set into the learner, and performing iterative computation according to a recursive feature elimination algorithm;
obtaining the model coefficients returned by the learner after computation, and determining the importance of each candidate feature in the candidate feature set according to the model coefficients;
removing the K least important candidate features from the current candidate feature set according to the importance of each candidate feature;
repeating the above steps until the number of candidate features obtained by screening reaches the preset number;
the preset number of candidate features constituting the prediction feature set.
6. A device for generating an influenza prediction model, characterized in that the device comprises a memory and a processor, the memory stores a model generation program executable on the processor, and the model generation program, when executed by the processor, implements the following steps:
determining a target area to be predicted and public sentiment keywords matching influenza prediction, obtaining, according to the public sentiment keywords, a public sentiment data sequence of the target area over a plurality of consecutive time units, and constructing a candidate feature set by taking the public sentiment data in the public sentiment data sequence as candidate features;
performing wavelet denoising on the candidate features in the candidate feature set;
detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set;
determining a preset number of features, and screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set;
obtaining actual observed values of the influenza-like case percentage over a plurality of consecutive time units, taking the prediction feature set and the actual observed values as training samples, training a prediction model constructed based on the xgboost algorithm to determine model parameters, and taking the prediction model with the determined model parameters as the influenza prediction model.
7. The device for generating an influenza prediction model according to claim 6, characterized in that the step of performing wavelet denoising on the candidate features in the candidate feature set comprises:
determining a wavelet basis function, performing wavelet decomposition, according to the wavelet basis function, on the sequence formed by each feature in the candidate feature set, and determining the number of decomposition levels;
determining a threshold for wavelet denoising, and adjusting the coefficients at each level of the wavelet-decomposed prediction features according to the determined threshold;
performing inverse-transform reconstruction on the adjusted wavelet coefficients to obtain the denoised candidate features.
8. The device for generating an influenza prediction model according to claim 6, characterized in that the step of detrending the candidate features in the wavelet-denoised candidate feature set to obtain a detrended candidate feature set comprises:
for the candidate feature corresponding to each time unit in the wavelet-denoised candidate feature set, obtaining the data of a plurality of consecutive time units preceding that time unit and performing linear regression on it to construct a trend prediction model, and obtaining the baseline predicted value corresponding to that time unit from the trend prediction model;
subtracting the baseline predicted value from the actual value of the candidate feature of that time unit to obtain the detrended candidate feature.
9. The device for generating an influenza prediction model according to any one of claims 6 to 8, characterized in that the step of screening out from the candidate feature set a number of candidate features equal to the preset number to form a prediction feature set comprises:
constructing a model based on the xgboost algorithm as a learner, inputting the candidate features in the candidate feature set into the learner, and performing iterative computation according to a recursive feature elimination algorithm;
obtaining the model coefficients returned by the learner after computation, and determining the importance of each candidate feature in the candidate feature set according to the model coefficients;
removing the K least important candidate features from the current candidate feature set according to the importance of each candidate feature;
repeating the above steps until the number of candidate features obtained by screening reaches the preset number;
the preset number of candidate features constituting the prediction feature set.
10. A computer-readable storage medium, characterized in that a model generation program is stored on the computer-readable storage medium, and the model generation program is executable by one or more processors to implement the steps of the method for generating an influenza prediction model according to any one of claims 1 to 5.
CN201810543750.1A 2018-05-31 2018-05-31 Generation method, device and the computer readable storage medium of influenza prediction model Pending CN108831561A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810543750.1A CN108831561A (en) 2018-05-31 2018-05-31 Generation method, device and the computer readable storage medium of influenza prediction model
PCT/CN2018/102119 WO2019227711A1 (en) 2018-05-31 2018-08-24 Method and apparatus for generating influenza prediction model, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810543750.1A CN108831561A (en) 2018-05-31 2018-05-31 Generation method, device and the computer readable storage medium of influenza prediction model

Publications (1)

Publication Number Publication Date
CN108831561A true CN108831561A (en) 2018-11-16

Family

ID=64147082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810543750.1A Pending CN108831561A (en) 2018-05-31 2018-05-31 Generation method, device and the computer readable storage medium of influenza prediction model

Country Status (2)

Country Link
CN (1) CN108831561A (en)
WO (1) WO2019227711A1 (en)

Also Published As

Publication number Publication date
WO2019227711A1 (en) 2019-12-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination