CN108648829A - Disease forecasting method and device, computer apparatus, and readable storage medium - Google Patents


Info

Publication number
CN108648829A
Authority
CN
China
Prior art keywords
data
disease
weather
disease surveillance
public sentiment
Application number
CN201810321868.XA
Other languages
Chinese (zh)
Inventor
阮晓雯
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Priority to CN201810321868.XA
Publication of CN108648829A


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/80 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Abstract

A disease forecasting method, comprising: obtaining disease surveillance data, weather data, and public sentiment data; preprocessing the disease surveillance data, weather data, and public sentiment data; building a multilayer LSTM model; training and validating the multilayer LSTM model to obtain an optimized multilayer LSTM model; and using the optimized multilayer LSTM model to make a prediction for a prediction time point, obtaining the disease forecasting result for that time point. The present invention also provides a disease forecasting device, a computer apparatus, and a readable storage medium. The present invention can achieve high-accuracy disease forecasting.

Description

Disease forecasting method and device, computer apparatus, and readable storage medium

Technical field

The present invention relates to the field of prediction technology, and in particular to a disease forecasting method and device, a computer apparatus, and a computer-readable storage medium.

Background

With the acceleration of global economic integration, economic growth and exchange activities have increased and population movement has become increasingly frequent. This creates favourable conditions for the spread and outbreak of diseases, and public health problems are becoming increasingly severe. At the same time, changes in the social and natural environment, such as environmental pollution and natural disasters, are adding to the factors that affect public health events and are also raising the likelihood of public health emergencies.

How to identify disease-related public health emergencies early, issue warnings in time, and take corresponding control measures as early as possible so as to minimize the losses they cause has long been a focus of the public health field and is an important part of health emergency work. Early warning of public health emergencies collects, organizes, analyses, and integrates relevant data and uses modern technologies such as computers, networks, and communications to monitor, identify, diagnose, evaluate, and raise alarms on the signs of an event. It informs the relevant departments and the public so that they can respond and prepare, take effective prevention and control measures in time, and thereby prevent or slow the occurrence of the event or reduce its harm as far as possible.

An important process in public health emergency early warning is disease forecasting, that is, predicting future disease surveillance data from historical disease surveillance data (i.e., patient data). With the development of machine learning techniques, more and more machine learning methods have been applied to disease forecasting. However, traditional machine learning approaches to disease forecasting usually require features to be defined manually and then search the defined feature set for the best feature combination. The results are often unsatisfactory, which limits the accuracy of disease forecasting.

Summary

In view of the above, it is necessary to provide a disease forecasting method and device, a computer apparatus, and a computer-readable storage medium that can achieve high-accuracy disease forecasting.

A first aspect of the present application provides a disease forecasting method, the method comprising:

obtaining disease surveillance data, the disease surveillance data being time series data;

obtaining weather data related to the disease surveillance data, the weather data being time series data corresponding to the disease surveillance data;

obtaining public sentiment data related to the disease surveillance data, the public sentiment data being time series data corresponding to the disease surveillance data;

preprocessing the disease surveillance data, weather data, and public sentiment data;

building a multilayer long short-term memory recurrent neural network model, i.e., a multilayer LSTM model;

obtaining training data and validation data from the preprocessed disease surveillance data, weather data, and public sentiment data, and using the training data and the validation data to train and validate the multilayer LSTM model, so as to obtain an optimized multilayer LSTM model;

obtaining, from the preprocessed disease surveillance data, weather data, and public sentiment data, the disease surveillance data, weather data, and public sentiment data preceding a prediction time point, and inputting them into the optimized multilayer LSTM model to obtain the disease forecasting result for the prediction time point.
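The last two steps can be illustrated by turning the aligned multivariate series into supervised samples: each input is a window of consecutive time steps before a prediction time point, and the target is the disease value at that point. The sketch below is a minimal illustration under stated assumptions, not the patent's procedure: the window length, the convention that column 0 holds the disease surveillance value, and the 80/20 chronological split are all hypothetical choices, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]  # one time step: [disease, weather..., sentiment...]


def make_windows(series: List[Row], window: int, target_col: int = 0
                 ) -> Tuple[List[List[Row]], List[float]]:
    """Pair each run of `window` consecutive steps with the target value
    (column `target_col`) at the step that immediately follows the run."""
    inputs: List[List[Row]] = []
    targets: List[float] = []
    for end in range(window, len(series)):
        inputs.append(list(series[end - window:end]))  # data before the prediction time point
        targets.append(series[end][target_col])        # disease value at the prediction time point
    return inputs, targets


def chronological_split(inputs, targets, train_fraction: float = 0.8):
    """Split without shuffling, so every validation sample lies after
    every training sample in time."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])
```

Splitting in time order rather than at random keeps the validation set from containing windows that overlap the future of the training windows, which would otherwise make the performance check optimistic.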

In another possible implementation, capturing the weather data from a web page includes:

generating a seed URL and subsequent URLs for the API interface of a weather information website;

sending an HTTP request to the API interface of the weather information website to request access to the API interface;

analysing and identifying the data content provided by the weather information website in order to inspect the data content;

determining whether the data content is predetermined information content;

if the data content is predetermined information content, capturing the data content;

saving the captured data content locally as the weather data.
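The capture steps above could be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the API is assumed to return one JSON object per request, and "predetermined information content" is assumed to mean that the object contains a given set of fields. The HTTP call uses the standard-library `urllib.request.urlopen`; any callable that returns the response body can be passed instead, which is how the check below avoids the network.

```python
import json
import urllib.request
from typing import Callable, Iterable, List


def http_get(url: str, timeout: float = 10.0) -> str:
    """Send an HTTP GET request to the API and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def crawl_weather_api(urls: Iterable[str],
                      required_fields: Iterable[str],
                      out_path: str,
                      fetch: Callable[[str], str] = http_get) -> List[dict]:
    """Request each URL (seed first, then subsequent URLs), keep responses
    that contain all required fields, and save them locally as JSON lines."""
    required = set(required_fields)
    kept: List[dict] = []
    for url in urls:
        try:
            record = json.loads(fetch(url))   # analyse/identify the returned content
        except json.JSONDecodeError:
            continue                          # not JSON: not the expected content
        if isinstance(record, dict) and required.issubset(record):
            kept.append(record)               # matches the predetermined content: capture it
    with open(out_path, "w", encoding="utf-8") as f:
        for record in kept:                   # save captured content as the weather data
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return kept
```

For a hypothetical endpoint such as `https://weather.example.com/api/daily?city=shenzhen&date=2018-01-01`, the seed URL would cover the first date and the subsequent URLs the following dates. An API that returns XML would need a parser such as `xml.etree.ElementTree` in place of `json`.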

In another possible implementation, the public sentiment data includes:

the number of searches for specific words; or

the number of public sentiment items containing the specific words on specific public sentiment websites.

In another possible implementation, preprocessing the disease surveillance data, weather data, and public sentiment data includes:

filling in missing values in the disease surveillance data, weather data, and public sentiment data;

correcting outliers in the disease surveillance data, weather data, and public sentiment data;

converting the data format of the disease surveillance data, weather data, and public sentiment data.

In another possible implementation, the weather data include humidity, temperature, air pressure, precipitation, vapour pressure, wind speed, wind direction, and sunshine duration.

In another possible implementation, the multilayer LSTM model includes two LSTM unit layers and one fully connected layer. The first LSTM unit layer constructs features from the input data to obtain first hidden-layer units; the second LSTM unit layer combines the first hidden-layer units to obtain second hidden-layer units; and the fully connected layer obtains a prediction result from the second hidden-layer units. Each LSTM unit layer includes a forget gate, an input gate, and an output gate, which together control the memory state of the LSTM unit layer.

In another possible implementation, the loss function used when training the multilayer LSTM model is the mean squared error, and the optimization algorithm used is RMSprop.
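To show what the mean-squared-error loss and the RMSprop update rule do, the sketch below applies them to a toy problem, fitting a straight line, rather than to the patent's LSTM. RMSprop keeps a running average of each parameter's squared gradient and divides the step by its square root, so every parameter gets its own adaptive step size. The learning rate, decay `rho`, `eps`, and step count are illustrative assumptions, not values from the patent; deep learning frameworks such as Keras provide both an RMSprop optimizer and an MSE loss ready-made.

```python
import math
from typing import Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def rmsprop_fit_line(xs: Sequence[float], ys: Sequence[float], lr: float = 0.01,
                     rho: float = 0.9, eps: float = 1e-8,
                     steps: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w*x + b by minimising the MSE with RMSprop updates."""
    w = b = 0.0
    sq_w = sq_b = 0.0                     # running averages of squared gradients
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n   # dMSE/dw
        grad_b = sum(2 * e for e in errors) / n                   # dMSE/db
        sq_w = rho * sq_w + (1 - rho) * grad_w ** 2
        sq_b = rho * sq_b + (1 - rho) * grad_b ** 2
        w -= lr * grad_w / (math.sqrt(sq_w) + eps)                # per-parameter scaled step
        b -= lr * grad_b / (math.sqrt(sq_b) + eps)
    return w, b
```

With data generated from y = 2x + 1 on x in {-2, -1, 0, 1, 2}, the fitted parameters settle close to w = 2 and b = 1, with a small residual oscillation on the order of the learning rate.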

A second aspect of the present application provides a disease forecasting device, the device comprising:

a first acquisition unit, configured to obtain disease surveillance data, the disease surveillance data being time series data;

a second acquisition unit, configured to obtain weather data related to the disease surveillance data, the weather data being time series data corresponding to the disease surveillance data;

a third acquisition unit, configured to obtain public sentiment data related to the disease surveillance data, the public sentiment data being time series data corresponding to the disease surveillance data;

a preprocessing unit, configured to preprocess the disease surveillance data, weather data, and public sentiment data;

a construction unit, configured to build a multilayer long short-term memory recurrent neural network model, i.e., a multilayer LSTM model;

an optimization unit, configured to obtain training data and validation data from the preprocessed disease surveillance data, weather data, and public sentiment data, and to use the training data and the validation data to train and validate the multilayer LSTM model, so as to obtain an optimized multilayer LSTM model;

a prediction unit, configured to obtain, from the preprocessed disease surveillance data, weather data, and public sentiment data, the disease surveillance data, weather data, and public sentiment data preceding a prediction time point, and to input them into the optimized multilayer LSTM model to obtain the disease forecasting result for the prediction time point.

A third aspect of the present application provides a computer apparatus including a processor, the processor being configured to implement the disease forecasting method when executing a computer program stored in a memory.

A fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the disease forecasting method.

The present invention obtains disease surveillance data, which are time series data; obtains weather data related to the disease surveillance data, which are time series data corresponding to the disease surveillance data; obtains public sentiment data related to the disease surveillance data, which are likewise time series data corresponding to the disease surveillance data; preprocesses the disease surveillance data, weather data, and public sentiment data; builds a multilayer long short-term memory recurrent neural network model, i.e., a multilayer LSTM model; obtains training data and validation data from the preprocessed data and uses them to train and validate the multilayer LSTM model, obtaining an optimized multilayer LSTM model; and obtains, from the preprocessed data, the disease surveillance data, weather data, and public sentiment data preceding a prediction time point and inputs them into the optimized multilayer LSTM model to obtain the disease forecasting result for that time point.

The present invention forecasts illness data with a multilayer LSTM model. An LSTM model can extract knowledge directly from the data and construct feature vectors that are useful for prediction, which improves prediction accuracy. Moreover, compared with a traditional RNN (Recurrent Neural Network) model, an LSTM model addresses the vanishing-gradient problem caused by long-term dependencies when the time series is long. In addition, the present invention adds weather data and public sentiment data to disease forecasting as influencing factors, which improves forecasting accuracy. The present invention therefore achieves high-accuracy disease forecasting.

Brief Description of the Drawings

Fig. 1 is a flowchart of the disease forecasting method provided by Embodiment 1 of the present invention.

Fig. 2 is a detailed flowchart of obtaining the weather data related to the disease surveillance data in the disease forecasting method provided by Embodiment 2 of the present invention.

Fig. 3 is a structural diagram of the disease forecasting device provided by Embodiment 3 of the present invention.

Fig. 4 is a detailed structural diagram of the second acquisition unit in the disease forecasting device provided by Embodiment 4 of the present invention.

Fig. 5 is a schematic diagram of the computer apparatus provided by Embodiment 5 of the present invention.

Detailed Description

To make the objectives, features, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a thorough understanding of the present invention. The described embodiments are merely some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by a person skilled in the technical field to which the present invention belongs. The terms used in the description of the present invention are intended only to describe specific embodiments and are not intended to limit the present invention.

Preferably, the disease forecasting method of the present invention is applied in one or more computer apparatuses. A computer apparatus is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.

The computer apparatus may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. A user may interact with the computer apparatus through a keyboard, mouse, remote control, touchpad, voice-control device, or the like.

Embodiment 1

Fig. 1 is a flowchart of the disease forecasting method provided by Embodiment 1 of the present invention. The disease forecasting method is applied to a computer apparatus. It uses a long short-term memory recurrent neural network model to forecast disease surveillance data and obtain high-accuracy disease forecasting results.

As shown in Fig. 1, the disease forecasting method specifically includes the following steps:

Step 101: obtain disease surveillance data, the disease surveillance data being time series data.

The disease surveillance data may include illness data for diseases such as influenza, hand-foot-and-mouth disease, measles, and mumps.

A disease surveillance network consisting of multiple monitoring points may be established in a preset area (such as a province, city, or district), and disease surveillance data are obtained from the monitoring points; these data form the disease surveillance time series. Medical institutions, schools and childcare institutions, pharmacies, and the like may be selected as monitoring points, each performing disease surveillance and data collection for its corresponding target population. Places that meet preset conditions, such as headcount or scale, may be selected as monitoring points. For example, schools and childcare institutions whose number of students reaches a preset number may be selected. As another example, pharmacies whose scale (measured, for example, by daily sales) reaches a preset scale may be selected. As a further example, hospitals whose scale (measured, for example, by daily patient visits) reaches a preset scale may be selected.

Disease surveillance data from different times form the disease surveillance time series. For example, the disease surveillance data collected per day may form the disease surveillance time series; alternatively, the data collected per week may.

Medical institutions (mainly hospitals) are the places most likely to capture early signs of a disease outbreak and are the first choice for disease surveillance. Disease surveillance data can be obtained from patients' medical visits.

Some patients buy medicine at pharmacies on their own to relieve early symptoms, so disease surveillance data can also be obtained from pharmacies' drug sales.

Children and adolescents are a high-risk population for disease and an important link in transmission, so monitoring of this population should be strengthened. Schools and childcare institutions are good places to monitor the incidence of disease among children and adolescents, and disease surveillance data can be obtained from the sick-leave records of the children and adolescents attending them.

Therefore, the present invention mainly uses three types of places, namely medical institutions, schools and childcare institutions, and pharmacies, to collect disease surveillance data. Of course, this choice of data sources is not limiting; in other embodiments, data sources covering other populations or places of concern may be added or substituted. For example, hotels may be included in the scope of disease surveillance to obtain disease surveillance data for hotel guests.

As needed, the disease surveillance data collected by one type of monitoring point (such as medical institutions) may form the disease surveillance time series; for example, the data collected by hospitals. Alternatively, data collected by several types of monitoring points may be combined; for example, the data collected by hospitals may serve as the basis and the data from pharmacies as a supplement.

Disease surveillance data may include illness data such as the number of medical visits, the consultation rate, the number of cases, and the incidence rate of a disease. For example, the daily number of medical visits for a disease (such as influenza) can be obtained from medical institutions (such as hospitals) and used as disease surveillance data. As another example, the daily number of student cases of a disease (such as influenza) can be obtained from schools and used as disease surveillance data.

Step 102: obtain weather data related to the disease surveillance data, the weather data being time series data corresponding to the disease surveillance data.

Weather data related to the disease surveillance data are weather data that affect the disease surveillance data (i.e., the illness data of the disease). The influence of different weather variables on the disease surveillance data may be analysed in advance, and the weather variables that affect, or strongly affect, the disease surveillance data are then selected based on the analysis results.

The weather data may include humidity, temperature, air pressure, precipitation, vapour pressure, wind speed, wind direction, and sunshine duration. In one embodiment, the weather data may include daily mean temperature, mean air pressure, maximum temperature, minimum temperature, mean relative humidity, minimum relative humidity, precipitation, mean wind speed, sunshine duration, and mean vapour pressure.

The weather data cover the same time period as the corresponding disease surveillance data and use the same measurement period (such as daily or weekly). For example, if the disease surveillance data are the daily numbers of medical visits for January-February 2018, the weather data are the daily weather data for January-February 2018. As another example, if the disease surveillance data are the weekly numbers of medical visits for January-December 2017, the weather data are the weekly weather data (such as the weekly mean temperature) for January-December 2017.
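When the disease series is weekly but the weather source is daily, the daily values have to be brought to the same measurement period. The sketch below averages daily values within each ISO calendar week, matching the "weekly mean temperature" example; grouping by ISO week and the function name are assumptions for illustration, and count-type variables such as precipitation might be summed rather than averaged.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple


def weekly_mean(daily: Dict[date, float]) -> Dict[Tuple[int, int], float]:
    """Average daily values within each ISO (year, week) so the weather
    series shares the weekly measurement period of the disease series."""
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, value in daily.items():
        iso = day.isocalendar()               # (ISO year, ISO week, weekday)
        buckets[(iso[0], iso[1])].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}
```

Using ISO weeks keeps weeks Monday-based and avoids splitting a week across a calendar year boundary; 1 January 2018 fell on a Monday, so it starts ISO week 1 of 2018.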

The weather data may be captured from weather information websites (such as China Weather Network, Sina Weather, or Sohu Weather) to improve their reliability. It is understood that the weather data may also be captured from any web page.

Weather data for a predetermined area may be captured. The predetermined area may be a province, city, district, or the like; for example, the weather data for Shenzhen may be captured.

Weather data for a predetermined time may be captured. The predetermined time may be specified by year, month, day, or the like; for example, the daily weather data for January-February 2018 may be captured.

The weather data may be captured by a web crawler. A web crawler is an application that automatically extracts content from web pages. It typically starts from the URLs of one or more initial pages (also called seed URLs). While crawling pages according to a specific algorithm and strategy (such as a depth-first search strategy), it continuously extracts new URLs from the current page and puts them into a queue, until a stop condition is met. URL is the abbreviation of Uniform Resource Locator.
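The crawl loop described above can be sketched with a pending-URL list: start from the seed URLs, pop a URL, record it, push the links extracted from it, and stop when a page budget is reached or nothing is left. Popping from the end of the list (a stack) gives the depth-first order mentioned in the text; popping from the front (a FIFO queue) would give breadth-first order instead. The link-extraction callable, the page budget, and the function name are assumptions for illustration; a real crawler would fetch and parse HTML inside `get_links`.

```python
from typing import Callable, Iterable, List, Set


def crawl(seed_urls: Iterable[str],
          get_links: Callable[[str], List[str]],
          max_pages: int = 100) -> List[str]:
    """Depth-first crawl from the seed URLs; returns URLs in visit order.
    Stops when `max_pages` pages are visited or no URLs remain."""
    pending = list(reversed(list(seed_urls)))  # reversed so the first seed is popped first
    visited: Set[str] = set()
    order: List[str] = []
    while pending and len(order) < max_pages:
        url = pending.pop()                    # take the most recently added URL
        if url in visited:
            continue
        visited.add(url)
        order.append(url)                      # this is where the page would be fetched
        for link in reversed(get_links(url)):  # keep the page's own link order
            if link not in visited:
                pending.append(link)
    return order
```

For a link graph `s -> [a, b]`, `a -> [c]`, the visit order is `s, a, c, b`: the crawler follows `a` down to `c` before returning to `b`.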

The weather data may be captured through an API opened by a weather information website (such as the API opened by China Weather Network). API is the abbreviation of application programming interface; APIs allow pieces of computer software to communicate with each other. The API opened by a weather information website may return data in JSON or XML format.

In one embodiment, the weather data may be captured by a web crawler through the API opened by a weather information website. The specific process is shown in Fig. 2.

Step 103: obtain public sentiment data related to the disease surveillance data, the public sentiment data being time series data corresponding to the disease surveillance data.

Public sentiment data related to the disease surveillance data are public sentiment data that reflect the disease surveillance data. For example, when a disease (such as influenza) enters its epidemic season and the number of patients grows, many people search the internet for words related to the disease (such as specific words like influenza, Tamiflu, and high fever), and the search volume for these words rises sharply. As another example, when a disease (such as influenza) enters its epidemic season and the number of patients grows, more disease-related content (such as illness information and treatment information) is published on public sentiment websites such as news sites, forums, blogs, and Tieba (online discussion boards). Therefore, public sentiment data related to the disease surveillance data can assist disease forecasting.

The public sentiment data may include the number of searches for specific words. For example, the number of searches for specific words on a preset search engine may be counted (such as the daily number of searches for the specific words on a preset search engine in a given area).

The public sentiment data may also include the number of public sentiment items containing the specific words on specific public sentiment websites (such as news sites, forums, blogs, and Tieba).

The specific words are words related to the disease being forecast, for example words related to its symptoms. When the disease being forecast is influenza, the specific words may include: sudden onset, high fever, chills, headache, fatigue, sore throat, muscle aches, dry cough, and so on. As another example, when the disease being forecast is hand-foot-and-mouth disease, the specific words may include: mouth pain, loss of appetite, low fever, vesicular rash on the hands, oral ulcers, and so on.
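The second kind of public sentiment data, the number of items containing specific words, can be computed per day once the items have been collected. The sketch below assumes each item arrives as a (date, text) pair and counts an item once if its text contains at least one of the specific words; the input format, the plain substring matching, and the function name are assumptions for illustration. Substring matching also works for Chinese text, which has no spaces to tokenize on.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def daily_keyword_post_counts(posts: Iterable[Tuple[str, str]],
                              keywords: Iterable[str]) -> Dict[str, int]:
    """Count, per day, the items whose text contains at least one keyword.
    Days with no matching item are absent from the result."""
    kws = list(keywords)
    counts: Counter = Counter()
    for day, text in posts:
        if any(k in text for k in kws):   # count each item at most once
            counts[day] += 1
    return dict(counts)
```

Days missing from the result correspond to zero matching items, so they should be recorded as 0 rather than treated as missing values when the series is aligned with the disease data.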

The public sentiment data cover the same time period as the corresponding disease surveillance data and use the same measurement period (such as daily or weekly). For example, if the disease surveillance data are the daily numbers of medical visits for January-February 2018, the public sentiment data are the daily public sentiment data (such as the daily number of searches for the specific words) for January-February 2018. As another example, if the disease surveillance data are the weekly numbers of medical visits for January-December 2017, the public sentiment data are the weekly public sentiment data (such as the weekly number of searches for the specific words) for January-December 2017.
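Once all three sources share a measurement period, they can be joined into one row per time point. The sketch below keys the rows on the disease surveillance dates, since those are the prediction targets, and leaves `None` wherever weather or public sentiment data are absent, so that the missing-value filling of Step 104 can handle them. The dictionary inputs, ISO date strings, and column order are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[float]]


def align_to_disease_dates(disease: Dict[str, float],
                           weather: Dict[str, List[float]],
                           sentiment: Dict[str, float],
                           n_weather: int) -> List[Tuple[str, Row]]:
    """Build one row per disease-surveillance date:
    [disease, weather_1..weather_n, sentiment]. Dates are ISO strings
    (YYYY-MM-DD), so sorting them gives chronological order."""
    rows: List[Tuple[str, Row]] = []
    for day in sorted(disease):
        weather_values = weather.get(day, [None] * n_weather)  # missing weather -> None
        rows.append((day, [disease[day], *weather_values, sentiment.get(day)]))
    return rows
```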

It is understood that steps 101-103 may be executed in any order or in parallel.

Step 104: preprocess the disease surveillance data, weather data, and public sentiment data.

Preprocessing of the disease surveillance data, weather data, and public sentiment data may include abnormal-data handling, which corrects the abnormal data in the disease surveillance data, weather data, and public sentiment data and thereby improves the reliability and accuracy of disease forecasting.

Abnormal-data handling may include filling in missing values in the disease surveillance data, weather data, and public sentiment data. A missing value may be filled with the mean or median of the data before and after it, or by regression fitting.
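The first filling option, the mean of the data before and after a gap, can be sketched as below. For each missing entry the sketch takes the nearest valid value on each side and averages them; at either end of the series only one neighbour exists, so that value is copied. The end-of-series rule and the function name are assumptions; the median variant or regression fitting mentioned above are not shown.

```python
from typing import List, Optional


def fill_missing_neighbor_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the mean of the nearest valid value before and
    after it; at either end, copy the single available neighbour. A series
    with no valid values at all is returned unchanged."""
    n = len(values)
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev = next((values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, n) if values[j] is not None), None)
        neighbours = [x for x in (prev, nxt) if x is not None]
        out[i] = sum(neighbours) / len(neighbours) if neighbours else None
    return out
```

Neighbours are looked up in the original list rather than in the partly filled output, so a run of consecutive gaps is filled from the same two observed values instead of from earlier estimates.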

Abnormal-data handling may also include correcting outliers in the disease surveillance data, weather data, and public sentiment data. An outlier is a value that deviates markedly from the other data. Outliers may be corrected by interpolation.
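The text does not say how "deviates markedly" is measured. The sketch below assumes a robust rule, the modified z-score based on the median absolute deviation with the commonly used cut-off of 3.5, and then corrects each flagged value by linear interpolation between its nearest unflagged neighbours. Both the detection rule and the threshold are swapped-in assumptions; only the interpolation step comes from the text.

```python
from statistics import median
from typing import List


def correct_outliers(values: List[float], threshold: float = 3.5) -> List[float]:
    """Flag values whose modified z-score exceeds `threshold`, then replace each
    flagged value by linear interpolation between its nearest unflagged neighbours."""
    med = median(values)
    mad = median(abs(v - med) for v in values)   # median absolute deviation
    if mad == 0:
        return list(values)                      # no spread: this rule flags nothing
    flagged = [0.6745 * abs(v - med) / mad > threshold for v in values]
    good = [i for i, f in enumerate(flagged) if not f]
    out = list(values)
    for i, is_outlier in enumerate(flagged):
        if not is_outlier:
            continue
        left = max((j for j in good if j < i), default=None)
        right = min((j for j in good if j > i), default=None)
        if left is not None and right is not None:
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
        elif left is not None:
            out[i] = values[left]                # trailing outlier: carry last good value
        elif right is not None:
            out[i] = values[right]               # leading outlier: use first good value
    return out
```

The median and MAD are used instead of the mean and standard deviation because a single large spike inflates the standard deviation and can hide itself, whereas it barely moves the median.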

Preprocessing may also include converting the data format of the disease surveillance data, weather data, and public sentiment data. For example, the three data sets may be standardized so that they share a consistent format suitable as input data for the LSTM model.
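"Standardization" is assumed here to mean z-score scaling of each column (subtract the column mean, divide by the column standard deviation), which puts variables with very different units, such as visit counts, temperatures, and search counts, on a comparable scale. This choice, and the function name, are assumptions; min-max scaling would be an equally plausible reading of the text.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def standardize_columns(rows: List[List[float]]
                        ) -> Tuple[List[List[float]], List[Tuple[float, float]]]:
    """Z-score each column of a row-major table; return the scaled rows and
    the (mean, std) used per column so the same scaling can be reapplied."""
    stats: List[Tuple[float, float]] = []
    for col in zip(*rows):
        mu = mean(col)
        sigma = pstdev(col) or 1.0   # constant column: avoid dividing by zero
        stats.append((mu, sigma))
    scaled = [[(v - mu) / s for v, (mu, s) in zip(row, stats)] for row in rows]
    return scaled, stats
```

In practice the statistics would be computed on the training portion only and then reused for the validation data and for prediction inputs, so that no information from later time points leaks into training.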

Step 105, the structure long short-term memory recurrent neural network of multilayer (Long Short-term Memory Recurrent Neural Network) model, i.e. multilayer LSTM models.The multilayer LSTM models include two layers of LSTM unit Layer and one layer of full articulamentum, first layer LSTM elementary layers are used for input data (such as the disease surveillance data, weather data The input data constituted with public sentiment data) construction feature, the first hiding layer unit is obtained, the second layer LSTM elementary layers are used for Described first hiding layer unit is combined, the second hiding layer unit is obtained, the full articulamentum is used for according to described second Hiding layer unit obtains prediction result (such as disease forecasting result), and each LSTM elementary layers include forgeing door, input gate, output Door, the memory state forgotten door, input gate, out gate and control the LSTM elementary layers.

An LSTM model is a recurrent neural network over time. Compared with a traditional recurrent neural network (Recurrent Neural Network, RNN) model, an LSTM model stores information through the gates built into its LSTM unit layers, so gradients do not vanish as quickly during model training.

The multilayer LSTM model used in this method includes two LSTM unit layers and one fully connected layer. The first LSTM unit layer constructs features from the input data (e.g., the input data composed of the disease surveillance data, weather data and public opinion data) to obtain first hidden-layer units. The second LSTM unit layer combines the first hidden-layer units to obtain second hidden-layer units. The fully connected layer obtains the predicted value from the second hidden-layer units. The first hidden-layer units are local features and the second hidden-layer units are global features. In other words, the first LSTM unit layer extracts local information, the second LSTM unit layer combines the local features into global features, and the fully connected layer obtains the prediction result (e.g., the disease forecasting result) from the global features.

An LSTM unit layer includes a forget gate, an input gate and an output gate, which control the memory state of the LSTM unit layer. The forget gate decides how much of the previous memory state to keep. The input gate decides whether to accept the input at the current time step. The output gate decides whether to output the memory state.

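In one embodiment, the forget gate $f_t$, input gate $i_t$, output gate $o_t$, memory state $c_t$ and hidden-layer unit $h_t$ of an LSTM unit layer can be calculated as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$$

$$h_t = o_t \odot \tanh(c_t)$$

where $W_f$, $U_f$, $b_f$ are the parameters of the forget gate; $W_i$, $U_i$, $b_i$ are the parameters of the input gate; $W_o$, $U_o$, $b_o$ are the parameters of the output gate; and $W_c$, $U_c$, $b_c$ are the parameters of the memory unit. $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.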
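The equations above can be sketched as a single LSTM time step in plain Python. The gate formulas follow the text. The memory-state and hidden-unit updates use the standard form with tanh activations, which is an assumption about the exact activation functions. `init_params` is a hypothetical helper that draws small random weights.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pre_activation(p: Dict[str, list], gate: str, x_t: Vector, h_prev: Vector) -> Vector:
    """W_g x_t + U_g h_{t-1} + b_g for gate g in {f, i, o, c}."""
    wx = matvec(p["W" + gate], x_t)
    uh = matvec(p["U" + gate], h_prev)
    return [a + b + c for a, b, c in zip(wx, uh, p["b" + gate])]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, p: Dict[str, list]) -> Tuple[Vector, Vector]:
    f = [sigmoid(z) for z in pre_activation(p, "f", x_t, h_prev)]       # forget gate f_t
    i = [sigmoid(z) for z in pre_activation(p, "i", x_t, h_prev)]       # input gate i_t
    o = [sigmoid(z) for z in pre_activation(p, "o", x_t, h_prev)]       # output gate o_t
    g = [math.tanh(z) for z in pre_activation(p, "c", x_t, h_prev)]     # candidate memory (assumed tanh)
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # memory state c_t
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                    # hidden-layer unit h_t
    return h, c


def init_params(n_in: int, n_hidden: int, scale: float = 0.1, seed: int = 0) -> Dict[str, list]:
    """Hypothetical initializer: uniform weights in [-scale, scale], zero biases."""
    rng = random.Random(seed)
    p = {}
    for gate in "fioc":
        p["W" + gate] = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        p["U" + gate] = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p
```

With all parameters set to zero, every gate outputs 0.5 and the candidate memory is 0, so the step simply halves the previous memory state. This makes it a quick sanity check of the wiring.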

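In another embodiment, the forget gate $f_t$, input gate $i_t$, output gate $o_t$, memory state $c_t$ and hidden-layer unit $h_t$ of an LSTM unit layer can be calculated as follows, with each gate reading the previous memory state $c_{t-1}$ instead of the previous hidden-layer unit $h_{t-1}$:

$$f_t = \sigma(W_f x_t + U_f c_{t-1} + b_f)$$

$$i_t = \sigma(W_i x_t + U_i c_{t-1} + b_i)$$

$$o_t = \sigma(W_o x_t + U_o c_{t-1} + b_o)$$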
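The only change in this embodiment is the recurrent input to each gate, so a gate function that takes that input as a parameter covers both variants. The sketch below computes the three gates from $c_{t-1}$ as in the equations above. It does not cover the memory-state and hidden-unit updates, because those formulas are not reproduced in this text. The parameter dictionary layout is a hypothetical convention.

```python
import math
from typing import Dict, List

Vector = List[float]


def gate(W: List[Vector], U: List[Vector], b: Vector, x_t: Vector, recurrent: Vector) -> Vector:
    """sigma(W x_t + U r + b), where r is the recurrent input chosen by the embodiment."""
    out = []
    for w_row, u_row, b_k in zip(W, U, b):
        z = sum(w * x for w, x in zip(w_row, x_t)) + sum(u * r for u, r in zip(u_row, recurrent)) + b_k
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out


def gates_from_memory_state(x_t: Vector, c_prev: Vector, p: Dict[str, list]) -> Dict[str, Vector]:
    """Second-embodiment gates: each gate reads c_{t-1} instead of h_{t-1}."""
    return {g: gate(p["W" + g], p["U" + g], p["b" + g], x_t, c_prev) for g in "fio"}
```

Passing `h_prev` instead of `c_prev` to `gate` reproduces the gates of the first embodiment.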

Step 106: obtain training data and validation data from the preprocessed disease surveillance data, weather data and public opinion data. Use them to train the multilayer LSTM model and verify its performance, which yields the optimized multilayer LSTM model.

Time series data can be extracted from the preprocessed disease surveillance data, weather data and public opinion data to form the training data and the validation data.

The input to the multilayer LSTM model is a vector of preset dimension (e.g., 1000 dimensions). From the extracted time series data, the preprocessed disease surveillance data, weather data and public opinion data for each time point are combined into one vector of the preset dimension. The vectors for successive time points are then input into the multilayer LSTM model in chronological order to train or validate the model.
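The per-time-point vector construction can be sketched as follows. This assumes each source is a mapping from an ISO date string to that day's feature values, and that only dates present in all three sources are used. The function name and the `dim` check are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Features = Dict[str, Sequence[float]]  # ISO date string -> feature values for that time point


def build_input_vectors(disease: Features, weather: Features, opinion: Features,
                        dim: int) -> Tuple[List[str], List[List[float]]]:
    """Concatenate the three sources per time point and return vectors in chronological order."""
    dates = sorted(set(disease) & set(weather) & set(opinion))  # ISO dates sort chronologically
    vectors = []
    for d in dates:
        vec = [float(v) for v in (*disease[d], *weather[d], *opinion[d])]
        if len(vec) != dim:
            raise ValueError(f"{d}: expected {dim} features, got {len(vec)}")
        vectors.append(vec)
    return dates, vectors
```

Dropping dates that are missing from any one source is a simplification here. Missing values could instead be filled during preprocessing so that no time points are lost.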

For example, first time series data for training the multilayer LSTM model are extracted from the preprocessed disease surveillance data, weather data and public opinion data. From these, the preprocessed data for each time point are combined into a first vector of the preset dimension, and the first vectors are input into the multilayer LSTM model in chronological order to train it. Second time series data for validating the multilayer LSTM model are extracted in the same way. The preprocessed data for each time point are combined into a second vector of the preset dimension, and the second vectors are input into the model in chronological order to validate it.
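A minimal sketch of carving the first (training) and second (validation) time series out of one chronological sequence. It also shows one common way to turn a series into supervised samples. The 80/20 split, the window length and the "predict the next time point" target are assumptions, not values given in the source.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[List[float]], float]  # (window of input vectors, target value)


def chronological_split(vectors: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Earlier segment -> training (first time series), later segment -> validation (second)."""
    cut = int(len(vectors) * train_fraction)
    return list(vectors[:cut]), list(vectors[cut:])


def make_windows(vectors: Sequence[List[float]], targets: Sequence[float], window: int) -> List[Sample]:
    """Pair each run of `window` consecutive vectors with the target at the next time point."""
    return [(list(vectors[end - window:end]), targets[end]) for end in range(window, len(vectors))]
```

The split is done by time rather than at random, so the validation period always lies after the training period, matching how the model is later used to forecast forward in time.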

When training the multilayer LSTM model, its loss function can be defined as the mean squared error, and the model parameters are adjusted to minimize the mean squared error. Training can use the RMSprop algorithm, an improved variant of stochastic gradient descent. The mean squared error and the RMSprop algorithm are prior art and are not described in detail here.
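The loss and optimizer named above can be sketched on a toy problem. RMSprop keeps a running average of squared gradients and divides each step by its square root. Here it fits a single weight `w` in `y = w * x` by minimizing the mean squared error, standing in for the LSTM's parameters. The learning rate, decay `rho` and step count are illustrative defaults, not values from the source.

```python
import math
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class RMSprop:
    """RMSprop: scale each step by a running root-mean-square of recent gradients."""

    def __init__(self, lr: float = 0.01, rho: float = 0.9, eps: float = 1e-8):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.cache: List[float] = []

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        if not self.cache:
            self.cache = [0.0] * len(params)
        updated = []
        for k, (w, g) in enumerate(zip(params, grads)):
            self.cache[k] = self.rho * self.cache[k] + (1 - self.rho) * g * g
            updated.append(w - self.lr * g / (math.sqrt(self.cache[k]) + self.eps))
        return updated


# Toy problem: fit y = w * x by minimizing MSE (stands in for the LSTM's parameters).
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = [0.0]
opt = RMSprop(lr=0.01)
for _ in range(2000):
    grad = sum(2 * (w[0] * x - y) * x for x, y in zip(xs, ys)) / len(xs)  # d(MSE)/dw
    w = opt.step(w, [grad])
```

Because each step is normalized by the recent gradient magnitude, parameters whose gradients are large or small move at similar rates.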

Step 107: obtain the disease surveillance data, weather data and public opinion data before the prediction time point from the preprocessed data. Input them into the optimized multilayer LSTM model to obtain the disease forecasting result for the prediction time point.

The acquired disease surveillance data, weather data and public opinion data before the prediction time point are time series data. From these, the preprocessed data for each time point are combined into a third vector of the preset dimension. The third vectors are input into the multilayer LSTM model in chronological order to forecast the disease at the prediction time point.

During disease forecasting, the optimized multilayer LSTM model starts from the initial time point. At each time point, it combines the input data of the current time point with the hidden-layer units of the previous time point, layer by layer, to obtain each hidden-layer unit of the current time point. It then obtains the predicted value of the current time point from these hidden-layer units. Proceeding recursively in chronological order, it obtains the hidden-layer units and predicted value of each following time point until it reaches the predicted value of the prediction time point.
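The layer-by-layer recursion can be sketched as a forward pass through two stacked LSTM unit layers and a fully connected output. The first layer reads the input vector, the second layer reads the first layer's hidden units, and the output layer maps the second layer's hidden units to one predicted value per time point. Weights are random and untrained. The gate wiring packs $W$, $U$ and $b$ into one row per unit, and it assumes the standard memory-state and hidden-unit updates with tanh. The class names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMLayer:
    """One LSTM unit layer; each gate row packs [W | U | b] for the input [x_t, h_{t-1}, 1]."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n_hidden = n_hidden
        width = n_in + n_hidden + 1
        self.rows = {g: [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_hidden)]
                     for g in "fioc"}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z_in = list(x) + list(h) + [1.0]
        pre = {g: [sum(w * v for w, v in zip(row, z_in)) for row in rows] for g, rows in self.rows.items()}
        f = [_sigmoid(z) for z in pre["f"]]    # forget gate
        i = [_sigmoid(z) for z in pre["i"]]    # input gate
        o = [_sigmoid(z) for z in pre["o"]]    # output gate
        g = [math.tanh(z) for z in pre["c"]]   # candidate memory
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


class TwoLayerLSTM:
    """Two stacked LSTM unit layers followed by a fully connected output (untrained weights)."""

    def __init__(self, n_in: int, n_h1: int, n_h2: int, seed: int = 0):
        rng = random.Random(seed)
        self.layer1 = LSTMLayer(n_in, n_h1, rng)
        self.layer2 = LSTMLayer(n_h1, n_h2, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_h2)]
        self.b_out = 0.0

    def predict_sequence(self, inputs: List[Vector]) -> List[float]:
        """Return one predicted value per time point; the last one is the forecast."""
        h1, c1 = [0.0] * self.layer1.n_hidden, [0.0] * self.layer1.n_hidden
        h2, c2 = [0.0] * self.layer2.n_hidden, [0.0] * self.layer2.n_hidden
        preds = []
        for x in inputs:                                  # chronological order
            h1, c1 = self.layer1.step(x, h1, c1)          # first hidden-layer unit (local features)
            h2, c2 = self.layer2.step(h1, h2, c2)         # second hidden-layer unit (global features)
            preds.append(sum(w * v for w, v in zip(self.w_out, h2)) + self.b_out)  # fully connected layer
        return preds
```

Because each step depends only on the current input and the previous hidden and memory states, the predictions for a prefix of the sequence do not change when later time points are appended.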

Embodiment one predicts disease incidence data with a multilayer LSTM model. The LSTM model extracts knowledge directly from the data and constructs feature vectors that are useful for prediction, which improves prediction accuracy. Compared with a traditional RNN model, the LSTM model also mitigates the vanishing-gradient problem in the long-term dependencies that arise when the time series is long. In addition, Embodiment one adds weather data and public opinion data to disease forecasting as influencing factors, which improves forecasting accuracy.

Embodiment two

Fig. 2 is a detailed flowchart of acquiring the weather data related to the disease surveillance data (i.e., step 102 in Fig. 1) in the disease forecasting method provided by Embodiment 2 of the present invention.

The weather data can be captured by a web crawler through the API opened by a weather information website. As shown in Fig. 2, this may specifically include the following steps:

Step 201: generate the seed URL and subsequent URLs for the API of the weather information website.

The seed URL is the basis and starting point of all work done by the web crawler. There may be one seed URL or several.

The structure of the weather information website's URLs can be analyzed, and the subsequent URLs can be derived from that structure.
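When an API's URLs follow a regular pattern, for example one query per city and date, the seed URL and the subsequent URLs can be generated directly from that pattern. The endpoint and the `city`/`date` query parameters below are hypothetical placeholders. A real weather site's API defines its own path and parameters.

```python
from datetime import date, timedelta
from typing import List
from urllib.parse import urlencode

# Hypothetical endpoint; a real weather site's API path and query parameters will differ.
API_BASE = "https://api.example.com/weather/daily"


def generate_urls(city: str, start: date, end: date) -> List[str]:
    """Return the seed URL (start date) followed by subsequent URLs, one per day up to end."""
    urls = []
    day = start
    while day <= end:
        urls.append(f"{API_BASE}?{urlencode({'city': city, 'date': day.isoformat()})}")
        day += timedelta(days=1)
    return urls
```

For example, `generate_urls("shenzhen", date(2018, 1, 1), date(2018, 2, 28))` would cover the January-February 2018 period used as an example elsewhere in this description.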

Step 202: send an HTTP request to the API of the weather information website to request access to the API.

The HTTP request can be sent to the API of the weather information website using the GET method. When the weather information website agrees to provide its weather data, it returns an HTTP response indicating that the weather data can be retrieved.
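With Python's standard library, the GET request and the response check can be sketched as below. The `User-Agent` string is an arbitrary placeholder, and asking for JSON via `Accept` assumes the API honours that header. `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, so the explicit status check only catches unexpected non-error codes.

```python
import urllib.request


def build_get_request(url: str) -> urllib.request.Request:
    """Prepare a GET request that asks the API for JSON."""
    return urllib.request.Request(
        url,
        headers={"Accept": "application/json", "User-Agent": "weather-crawler/0.1"},
        method="GET",
    )


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send the GET request and return the body; urlopen raises HTTPError for 4xx/5xx."""
    with urllib.request.urlopen(build_get_request(url), timeout=timeout) as resp:
        if resp.status != 200:
            raise RuntimeError(f"unexpected HTTP status {resp.status}")
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Keeping request construction separate from sending makes it possible to inspect the request without network access.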

Step 203: parse and identify the data content provided by the weather information website in order to inspect it.

The weather information website provides data content in a specific format, and this content must be parsed and identified before it can be inspected. For example, the API of the weather information website provides data in JSON format. JSON is a data interchange format whose syntax follows conventions familiar from the C language. The JSON content is parsed and identified in order to inspect the data content.
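Parsing a JSON response needs only the standard `json` module. The response body below is a hypothetical example: the field names (`city`, `date`, `daily`, `temp_avg` and so on) are placeholders, not any real site's schema.

```python
import json
from typing import Any, Dict

# Hypothetical response body; the real schema depends on the weather site's API.
SAMPLE_RESPONSE = (
    '{"city": "Shenzhen", "date": "2018-01-01", '
    '"daily": {"temp_avg": 15.2, "humidity_avg": 68, "precip_mm": 0.0}}'
)


def parse_json_response(text: str) -> Dict[str, Any]:
    """Decode the JSON body and make sure the top level is an object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```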

Step 204: determine whether the data content is the predetermined information content.

To obtain the specific weather data required, it is necessary to determine whether the data content is the predetermined information content. If it is not, the data content is discarded; otherwise, the next step is executed.
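The source does not define "predetermined information content". This sketch assumes it means a record for the target city that carries every numeric weather field the forecast needs. `REQUIRED_FIELDS` and the record layout are hypothetical and continue the JSON example schema.

```python
from typing import Any, Dict, Iterable

# Hypothetical "predetermined information content": the fields the forecast needs.
REQUIRED_FIELDS = ("temp_avg", "humidity_avg", "precip_mm")


def is_predetermined_content(record: Dict[str, Any], city: str,
                             required: Iterable[str] = REQUIRED_FIELDS) -> bool:
    """True if the record is for the wanted city and carries every required numeric field."""
    daily = record.get("daily")
    if record.get("city") != city or not isinstance(daily, dict):
        return False
    return all(isinstance(daily.get(name), (int, float)) for name in required)
```

Records for which this returns `False` are discarded, and the rest continue to the capture step.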

Step 205: if the data content is the predetermined information content, capture the data content.

The ultimate purpose of data capture is to bring network data content to the local machine. For JSON content, a depth-first search strategy can be used to search the state space while capturing the data content.
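Treating a parsed JSON document as a tree, a depth-first traversal with an explicit stack visits every nested object and array. It records each leaf value under a dotted path. This is one way to read the "depth-first search" mentioned above. The path notation is an illustrative choice.

```python
from typing import Any, Dict, List, Tuple


def dfs_leaves(doc: Any) -> Dict[str, Any]:
    """Depth-first walk of a parsed JSON value, returning {path: leaf value} in visit order."""
    leaves: Dict[str, Any] = {}
    stack: List[Tuple[str, Any]] = [("", doc)]
    while stack:
        path, node = stack.pop()
        if isinstance(node, dict):
            for key in reversed(list(node)):          # reversed so keys are visited in order
                stack.append((f"{path}.{key}" if path else key, node[key]))
        elif isinstance(node, list):
            for idx in reversed(range(len(node))):
                stack.append((f"{path}[{idx}]", node[idx]))
        else:
            leaves[path] = node                       # scalar: string, number, bool or None
    return leaves
```

An explicit stack avoids Python's recursion limit on deeply nested responses.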

Step 206: save the captured data content locally as the weather data.

A database can be created on the computing device, and the weather data can be saved in it.
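A minimal local store using Python's built-in `sqlite3` module. The table name and the long (city, day, field, value) layout are assumptions made for illustration. With the composite primary key, re-crawling the same day overwrites the earlier value instead of duplicating it.

```python
import sqlite3
from typing import Dict


def save_weather(conn: sqlite3.Connection, city: str, day: str, fields: Dict[str, float]) -> None:
    """Store one day's weather fields; re-saving the same (city, day, field) overwrites it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        " city TEXT, day TEXT, field TEXT, value REAL,"
        " PRIMARY KEY (city, day, field))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO weather (city, day, field, value) VALUES (?, ?, ?, ?)",
        [(city, day, name, float(value)) for name, value in fields.items()],
    )
    conn.commit()
```

A long layout lets new weather variables be added without changing the table schema.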

A traditional web crawler first sets one or more entry URLs. While crawling, it extracts new URLs from the current page according to its crawling strategy and puts them into a queue, fetches the page content for each URL and saves it locally, and then takes valid addresses as the next entry URLs, until crawling is finished. As the number of web pages grows sharply, a traditional web crawler downloads a large number of irrelevant pages. Capturing the weather data with a web crawler through the API opened by the weather information website obtains the data efficiently and avoids downloading irrelevant pages, which improves the efficiency of disease forecasting.

Embodiment three

Fig. 3 is a structural diagram of the disease forecasting device provided by Embodiment 3 of the present invention. As shown in Fig. 3, the disease forecasting device 10 may include: a first acquisition unit 301, a second acquisition unit 302, a third acquisition unit 303, a preprocessing unit 304, a construction unit 305, an optimization unit 306 and a prediction unit 307.

First acquisition unit 301, configured to acquire disease surveillance data, which are time series data.

The disease surveillance data may include incidence data for diseases such as influenza, hand-foot-and-mouth disease, measles and mumps.

A disease surveillance network composed of multiple monitoring points can be established in a preset area (e.g., a province, city or district). Disease surveillance data are obtained from the monitoring points and form the disease surveillance time series. Medical institutions, schools and kindergartens/childcare institutions, pharmacies and similar places can be chosen as monitoring points, each performing disease surveillance and data collection for its own target population. Places that meet preset conditions, such as headcount or scale, can be selected as monitoring points. For example, schools and kindergartens/childcare institutions whose student count reaches a preset number are selected. Likewise, pharmacies whose scale (e.g., measured by daily sales) reaches a preset level are selected, and hospitals whose scale (e.g., measured by daily patient visits) reaches a preset level are selected.
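Selecting monitoring points by a per-type size threshold can be sketched as a simple filter. The threshold values and the `Site` record are hypothetical. The source says only that each type of place must reach a preset number or scale, measured by students, daily sales or daily patient visits respectively.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical thresholds; the source only says each type must reach a preset size.
THRESHOLDS: Dict[str, float] = {
    "school": 500,        # enrolled students (schools and kindergartens/childcare institutions)
    "pharmacy": 20000,    # daily sales
    "hospital": 1000,     # daily patient visits
}


@dataclass
class Site:
    name: str
    kind: str     # "school", "pharmacy" or "hospital"
    size: float   # measured in the unit listed in THRESHOLDS for its kind


def select_monitoring_points(sites: List[Site], thresholds: Dict[str, float] = THRESHOLDS) -> List[Site]:
    """Keep sites of a monitored kind whose size reaches that kind's preset threshold."""
    return [s for s in sites if s.kind in thresholds and s.size >= thresholds[s.kind]]
```

Adding another kind of place, such as hotels, only requires a new entry in the threshold table.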

Disease surveillance data collected at different times form the disease surveillance time series. For example, the time series can be formed from data collected per day, or from data collected per week.
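Building a weekly series from daily collections can be sketched by summing counts into ISO calendar weeks. Using ISO (year, week) as the week key is an assumption. Any consistent week definition works, as long as the weather and public opinion series use the same one.

```python
from datetime import date
from typing import Dict, Tuple


def daily_to_weekly(daily: Dict[date, int]) -> Dict[Tuple[int, int], int]:
    """Sum daily counts into ISO (year, week) buckets, in chronological order."""
    weekly: Dict[Tuple[int, int], int] = {}
    for day in sorted(daily):
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        weekly[key] = weekly.get(key, 0) + daily[day]
    return weekly
```

For example, 1 January 2018 (a Monday) and 7 January 2018 fall in ISO week 1 of 2018, while 8 January starts week 2.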

Medical institutions (mainly hospitals) are where early signs of an outbreak are most likely to be captured, which makes them the first choice for disease surveillance. Disease surveillance data can be obtained from patient visits.

Some patients buy medicine at a pharmacy themselves to relieve early symptoms, so disease surveillance data can also be obtained from pharmacy drug sales.

Children and adolescents are a high-risk group and an important link in disease transmission, so monitoring of this group should be strengthened. Schools and kindergartens/childcare institutions are good places to monitor disease incidence among children and adolescents. Disease surveillance data can be obtained from the sick-leave records of children and adolescents in these institutions.

Therefore, the present invention mainly uses three types of places to collect disease surveillance data: medical institutions, schools and kindergartens/childcare institutions, and pharmacies. The data sources are not limited to these. In other embodiments, other populations or places of interest can be added or substituted as monitoring sources. For example, hotels can be included in the surveillance scope to obtain disease surveillance data on hotel guests.

As needed, the disease surveillance data collected by any one type of monitoring point (e.g., medical institutions) can form the disease surveillance time series. For example, the time series can be formed from the data collected by hospitals. Alternatively, the data collected by several types of monitoring points can be combined. For example, the hospital data can serve as the basis, with the pharmacy data as a supplement.

Disease surveillance data may include incidence data such as the number of patient visits, the consultation rate, the number of cases and the incidence rate of a disease. For example, the daily number of patient visits for a disease (e.g., influenza) can be obtained from medical institutions (e.g., hospitals) and used as disease surveillance data. As another example, the daily number of students newly ill with a disease (e.g., influenza) can be obtained from schools and used as disease surveillance data.

Second acquisition unit 302, configured to acquire weather data related to the disease surveillance data. The weather data are time series data corresponding to the disease surveillance data.

Weather data related to the disease surveillance data are weather data that affect the disease surveillance data (i.e., the disease incidence data). The influence of different weather variables on the disease surveillance data can be analyzed in advance, and the variables that affect it, or affect it strongly, are identified from the analysis results.

The weather data may include humidity, temperature, air pressure, precipitation, vapor pressure, wind speed, wind direction and sunshine duration. In one embodiment, the weather data may include daily mean temperature, mean air pressure, maximum temperature, minimum temperature, mean relative humidity, minimum relative humidity, precipitation, mean wind speed, sunshine duration and mean vapor pressure.

The weather data cover the same time period as the disease surveillance data and use the same measurement period (e.g., daily or weekly). For example, if the disease surveillance data are daily patient visits for January-February 2018, the weather data are daily weather data for January-February 2018. As another example, if the disease surveillance data are weekly patient visits for January-December 2017, the weather data are weekly weather data (e.g., weekly mean temperature) for January-December 2017.

The weather data can be captured from weather information websites (e.g., China Weather Network, Sina Weather, Sohu Weather) to improve their reliability. The weather data can also be captured from any other web page.

Weather data for a predetermined area, such as a province, city or district, can be captured. For example, the weather data of Shenzhen can be captured.

Weather data for a predetermined time, such as a year, month or day, can be captured. For example, the daily weather data for January-February 2018 can be captured.

The weather data can be captured by a web crawler, which is an application that automatically extracts information content from web pages. A web crawler typically starts from the URLs of one or several initial pages (also called seed URLs) and fetches those pages. Following a specific algorithm and strategy (e.g., a depth-first search strategy), it keeps extracting new URLs from the current page and putting them into a queue, until a stop condition is met. URL is the abbreviation of Uniform Resource Locator.
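The crawl loop described above can be sketched independently of any network code. `get_links` is a caller-supplied function (hypothetical here) that returns the URLs found on a page. Treating the frontier as a stack gives depth-first order, treating it as a queue gives breadth-first order, and `max_pages` serves as the stop condition.

```python
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str], get_links: Callable[[str], List[str]],
          max_pages: int = 100, depth_first: bool = True) -> List[str]:
    """Visit pages starting from seed URLs until the frontier is empty or max_pages is reached."""
    frontier = list(seeds)
    seen = set()
    visited: List[str] = []
    while frontier and len(visited) < max_pages:                  # stop condition
        url = frontier.pop() if depth_first else frontier.pop(0)  # stack for DFS, queue for BFS
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        new_links = [u for u in get_links(url) if u not in seen]
        # Push in reverse for DFS so the first link on the page is explored first.
        frontier.extend(reversed(new_links) if depth_first else new_links)
    return visited
```

For a link graph A -> [B, C], B -> [D], depth-first order visits A, B, D, C and breadth-first order visits A, B, C, D.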

The weather data can be captured through an API opened by a weather information website (e.g., the API opened by China Weather Network). API is the abbreviation of Application Programming Interface, and an API enables communication between pieces of computer software. The API opened by a weather information website can return data in JSON or XML format.
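For APIs that return XML instead of JSON, the standard `xml.etree.ElementTree` module can convert a response into the same kind of flat record. The element and attribute names in `SAMPLE_XML` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Hypothetical XML body; a real API defines its own element names.
SAMPLE_XML = (
    '<weather city="Shenzhen" date="2018-01-01">'
    "<temp_avg>15.2</temp_avg><humidity_avg>68</humidity_avg>"
    "</weather>"
)


def parse_weather_xml(text: str) -> Dict[str, Union[str, float]]:
    """Attributes become string fields; child elements with text become numeric fields."""
    root = ET.fromstring(text)
    record: Dict[str, Union[str, float]] = dict(root.attrib)
    for child in root:
        if child.text is None:   # skip empty elements rather than failing on float(None)
            continue
        record[child.tag] = float(child.text)
    return record
```

The Python documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data, which matters when parsing responses from third-party sites.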

In one embodiment, the weather data can be captured by a web crawler through the API opened by the weather information website. The specific process is shown in Fig. 2.

Third acquisition unit 303, configured to acquire public opinion data related to the disease surveillance data. The public opinion data are time series data corresponding to the disease surveillance data.

Public opinion data related to the disease surveillance data are public opinion data that reflect the disease surveillance data. For example, when a disease (e.g., influenza) enters its epidemic period and the number of patients rises, many people search the Internet for disease-related words (e.g., influenza, Tamiflu, high fever), and the search volume for these words rises sharply. Likewise, in an epidemic period, more disease-related content (e.g., symptom information, treatment information) is posted on public opinion sites such as news sites, forums, blogs and Tieba (post bars). Public opinion data related to the disease surveillance data can therefore assist disease forecasting.

The public opinion data may include the number of searches for specific words. For example, the number of searches for specific words on preset search engines can be counted (e.g., the daily number of searches for specific words on preset search engines in a given area).

The public opinion data may also include the number of public opinion items that contain specific words on specific public opinion sites (e.g., news sites, forums, blogs, Tieba).

The specific words are words related to the disease being predicted, for example words describing its symptoms. When the disease is influenza, the specific words may include: sudden onset, high fever, chills, headache, fatigue, sore throat, muscle aches, dry cough, etc. When the disease is hand-foot-and-mouth disease, the specific words may include: mouth pain, loss of appetite, low-grade fever, vesicular rash on the hands, oral ulcers, etc.
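Counting, per day, the public opinion items that contain at least one specific word can be sketched as follows. The input is assumed to be (date, text) pairs already collected from the public opinion sites. The word list is an English stand-in for the influenza symptom words above. Plain substring matching is a simplification; a real system may need word segmentation, especially for Chinese text.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# English stand-ins for the influenza symptom words listed above.
FLU_WORDS = ("sudden onset", "high fever", "chills", "headache",
             "fatigue", "sore throat", "muscle ache", "dry cough")


def count_posts_with_words(posts: Iterable[Tuple[str, str]],
                           words: Iterable[str] = FLU_WORDS) -> Dict[str, int]:
    """Per day, count posts whose text contains at least one specific word (case-insensitive)."""
    lowered_words = [w.lower() for w in words]
    counts: Counter = Counter()
    for day, text in posts:
        lowered = text.lower()
        if any(w in lowered for w in lowered_words):
            counts[day] += 1
    return dict(sorted(counts.items()))
```

Days with no matching posts do not appear in the result, so they would be filled with zero when aligning this series with the disease surveillance series.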

The public opinion data cover the same time period as the disease surveillance data and use the same measurement period (e.g., daily or weekly). For example, if the disease surveillance data are daily patient visits for January-February 2018, the public opinion data are daily public opinion data for January-February 2018 (e.g., daily search counts for the specific words). As another example, if the disease surveillance data are weekly patient visits for January-December 2017, the public opinion data are weekly public opinion data for January-December 2017 (e.g., weekly search counts for the specific words).

Preprocessing unit 304, configured to preprocess the disease surveillance data, weather data and public opinion data.

Preprocessing of the disease surveillance data, weather data and public opinion data may include abnormal-data handling. This handling corrects abnormal data in the disease surveillance data, weather data and public opinion data, which improves the reliability and accuracy of disease forecasting.

Abnormal-data handling may include filling missing values in the disease surveillance data, weather data and public opinion data. A missing value can be filled with the mean or median of the data before and after it, or it can be filled by regression fitting.

Abnormal-data handling may also include correcting outliers in the disease surveillance data, weather data and public opinion data. An outlier is a value that deviates markedly from the rest of the data. Outliers can be corrected by interpolation.

Preprocessing of the disease surveillance data, weather data and public opinion data may also include data format conversion. For example, the three kinds of data are standardized into a consistent format so that they are suitable as input data for the LSTM model.

Construction unit 305, configured to build a multilayer long short-term memory recurrent neural network (Long Short-term Memory Recurrent Neural Network) model, i.e., a multilayer LSTM model. The multilayer LSTM model includes two LSTM unit layers and one fully connected layer. The first LSTM unit layer constructs features from the input data (e.g., the input data composed of the disease surveillance data, weather data and public opinion data) to obtain first hidden-layer units. The second LSTM unit layer combines the first hidden-layer units to obtain second hidden-layer units. The fully connected layer obtains the prediction result (e.g., the disease forecasting result) from the second hidden-layer units. Each LSTM unit layer includes a forget gate, an input gate and an output gate, and these gates control the memory state of the LSTM unit layer.

An LSTM model is a recurrent neural network over time. Compared with a traditional recurrent neural network (Recurrent Neural Network, RNN) model, an LSTM model stores information through the gates built into its LSTM unit layers, so gradients do not vanish as quickly during model training.

The multilayer LSTM model used here includes two LSTM unit layers and one fully connected layer. The first LSTM unit layer constructs features from the input data (e.g., the input data composed of the disease surveillance data, weather data and public opinion data) to obtain first hidden-layer units. The second LSTM unit layer combines the first hidden-layer units to obtain second hidden-layer units. The fully connected layer obtains the predicted value from the second hidden-layer units. The first hidden-layer units are local features and the second hidden-layer units are global features. In other words, the first LSTM unit layer extracts local information, the second LSTM unit layer combines the local features into global features, and the fully connected layer obtains the prediction result (e.g., the disease forecasting result) from the global features.

An LSTM unit layer includes a forget gate, an input gate and an output gate, which control the memory state of the LSTM unit layer. The forget gate decides how much of the previous memory state to keep. The input gate decides whether to accept the input at the current time step. The output gate decides whether to output the memory state.

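In one embodiment, the forget gate $f_t$, input gate $i_t$, output gate $o_t$, memory state $c_t$ and hidden-layer unit $h_t$ of an LSTM unit layer can be calculated as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$$

$$h_t = o_t \odot \tanh(c_t)$$

where $W_f$, $U_f$, $b_f$ are the parameters of the forget gate; $W_i$, $U_i$, $b_i$ are the parameters of the input gate; $W_o$, $U_o$, $b_o$ are the parameters of the output gate; and $W_c$, $U_c$, $b_c$ are the parameters of the memory unit. $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.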

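In another embodiment, the forget gate $f_t$, input gate $i_t$, output gate $o_t$, memory state $c_t$ and hidden-layer unit $h_t$ of an LSTM unit layer can be calculated as follows, with each gate reading the previous memory state $c_{t-1}$ instead of the previous hidden-layer unit $h_{t-1}$:

$$f_t = \sigma(W_f x_t + U_f c_{t-1} + b_f)$$

$$i_t = \sigma(W_i x_t + U_i c_{t-1} + b_i)$$

$$o_t = \sigma(W_o x_t + U_o c_{t-1} + b_o)$$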

Optimization unit 306, configured to obtain training data and validation data from the preprocessed disease surveillance data, weather data and public opinion data, and to use them to train the multilayer LSTM model and verify its performance, which yields the optimized multilayer LSTM model.

Time series data can be extracted from the preprocessed disease surveillance data, weather data and public opinion data to form the training data and the validation data.

The input to the multilayer LSTM model is a vector of preset dimension (e.g., 1000 dimensions). From the extracted time series data, the preprocessed disease surveillance data, weather data and public opinion data for each time point are combined into one vector of the preset dimension. The vectors for successive time points are then input into the multilayer LSTM model in chronological order to train or validate the model.

For example, first time series data for training the multilayer LSTM model are extracted from the preprocessed disease surveillance data, weather data and public opinion data. From these, the preprocessed data for each time point are combined into a first vector of the preset dimension, and the first vectors are input into the multilayer LSTM model in chronological order to train it. Second time series data for validating the multilayer LSTM model are extracted in the same way. The preprocessed data for each time point are combined into a second vector of the preset dimension, and the second vectors are input into the model in chronological order to validate it.

When training the multilayer LSTM model, its loss function can be defined as the mean squared error, and the model parameters are adjusted to minimize the mean squared error. Training can use the RMSprop algorithm, an improved variant of stochastic gradient descent. The mean squared error and the RMSprop algorithm are prior art and are not described in detail here.

Prediction unit 307, configured to obtain the disease surveillance data, weather data and public opinion data before the prediction time point from the preprocessed data, and to input them into the optimized multilayer LSTM model to obtain the disease forecasting result for the prediction time point.

The acquired disease surveillance data, weather data and public opinion data before the prediction time point are time series data. From these, the preprocessed data for each time point are combined into a third vector of the preset dimension. The third vectors are input into the multilayer LSTM model in chronological order to forecast the disease at the prediction time point.

During disease forecasting, the optimized multilayer LSTM model starts from the initial time point. At each time point, it combines the input data of the current time point with the hidden-layer units of the previous time point, layer by layer, to obtain each hidden-layer unit of the current time point. It then obtains the predicted value of the current time point from these hidden-layer units. Proceeding recursively in chronological order, it obtains the hidden-layer units and predicted value of each following time point until it reaches the predicted value of the prediction time point.

Embodiment three predicts disease incidence data with a multilayer LSTM model. The LSTM model extracts knowledge directly from the data and constructs feature vectors that are useful for prediction, which improves prediction accuracy. Compared with a traditional RNN model, the LSTM model also mitigates the vanishing-gradient problem in the long-term dependencies that arise when the time series is long. In addition, Embodiment three adds weather data and public opinion data to disease forecasting as influencing factors, which improves forecasting accuracy.

Embodiment four

Fig. 4 is a detailed structural diagram of the second acquisition unit (302 in Fig. 3) in the disease forecasting device provided by Embodiment four of the present invention.

The second acquisition unit 302 can use the API opened by a weather information website to capture the weather data with a web crawler. As shown in Fig. 4, the second acquisition unit 302 may include a generating subunit 3021, a requesting subunit 3022, an analyzing subunit 3023, a judging subunit 3024, a capturing subunit 3025 and a storing subunit 3026.

Generating subunit 3021, configured to generate a seed URL and subsequent URLs directed at the API of the weather information website.

The seed URL is the basis and starting point of all the web crawler's work. There may be one seed URL or several.

The structural features of the weather information website's URLs can be analyzed, and the subsequent URLs derived from those features.
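As an illustration of deriving subsequent URLs from URL structure, the sketch below assumes a hypothetical endpoint (`https://api.example.com/weather/daily`) whose query string varies only by city and date. Real weather APIs differ, so the endpoint, the parameter names and the day-by-day stepping rule are all assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical endpoint: the patent does not name a specific weather API.
BASE_URL = "https://api.example.com/weather/daily"

def make_url(city, day):
    """Build one API URL; the query-string layout is the assumed structural feature."""
    return f"{BASE_URL}?{urlencode({'city': city, 'date': day.isoformat()})}"

def generate_urls(cities, start, days):
    """Return seed URLs (first day per city) and subsequent URLs (the following days)."""
    seeds = [make_url(c, start) for c in cities]
    following = [make_url(c, start + timedelta(days=k))
                 for c in cities for k in range(1, days)]
    return seeds, following

# Usage: two seed URLs, then two more days per city
seeds, following = generate_urls(["shenzhen", "guangzhou"], date(2018, 1, 1), 3)
print(seeds[0])  # https://api.example.com/weather/daily?city=shenzhen&date=2018-01-01
```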

Requesting subunit 3022, configured to send an HTTP request to the API of the weather information website, requesting access to the API.

The HTTP request can be sent to the API of the weather information website using the GET method. When the weather information website agrees to provide its weather data, it returns an HTTP response indicating that the weather data can be obtained.
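A standard-library sketch of the GET request using `urllib.request`. The `Accept` header and the 10-second timeout are assumptions. `urlopen` raises `HTTPError` for 4xx and 5xx responses, so reaching the `read()` call means the site returned a successful response.

```python
from urllib.request import Request, urlopen

def build_get_request(url):
    """Prepare an HTTP GET request for the weather API (the header is an assumption)."""
    return Request(url, method="GET", headers={"Accept": "application/json"})

def fetch(url, timeout=10):
    """Send the request and return the response body as text.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    returned body means the site agreed to serve the data.
    """
    with urlopen(build_get_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Only `fetch` touches the network; `build_get_request` can be inspected offline, for example `build_get_request(url).get_method()` returns `"GET"`.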

Analyzing subunit 3023, configured to analyze and identify the data content provided by the weather information website in order to inspect the data content.

The weather information website provides its data content in a specific format, which must be analyzed and identified before the data content can be inspected. For example, the API of the weather information website may provide data in JSON format. JSON is a data interchange format whose syntax follows conventions familiar from the C family of languages. The JSON-formatted data content is analyzed and identified so that it can be inspected.
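A minimal sketch of analyzing and identifying JSON content with Python's `json` module. The sample payload and its field names are invented for illustration; a real weather API defines its own schema.

```python
import json

# Hypothetical response body; real field names depend on the chosen weather API.
SAMPLE = ('{"city": "shenzhen", "date": "2018-01-01", '
          '"weather": {"temperature": 15.2, "humidity": 0.71, "precipitation": 0.0}}')

def parse_content(raw):
    """Decode JSON text and return the object plus its sorted top-level keys."""
    try:
        content = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from None
    if not isinstance(content, dict):
        raise ValueError("expected a JSON object at the top level")
    return content, sorted(content.keys())

content, keys = parse_content(SAMPLE)
print(keys)  # ['city', 'date', 'weather']
```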

Judging subunit 3024, configured to determine whether the data content is the predetermined information content.

To obtain the specific weather data required, it is necessary to determine whether the data content is the predetermined information content. If it is not, the data content is discarded; otherwise, the next step is executed.
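One way to express the judging step, assuming the predetermined information content is defined as a set of required weather fields nested under a `weather` key. Both the key and the field set are hypothetical.

```python
# Assumed definition of the "predetermined information content"
REQUIRED_FIELDS = {"temperature", "humidity", "precipitation"}

def is_predetermined(content, required=REQUIRED_FIELDS):
    """True if the parsed content carries every required weather field."""
    weather = content.get("weather") if isinstance(content, dict) else None
    return isinstance(weather, dict) and required.issubset(weather)

def filter_contents(contents):
    """Keep matching contents and discard the rest, as the judging subunit does."""
    return [c for c in contents if is_predetermined(c)]
```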

Capturing subunit 3025, configured to capture the data content if it is the predetermined information content.

The ultimate purpose of data capture is to bring network data content into local storage. For data content in JSON format, a depth-first search strategy may be used to search the state space when capturing the data content.
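The patent does not specify how the depth-first strategy is applied. One plausible reading, sketched here, treats the parsed JSON document as a tree and walks it depth-first with an explicit stack, yielding each leaf value together with its path so that the wanted fields can be picked out.

```python
def dfs_leaves(content):
    """Depth-first traversal of a parsed JSON value; yields (path, leaf) pairs."""
    stack = [((), content)]
    while stack:
        path, node = stack.pop()
        if isinstance(node, dict):
            # Push in reverse so keys are visited in their original order
            for key in reversed(list(node)):
                stack.append((path + (key,), node[key]))
        elif isinstance(node, list):
            for idx in reversed(range(len(node))):
                stack.append((path + (idx,), node[idx]))
        else:
            yield path, node

data = {"city": "shenzhen", "weather": {"temperature": 15.2, "hourly": [1, 2]}}
for path, value in dfs_leaves(data):
    print(path, value)
```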

Storing subunit 3026, configured to save the captured data content locally as the weather data.

A database can be created on the computing device, and the weather data saved in that database.
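A minimal storage sketch using Python's built-in `sqlite3` module. The table name, the columns and the (city, date) primary key are assumptions, chosen so that re-crawling the same day overwrites a row instead of duplicating it.

```python
import sqlite3

def save_weather(conn, rows):
    """Create the table if needed and insert (city, date, temperature, humidity, precipitation) rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weather (
               city TEXT NOT NULL,
               date TEXT NOT NULL,
               temperature REAL,
               humidity REAL,
               precipitation REAL,
               PRIMARY KEY (city, date))"""
    )
    # INSERT OR REPLACE keeps one row per city and day if data is re-crawled
    conn.executemany("INSERT OR REPLACE INTO weather VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

# Usage: an in-memory database here; pass a file path to persist locally
conn = sqlite3.connect(":memory:")
save_weather(conn, [("shenzhen", "2018-01-01", 15.2, 0.71, 0.0)])
```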

Traditional web crawlers is all to set one or more entrance URL first, during capturing webpage, according to The strategy of crawl extracts new URL from current web page and is put into queue, to obtain the corresponding web page contents of URL, by webpage Content is saved in local, then, then extracts effective address as entrance URL next time, is finished until creeping.With webpage number The sharp increase of amount, traditional web crawlers can download a large amount of unrelated webpage.Second acquisition unit 302 is opened using Weather information website The api interface put captures the weather data by web crawlers, can efficiently obtain weather to avoid unrelated webpage is downloaded Data, to improve the efficiency of disease forecasting.

Embodiment five

Fig. 5 is a schematic diagram of the computer device provided by Embodiment five of the present invention. The computer device 1 includes a memory 20, a processor 30, and a computer program 40, for example a disease forecasting program, stored in the memory 20 and executable on the processor 30. When executing the computer program 40, the processor 30 implements the steps of the above disease forecasting method embodiment, for example steps 101-107 shown in Fig. 1. Alternatively, when executing the computer program 40, the processor 30 implements the functions of the modules/units in the above device embodiment, for example units 301-307 in Fig. 3.

By way of example, the computer program 40 may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions; these segments describe the execution of the computer program 40 in the computer device 1. For example, the computer program 40 may be divided into the first acquisition unit 301, second acquisition unit 302, third acquisition unit 303, preprocessing unit 304, construction unit 305, optimization unit 306 and predicting unit 307 in Fig. 3; see Embodiment three for the specific function of each unit.

The computer device 1 may be a computing device such as a desktop computer, notebook, palmtop computer or cloud server. Those skilled in the art will understand that Fig. 5 is only an example of the computer device 1 and does not limit it: the computer device 1 may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may also include input/output devices, network access devices, a bus, and so on.

The processor 30 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 through various interfaces and lines.

The memory 20 may be used to store the computer program 40 and/or modules/units. The processor 30 implements the various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and by calling the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as sound playback or image playback), and the data storage area may store data created through use of the computer device 1 (such as audio data or a phone book). In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.

If the integrated modules/units of the computer device 1 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow in the methods of the above embodiments of the present invention may also be accomplished by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so on. It should be noted that what the computer-readable medium includes may be appropriately extended or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

In the several embodiments provided by the present invention, it should be understood that the disclosed computer device and method may be implemented in other ways. For example, the computer device embodiment described above is only illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.

It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as illustrative and not restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. Any reference signs in the claims should not be construed as limiting the claims concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the computer device claims may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their spirit and scope.

Claims (10)

1. A disease forecasting method, characterized in that the method comprises:
obtaining disease surveillance data, the disease surveillance data being time series data;
obtaining weather data related to the disease surveillance data, the weather data being time series data corresponding to the disease surveillance data;
obtaining public sentiment data related to the disease surveillance data, the public sentiment data being time series data corresponding to the disease surveillance data;
preprocessing the disease surveillance data, weather data and public sentiment data;
building a multilayer long short-term memory recurrent neural network model, namely a multilayer LSTM model;
obtaining training data and verification data from the preprocessed disease surveillance data, weather data and public sentiment data, and using the training data and the verification data to train the multilayer LSTM model and verify its performance, obtaining an optimized multilayer LSTM model;
obtaining, from the preprocessed disease surveillance data, weather data and public sentiment data, the disease surveillance data, weather data and public sentiment data preceding a prediction time point, and inputting them into the optimized multilayer LSTM model to obtain a disease forecasting result for the prediction time point.
2. The method according to claim 1, characterized in that capturing the weather data from web pages comprises:
generating a seed URL and subsequent URLs directed at the API of a weather information website;
sending an HTTP request to the API of the weather information website to request access to the API;
analyzing and identifying the data content provided by the weather information website in order to inspect the data content;
determining whether the data content is predetermined information content;
if the data content is predetermined information content, capturing the data content;
saving the captured data content locally as the weather data.
3. The method according to claim 1, characterized in that the public sentiment data comprises:
the number of searches for a specific word; or
the number of public sentiment items containing a specific word on a specific public sentiment website.
4. The method according to claim 1, characterized in that preprocessing the disease surveillance data, weather data and public sentiment data comprises:
filling in missing values in the disease surveillance data, weather data and public sentiment data;
correcting outliers in the disease surveillance data, weather data and public sentiment data;
performing data format conversion on the disease surveillance data, weather data and public sentiment data.
5. The method according to any one of claims 1-4, characterized in that the weather data comprises humidity, temperature, air pressure, precipitation, vapor pressure, wind speed, wind direction and sunshine duration.
6. The method according to any one of claims 1-4, characterized in that the multilayer LSTM model comprises two LSTM unit layers and one fully connected layer; the first LSTM unit layer constructs features from the input data to obtain first hidden layer units; the second LSTM unit layer combines the first hidden layer units to obtain second hidden layer units; the fully connected layer obtains the prediction result from the second hidden layer units; and each LSTM unit layer comprises a forget gate, an input gate and an output gate, which control the memory state of the LSTM unit layer.
7. The method according to any one of claims 1-4, characterized in that the loss function used in training the multilayer LSTM model is the mean squared error, and the training algorithm is RMSprop.
8. A disease forecasting device, characterized in that the device comprises:
a first acquisition unit, configured to obtain disease surveillance data, the disease surveillance data being time series data;
a second acquisition unit, configured to obtain weather data related to the disease surveillance data, the weather data being time series data corresponding to the disease surveillance data;
a third acquisition unit, configured to obtain public sentiment data related to the disease surveillance data, the public sentiment data being time series data corresponding to the disease surveillance data;
a preprocessing unit, configured to preprocess the disease surveillance data, weather data and public sentiment data;
a construction unit, configured to build a multilayer long short-term memory recurrent neural network model, namely a multilayer LSTM model;
an optimization unit, configured to obtain training data and verification data from the preprocessed disease surveillance data, weather data and public sentiment data, and to use the training data and the verification data to train the multilayer LSTM model and verify its performance, obtaining an optimized multilayer LSTM model;
a predicting unit, configured to obtain, from the preprocessed disease surveillance data, weather data and public sentiment data, the disease surveillance data, weather data and public sentiment data preceding a prediction time point, and to input them into the optimized multilayer LSTM model to obtain a disease forecasting result for the prediction time point.
9. A computer device, characterized in that the computer device comprises a processor configured to execute a computer program stored in a memory to implement the disease forecasting method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the disease forecasting method according to any one of claims 1-7.
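Claim 4 names three preprocessing steps without fixing the methods. The sketch below shows one possible choice for each on a single NumPy series: linear interpolation for missing values, clipping at k standard deviations for outliers, and min-max scaling as the data format conversion. All three rules, and the default k = 3, are assumptions rather than the patent's stated method.

```python
import numpy as np

def fill_missing(x):
    """Fill NaNs by linear interpolation over time; edges take the nearest valid value."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    if not valid.any():
        raise ValueError("series has no valid values")
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])
    return x

def correct_outliers(x, k=3.0):
    """Clip values farther than k standard deviations from the mean (assumed rule)."""
    mu, sigma = x.mean(), x.std()
    if sigma == 0:
        return x.copy()
    return np.clip(x, mu - k * sigma, mu + k * sigma)

def min_max_scale(x):
    """Rescale to [0, 1], one possible data format conversion."""
    lo, hi = x.min(), x.max()
    return np.zeros_like(x) if hi == lo else (x - lo) / (hi - lo)

def preprocess(x, k=3.0):
    return min_max_scale(correct_outliers(fill_missing(x), k))

print(preprocess([1, 2, np.nan, 4, 5]))  # [0.   0.25 0.5  0.75 1.  ]
```

Each of the disease surveillance, weather and public sentiment series would be passed through these steps separately before the per-time-point vectors are built.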
CN201810321868.XA 2018-04-11 2018-04-11 Disease forecasting method and device, computer installation and readable storage medium storing program for executing CN108648829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810321868.XA CN108648829A (en) 2018-04-11 2018-04-11 Disease forecasting method and device, computer installation and readable storage medium storing program for executing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810321868.XA CN108648829A (en) 2018-04-11 2018-04-11 Disease forecasting method and device, computer installation and readable storage medium storing program for executing
PCT/CN2018/099847 WO2019196286A1 (en) 2018-04-11 2018-08-10 Illness prediction method and device, computer device, and readable storage medium

Publications (1)

Publication Number Publication Date
CN108648829A true CN108648829A (en) 2018-10-12

Family

ID=63746032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810321868.XA CN108648829A (en) 2018-04-11 2018-04-11 Disease forecasting method and device, computer installation and readable storage medium storing program for executing

Country Status (2)

Country Link
CN (1) CN108648829A (en)
WO (1) WO2019196286A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105022783A (en) * 2015-06-03 2015-11-04 南京邮电大学 Hadoop based user service security system and method
CN105678080A (en) * 2016-01-11 2016-06-15 浪潮集团有限公司 Method for predicting influenza outbreak possibility through big data search and analysis
CN105808942A (en) * 2016-03-04 2016-07-27 深圳市前海安测信息技术有限公司 Analysis and early warning system and method of medical big data
CN105812463A (en) * 2016-03-10 2016-07-27 深圳市前海安测信息技术有限公司 Disease early warning system and method based on medical big data
CN106022527A (en) * 2016-05-27 2016-10-12 河南明晰信息科技有限公司 Trajectory prediction method and device based on map tiling and LSTM cyclic neural network
CN106529113A (en) * 2015-09-15 2017-03-22 平安科技(深圳)有限公司 Reminding information sending method and server
CN107239859A (en) * 2017-06-05 2017-10-10 国网山东省电力公司电力科学研究院 The heating load forecasting method of Recognition with Recurrent Neural Network is remembered based on series connection shot and long term

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
WO2015127065A1 (en) * 2014-02-19 2015-08-27 Hrl Laboratories, Llc Disease prediction system using open source data
CN107180152A (en) * 2016-03-09 2017-09-19 日本电气株式会社 Disease forecasting system and method
CN108288502A (en) * 2018-04-11 2018-07-17 平安科技(深圳)有限公司 Disease forecasting method and device, computer installation and readable storage medium storing program for executing

Non-Patent Citations (2)

Title
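Liu Jian (刘键): "Research on Pedestrian Detection Methods Based on Convolutional Neural Networks", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology *
Yang Dezhi (杨德志): "Application of Generalized Regression Neural Networks to Time Series Forecasting of Hepatitis B Case Counts", Computer Applications and Software *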

Also Published As

Publication number Publication date
WO2019196286A1 (en) 2019-10-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination