CN105678080A - Method for predicting influenza outbreak possibility through big data search and analysis - Google Patents
- Publication number
- CN105678080A (application number CN201610014183.1A)
- Authority
- CN
- China
- Prior art keywords
- influenza
- data
- prediction
- outbreak
- history
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/80—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for predicting the possibility of an influenza outbreak through big data search and analysis. The method comprises the following steps: first, determining a source for historical influenza outbreak time ranges; second, determining a source for the historical search frequency of semantic terms related to influenza outbreaks; third, determining expanded semantic terms related to influenza; fourth, collecting and organizing the search-frequency trend data and the outbreak time range data; fifth, extracting the internal relationship between the historical search-frequency trend and the outbreak time ranges through modeling; sixth, verifying the reliability of the model and tuning the relevant parameters and algorithms until the model reaches its best predictive ability; seventh, inputting the recent frequency of influenza-related semantic terms to predict future influenza outbreaks; eighth, repeating these steps so that the model is continually optimized. The possibility of a future influenza outbreak is thereby predicted statistically, and countermeasures can be taken on that basis, helping governments and individuals respond sensibly to disease risk.
Description
Technical field
The present invention relates to the technical field of big data analysis and extraction, and specifically to a method for predicting the probability of an influenza outbreak through big data search and analysis.
Background technology
Big data, also called massive data, refers to data whose volume is so large that it cannot be captured, managed, processed, and organized into information humans can interpret within a reasonable time by manual means. For the same total volume of data, analyzing the merged data set yields much additional information and many data relationships compared with analyzing each small data set separately; these can be used to spot business trends, judge research quality, prevent the spread of disease, fight crime, or measure real-time traffic conditions. Such uses are exactly why large data sets have become prevalent.
Big data can hardly be processed by most database management systems; it requires "software running in parallel on tens, hundreds, or even thousands of servers". The definition of big data depends on the capabilities of the organization holding the data set and on the capabilities of the software it normally uses to process and analyze data. "For some organizations, facing hundreds of gigabytes of data for the first time may prompt them to reconsider their data management options. For others, the data set may need to reach tens or hundreds of terabytes before it causes difficulty."
Data mining is one step in knowledge discovery in databases. It generally refers to the process of using algorithms to search large volumes of data for the information hidden in them. Data mining is usually associated with computer science and achieves this goal through many methods, including statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.
Semantics is domain-specific; semantics that belongs to no domain does not exist. Semantic heterogeneity refers to differences in how the same thing is interpreted, that is, differences in how the same thing is understood in different domains. In computer science, semantics generally refers to the user's interpretation of the computer representations (symbols) used to describe the real world, that is, the way the user connects computer representations to the real world.
Semantics is the interpretation of data symbols, whereas syntax defines the organizational rules and structural relationships among those symbols. In information integration, data is usually organized by schema (for unstructured and semi-structured data, whose schema is absent or implicit, a schema generally has to be defined before integration), and data is accessed through the schema. Here semantics refers to the meaning of schema elements (such as classes, attributes, and constraints), and syntax is the structure of the schema elements.
Influenza, commonly called flu, is an acute respiratory infection caused by influenza viruses and is highly contagious and fast-spreading. It is transmitted mainly through airborne droplets, person-to-person contact, or contact with contaminated objects. Typical clinical symptoms are sudden high fever, generalized body aches, marked fatigue, and mild respiratory symptoms. Influenza viruses are divided into three types, A, B, and C. Type A viruses frequently undergo antigenic variation, are highly contagious, spread rapidly, and readily cause large-scale epidemics.
A World Health Organization (WHO) report shows that roughly 1 in 10 adults and 1 in 3 children are infected with influenza each year. Looking through history, influenza epidemics with startling death tolls are not rare. The earliest detailed description of an influenza outbreak dates to 1580. In that year, within a few months, 9,000 people died in Rome, Madrid became a desolate, empty city, and Italy and Spain added hundreds of thousands of new graves. Over the 17th century, three major influenza outbreaks occurred worldwide. In 1658, an influenza outbreak in Venice, Italy, killed 60,000 people. From 1742 to 1743, an influenza epidemic affected 90% of the population of Eastern Europe. In January 1837, a very severe influenza outbreak in Europe caused deaths in Berlin to exceed births and brought all public business in Barcelona to a halt. From 1889 to 1894, influenza swept across Western Europe, with widespread illness, high mortality, and serious consequences. In 1918, the most famous severe influenza outbreak in history, the "Spanish flu", broke out worldwide. After that epidemic, average life expectancy in the United States fell by 10 years. In 1957, the "Asian flu" (virus type H2N2) broke out; within two weeks it had reached every country in Asia, then landed in Australia, the Americas, and Europe and spread through countless countries. More than 2 million people died worldwide. In July 1968, the "Mao flu", caused by the influenza A virus (H3N2), broke out on a large scale in Hong Kong. According to statistics, 34,000 people in the United States died from the infection, and so many people in London fell ill that large numbers of volunteers were needed to care for them.
In 1976, a fatal "swine flu" infection among US soldiers stationed at the Fort Dix military base in New Jersey led many health officials to worry that the "Spanish flu" was making a comeback, which caused a nationwide scare. However, the virus was at that time spreading only among pigs in the United States, and a vaccine was developed. In January 1977, the "Russian flu" (virus type H1N1) appeared and spread in the former Soviet Union, and in January 1978 it broke out among students and newly enlisted recruits in the United States. In 1997, the avian influenza virus (H5N1) began to appear; although the virus rarely infects humans, it still claimed 18 lives, mostly people who had direct contact with poultry. Since 2003, more than 400 fatal cases of avian influenza have been reported worldwide. In April 2009, influenza A (H1N1) broke out in succession in Mexico, the United States, and many other countries and regions.
As this history shows, although modern medicine keeps improving, the periodic local or global outbreaks of influenza have essentially not been stopped. As economic and trade contacts between regions grow more frequent, even formerly isolated areas are exposed to the threat of viruses circulating worldwide. A method is therefore urgently needed to assess and give early warning of the likelihood of an influenza pandemic, so that countermeasures can be taken to prevent it.
Previously, regional early warning for influenza relied mainly on local Centers for Disease Control and Prevention aggregating data and reporting it upward level by level. Global early warning relied on official alerts from the WHO, which in practice are also based on data reported upward level by level. The drawbacks of this model are obvious. First, most influenza patients recover on their own and never see a doctor, so the counted number of infections is heavily biased. Second, data reported level by level is considerably delayed; many people have often already been infected before officials notice anything abnormal. In addition, such data is easily manipulated, because in some systems it bears on the vital interests of many people in power, which can leave the data with no credibility at all. Some less developed areas have no such data at all.
Fortunately, with the development of the mobile Internet and the spread of smartphones, young people in the vast majority of China can get online, and their immediate relatives can get online indirectly through them, so network coverage is close to seamless. When these people develop flu-like symptoms, the first thing they usually do is search online for influenza symptoms and treatments, and then complain about their illness on social software. All of this data reaches the servers of companies such as Baidu and Tencent. By analyzing it, the incidence of influenza across the whole population can be understood in a timely manner and decisions made accordingly.
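To make the idea of analyzing search data concrete, the following is a minimal Python sketch of turning a raw query log into weekly per-term counts. The term list, the `(date, query_text)` log format, and the function name `weekly_term_counts` are assumptions for illustration; the patent does not specify terms or a log format.

```python
from collections import Counter
from datetime import date

# Hypothetical flu-related terms; the patent does not list specific terms.
FLU_TERMS = ("flu", "fever", "cough")

def weekly_term_counts(query_log, terms=FLU_TERMS):
    """Count queries mentioning each term per ISO (year, week).

    query_log: iterable of (date, query_text) pairs (assumed format).
    Returns {(iso_year, iso_week): Counter({term: count})}.
    """
    counts = {}
    for day, text in query_log:
        iso = day.isocalendar()
        key = (iso[0], iso[1])
        bucket = counts.setdefault(key, Counter())
        lowered = text.lower()
        for term in terms:
            if term in lowered:  # simple substring match
                bucket[term] += 1
    return counts

# Made-up sample queries, for illustration only.
sample_log = [
    (date(2016, 1, 4), "flu symptoms"),
    (date(2016, 1, 5), "how to treat a fever"),
    (date(2016, 1, 6), "flu or cold"),
    (date(2016, 1, 12), "cough medicine"),
]
weekly = weekly_term_counts(sample_log)
```

Substring matching is the simplest possible choice here; a real system would likely need word segmentation and synonym handling, especially for Chinese queries.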
Summary of the invention
The technical task of the present invention is to provide a method for predicting the probability of an influenza outbreak through big data search and analysis.
The technical task of the present invention is achieved as follows. The operating steps of the method are:
Step 1) determine a reliable source for obtaining historical influenza outbreak time ranges;
Step 2) determine a reliable source for obtaining the historical trend of the search frequency of semantic terms related to influenza outbreaks;
Step 3) determine the expanded semantic terms related to influenza;
Step 4) collect and organize the historical trend data of the search frequency of the expanded semantic terms related to influenza outbreaks, together with the historical influenza outbreak time range data;
Step 5) extract, through modeling, the internal relationship between the historical trend of the search frequency of the expanded semantic terms and the historical influenza outbreak time ranges;
Step 6) verify the reliability of the built model by feeding it historical data, and adjust the relevant parameters and algorithms so that it reaches its best predictive ability;
Step 7) input the recent frequency of influenza-related semantic terms to predict the time range of future influenza outbreaks;
Step 8) repeat steps 6) and 7) so that the model is continually optimized and stabilizes.
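Steps 1) to 4) end with two aligned data sets: a search-frequency series and a list of historical outbreak time ranges. One plausible way to join them, sketched below in Python, is to label each week 1 if it falls inside any outbreak range and 0 otherwise. The function name `label_weeks` and the sample dates are assumptions, not taken from the patent, and the dates are not real surveillance data.

```python
from datetime import date

def label_weeks(week_starts, outbreak_ranges):
    """Return 1 for each week start inside any outbreak range, else 0."""
    labels = []
    for start in week_starts:
        in_outbreak = any(lo <= start <= hi for lo, hi in outbreak_ranges)
        labels.append(1 if in_outbreak else 0)
    return labels

# Illustrative values only, not real surveillance data.
outbreaks = [(date(2015, 1, 5), date(2015, 2, 15))]
weeks = [date(2014, 12, 29), date(2015, 1, 5), date(2015, 2, 9), date(2015, 2, 16)]
labels = label_weeks(weeks, outbreaks)
```

The resulting 0/1 labels can then serve as the target that a model in step 5) tries to explain from the search frequencies.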
The collection and organization in step 4) can use a script to obtain the relevant data from websites and store it in a local database for organization; if access rights are lacking, the relevant data can also be purchased.
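As a rough illustration of the local-database part of step 4), the Python sketch below stores weekly term frequencies in SQLite using the standard `sqlite3` module. The table name `search_freq`, its columns, and the sample values are assumptions; the fetching script itself is omitted because the patent names no specific website or API.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a local store for weekly search frequencies."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_freq ("
        " term TEXT NOT NULL, week TEXT NOT NULL, freq REAL NOT NULL,"
        " PRIMARY KEY (term, week))"
    )
    return conn

def save_rows(conn, rows):
    # INSERT OR REPLACE so that re-running a fetch updates values
    # instead of creating duplicate (term, week) rows.
    conn.executemany(
        "INSERT OR REPLACE INTO search_freq (term, week, freq) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

def load_series(conn, term):
    """Return [(week, freq), ...] for one term, ordered by week."""
    cur = conn.execute(
        "SELECT week, freq FROM search_freq WHERE term = ? ORDER BY week", (term,)
    )
    return cur.fetchall()

conn = open_store()
# Made-up values, for illustration only.
save_rows(conn, [("flu", "2016-W02", 130.0), ("flu", "2016-W01", 120.0)])
save_rows(conn, [("flu", "2016-W01", 125.0)])  # a re-fetch overwrites the old value
series = load_series(conn, "flu")
```

Keying rows on `(term, week)` keeps repeated collection runs idempotent, which matters because step 8) implies the data is refreshed regularly.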
Step 5) uses Matlab, SAS, or SPSS to analyze and process the relevant data.
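The patent names Matlab, SAS, or SPSS for this step and does not specify the model form. Purely as an illustration of what such a model could look like, the Python sketch below swaps in a small logistic regression fitted by gradient descent, mapping normalized term frequencies to an outbreak probability. The choice of logistic regression, the function names, the hyperparameters, and the sample data are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated outbreak probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up normalized frequencies for two terms; 1 = outbreak week.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
p_low = predict_proba(w, b, [0.15, 0.15])
p_high = predict_proba(w, b, [0.85, 0.8])
```

The same `predict_proba` call is what step 7) would amount to under this assumption: feed in the recent frequencies and read off a probability.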
Step 6) calls a recursive algorithm that traverses candidate values of the correlation coefficients, compares the prediction accuracy of the model on each attempt, finds the optimal correlation coefficient matrix, and then fixes the prediction model.
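One reading of this traversal is an exhaustive search over a grid of candidate coefficient values, implemented recursively with one recursion level per term. The Python sketch below follows that reading; the weighted-sum-plus-threshold model, the grid values, and the names `accuracy` and `search` are assumptions, since the patent does not define the coefficients or the model they belong to.

```python
def accuracy(weights, threshold, X, y):
    """Fraction of samples where the thresholded weighted sum matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        score = sum(w * x for w, x in zip(weights, xi))
        hits += int((score >= threshold) == bool(yi))
    return hits / len(y)

def search(grid, X, y, threshold, prefix=()):
    """Recursively enumerate every weight combination.

    Returns (best_accuracy, best_weights); ties keep the first combination found.
    """
    if len(prefix) == len(X[0]):
        return accuracy(prefix, threshold, X, y), prefix
    best = (-1.0, None)
    for value in grid:
        candidate = search(grid, X, y, threshold, prefix + (value,))
        if candidate[0] > best[0]:
            best = candidate
    return best

# Made-up historical data: two term frequencies per week, 1 = outbreak week.
X = [[0.9, 0.2], [0.8, 0.9], [0.1, 0.7], [0.2, 0.2]]
y = [1, 1, 0, 0]
grid = [0.0, 0.5, 1.0]
best_acc, best_weights = search(grid, X, y, threshold=0.5)
```

The cost grows as `len(grid) ** number_of_terms`, so a full traversal is only practical for a handful of terms and a coarse grid.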
Compared with the prior art, the method of the present invention builds a mathematical model of how the search frequency of influenza keywords has changed historically on mainstream search engines and social software, extracts the correspondence between keyword frequency and actual influenza outbreaks, and statistically predicts the probability of a future outbreak. Countermeasures can be taken on that basis, helping governments and individuals respond sensibly to disease risk.
Detailed description of the invention
Embodiment 1:
The operating steps of the method for predicting the probability of an influenza outbreak through big data search and analysis are as follows:
Step 1) determine a reliable source for obtaining historical influenza outbreak time ranges;
Step 2) determine a reliable source for obtaining the historical trend of the search frequency of semantic terms related to influenza outbreaks;
Step 3) determine the expanded semantic terms related to influenza. Many terms are semantically related to influenza, but because they differ in influence, only the more influential ones need to be selected for analysis; additional related terms can also be found by searching on Baidu.
Step 4) collect and organize the historical trend data of the search frequency of the expanded semantic terms related to influenza outbreaks, together with the historical influenza outbreak time range data. A script can be used to obtain the relevant data from websites and store it in a local database for organization; if access rights are lacking, the relevant data can also be purchased.
Step 5) extract, through modeling, the internal relationship between the historical trend of the search frequency of the expanded semantic terms and the historical influenza outbreak time ranges. Matlab, SAS, or SPSS is used to analyze and process the relevant data and to settle on an effective algorithmic model; adjusting the correlation coefficients can maximize the probability of an accurate prediction.
Step 6) verify the reliability of the built model by feeding it historical data, and adjust the relevant parameters and algorithms. A recursive algorithm is called to traverse candidate values of the correlation coefficients, the prediction accuracy of the model is compared across attempts, the optimal correlation coefficient matrix is found, and the prediction model is then fixed so that it reaches its best predictive ability.
Step 7) input the recent frequency of influenza-related semantic terms to predict the time range of future influenza outbreaks. This step only requires the current search frequencies as input and yields a prediction of the future outbreak trend.
Step 8) repeat steps 6) and 7) so that the model is continually optimized and stabilizes. This step dynamically re-optimizes the prediction model so that its prediction accuracy remains optimal.
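Step 3) above says only the more influential terms need to be kept, without defining how influence is measured. A minimal Python sketch, assuming influence is measured by the absolute Pearson correlation between a term's weekly frequency and the 0/1 outbreak labels, might look like this. The metric, the function names, and the sample series are all assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no signal
    return cov / (sx * sy)

def rank_terms(series_by_term, labels, keep=2):
    """Keep the `keep` terms whose frequency correlates most strongly with outbreaks."""
    scored = sorted(
        ((abs(pearson(series, labels)), term) for term, series in series_by_term.items()),
        reverse=True,
    )
    return [term for _, term in scored[:keep]]

# Made-up weekly frequencies; labels mark outbreak weeks.
labels = [0, 0, 1, 1, 0]
series_by_term = {
    "flu symptoms": [10, 12, 40, 45, 11],
    "fever medicine": [20, 22, 35, 30, 21],
    "weather": [30, 5, 18, 9, 27],
}
selected = rank_terms(series_by_term, labels)
```

With this sample data the two flu-related series track the labels closely, while the unrelated "weather" series is dropped.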
By iterating the above steps continually, a reliable system for predicting the probability of an influenza outbreak from online big data can be built, helping governments and individuals respond sensibly to disease risk.
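The continual iteration of steps 6) to 8) can be pictured as a walk-forward loop: refit on all history so far, predict the next week, score the prediction, then fold that week into the history. The Python sketch below uses a deliberately simple single-term threshold model as a stand-in; the model, the names `fit_threshold` and `walk_forward`, and the data are assumptions. It also assumes `labels[t]` has already been aligned with `freqs[t]` at whatever forecast lead time is wanted.

```python
def fit_threshold(freqs, labels):
    """Midpoint between mean frequency in outbreak and non-outbreak weeks.

    Requires at least one week of each class in the history.
    """
    pos = [f for f, l in zip(freqs, labels) if l == 1]
    neg = [f for f, l in zip(freqs, labels) if l == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def walk_forward(freqs, labels, warmup):
    """Predict each week after `warmup` from a model refit on all earlier weeks."""
    hits = 0
    for t in range(warmup, len(freqs)):
        threshold = fit_threshold(freqs[:t], labels[:t])  # step 6: refit on history
        predicted = 1 if freqs[t] >= threshold else 0     # step 7: predict
        hits += int(predicted == labels[t])               # step 8: score, then repeat
    return hits / (len(freqs) - warmup)

# Made-up weekly frequencies and outbreak labels.
freqs = [10, 12, 40, 45, 11, 50, 9, 42]
labels = [0, 0, 1, 1, 0, 1, 0, 1]
acc = walk_forward(freqs, labels, warmup=4)
```

Because each prediction uses only earlier weeks, the accuracy this loop reports is an out-of-sample estimate, which is closer to what the reliability verification in step 6) is after than accuracy on the training data.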
With the detailed description above, those skilled in the art can readily implement the present invention. It should be understood, however, that the present invention is not limited to the embodiment described above. On the basis of the disclosed embodiment, those skilled in the art may combine different technical features in any way to arrive at different technical solutions.
Claims (4)
1. by the method for big data search analyses and prediction flu outbreak probability, it is characterised in that the operating procedure of the method is as follows:
Step 1) determines the reliable approach obtaining history flu outbreak time range;
Step 2) determine the reliable approach of the history tendency obtaining the relevant semantic vocabulary entry search frequency of flu outbreak;
Step 3) determines influenza related expanding semanteme entry;
History tendency data and the history flu outbreak time range data of the search rate of the semantic entry of influenza explosion facies pass extension are collected arranging by step 4);
History tendency and the history flu outbreak time range internal relation of the search rate of the semantic entry of influenza explosion facies pass extension are extracted by step 5) by modeling method;
The step 6) model to having built up carries out reliability demonstration by inputting historical data, and adjusts relevant parameter and algorithm so that it is reach optimum prediction ability;
The relevant semantic item frequency of influenza that step 7) input is recent, is predicted future influenza explosion time scope;
Step 8) repeat the above steps 6) and step 7), make model continue to optimize, and tend towards stability.
2. the method by big data search analyses and prediction flu outbreak probability according to claim 1, it is characterized in that, the process compiled in described step 4) can use script to obtain related data from website, and be stored in local data base and arrange, if without associated rights, also commercially available related data.
3. the method by big data search analyses and prediction flu outbreak probability according to claim 1, it is characterised in that described step 5) adopts Matlab, SAS or SPSS to be analyzed related data processing.
4. the method by big data search analyses and prediction flu outbreak probability according to claim 1, it is characterized in that, described step 6) calls recursive algorithm, correlation coefficient carries out traversal attempt, the prediction correct probability of forecast model in relatively attempting for each time, therefrom find the correlation matrix of optimum, solidify forecast model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610014183.1A CN105678080A (en) | 2016-01-11 | 2016-01-11 | Method for predicting influenza outbreak possibility through big data search and analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610014183.1A CN105678080A (en) | 2016-01-11 | 2016-01-11 | Method for predicting influenza outbreak possibility through big data search and analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105678080A true CN105678080A (en) | 2016-06-15 |
Family
ID=56299804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610014183.1A Pending CN105678080A (en) | 2016-01-11 | 2016-01-11 | Method for predicting influenza outbreak possibility through big data search and analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105678080A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI582718B (en) * | 2016-07-01 | 2017-05-11 | 南臺科技大學 | A method and system of data mining with climate and air pollution data integrated for respiratory disease prevention |
CN108288502A (en) * | 2018-04-11 | 2018-07-17 | 平安科技(深圳)有限公司 | Disease forecasting method and device, computer installation and readable storage medium storing program for executing |
CN108417274A (en) * | 2018-03-06 | 2018-08-17 | 东南大学 | Forecast of epiphytotics method, system and equipment |
CN108573332A (en) * | 2017-10-09 | 2018-09-25 | 江苏立华牧业股份有限公司 | A kind of PDI poultry diseases fashion information management method and system |
CN108648829A (en) * | 2018-04-11 | 2018-10-12 | 平安科技(深圳)有限公司 | Disease forecasting method and device, computer installation and readable storage medium storing program for executing |
CN110136842A (en) * | 2019-04-04 | 2019-08-16 | 平安科技(深圳)有限公司 | Morbidity prediction technique, device and the computer readable storage medium of acute infectious disease |
WO2019227716A1 (en) * | 2018-05-31 | 2019-12-05 | 平安科技(深圳)有限公司 | Method for generating influenza prediction model, apparatus, and computer readable storage medium |
CN113362962A (en) * | 2021-06-07 | 2021-09-07 | 中国疾病预防控制中心病毒病预防控制所 | Epidemiological data integration system and method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834976A (en) * | 2015-05-14 | 2015-08-12 | 浪潮集团有限公司 | Method for searching for, analyzing and predicting price change trend of memory chip through big data |
CN204698541U (en) * | 2015-06-16 | 2015-10-14 | 张学海 | Digital medical diagnosis integrating device |
Non-Patent Citations (3)
Title |
---|
GINSBERG J ET AL.: "Detecting influenza epidemics using search engine query data", 《NATURE》 * |
邹晓辉等: "谷歌流感预测—大数据在公共卫生领域的尝试", 《中华预防医学杂志》 * |
陆波等: "时间序列模型预测流感发病率的研究", 《中国使用医药》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160615 |