CN111813922A - High-temperature event detection method and system based on microblog text data - Google Patents

Info

Publication number
CN111813922A
Authority
CN
China
Prior art keywords
temperature
microblog
day
hma
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010943807.4A
Other languages
Chinese (zh)
Other versions
CN111813922B (en)
Inventor
易嘉伟
杜云艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN202010943807.4A
Publication of CN111813922A
Application granted
Publication of CN111813922B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Abstract

The invention provides a high-temperature event detection method and system based on microblog text data. Historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information (MHF) are identified and screened through keyword matching. The daily proportion of MHFs among all microblogs and the daily proportion of users posting MHFs among all posting users are then counted, and these statistics are used to judge whether high temperature on a given day caused an abnormal increase in the number of MHFs and of MHF-posting users, that is, a high-temperature-induced microblog anomaly; HMA denotes a day on which such an anomaly occurs. Finally, the publicly perceived high-temperature threshold and high-temperature events are identified: the threshold is calculated from the temperature index during HMA and non-HMA periods, and a publicly perceived high-temperature event is judged to occur on any day whose observed temperature index exceeds the threshold. The method and system help establish local high-temperature early-warning standards adapted to local conditions, improve emergency preparedness for extreme high-temperature disasters, and reduce the associated loss of manpower and material resources through automatic event detection and alarm.

Description

High-temperature event detection method and system based on microblog text data
Technical Field
The invention belongs to the field of Internet emergency detection, and particularly relates to a high-temperature event detection method and system based on microblog text data.
Background
As global climate change intensifies, extreme high-temperature events are becoming more frequent and endanger human health [1-3]. Accurately identifying high-temperature events and establishing scientific early-warning standards are important for raising public awareness of heat prevention and reducing the impact of high-temperature disasters [4]. However, no uniform criterion for identifying high-temperature events has been defined to date [5-6]. Commonly used identification methods differ in the choice of temperature index and the definition of the threshold; some studies combine several thresholds, and some also consider the duration of high temperature [7-10]. These standards are mainly based on the physiological characteristics of human thermoregulation and, in practice, do not account for geographical differences in people's adaptability to high temperature, which is shaped by local society and culture [8]. Local standards for extreme high temperature can be formulated more accurately only if the publicly perceived high-temperature thresholds and high-temperature events of different places are identified. Microblogs offer a new way to learn how the public actually experiences high temperature. As a currently popular social medium, microblog data contain a large volume of information that the public voluntarily shares about its experiences, feelings and opinions, and thus provide important data support for identifying the impact of extreme high temperature on the public and the public's response [11-12].
At present, the following patents relate to event identification based on microblog social media data:
1) Detecting online emergencies based on microblog data, for example:
CN105119807A, Method for detecting online emergencies oriented to real-time microblog message streams, publication (announcement) date 20151202
CN110502703A, Social network emergency detection method constructed based on a character string dictionary, publication (announcement) date 20191126
CN106547875A, Microblog online emergency detection method based on sentiment analysis and labels, publication (announcement) date 20170329
These patents build detection methods mainly for hot events in online public opinion and cannot be used to derive the critical threshold of high temperature tolerated by the public;
2) Monitoring urban hotspot events based on social media data, for example:
CN107908766, Dynamic monitoring method and system for urban hotspot events, publication (announcement) date 20180413
This patent identifies the spatial distribution of microblog hotspot events within a city and cannot be applied to the calculation of a high-temperature threshold;
3) Identifying disastrous weather events based on microblogs, for example:
CN108595582, Identification method for disastrous weather hotspot events based on social signals, publication (announcement) date 20180928
This patent identifies the occurrence of disastrous weather events from microblog data by constructing a dictionary of disastrous-weather keywords, but it does not identify extreme events such as extreme high temperature from the perspective of public perception and feedback, and therefore cannot be used to identify the high-temperature critical value perceived by the public.
Against this technical background, the invention discloses a high-temperature event detection method and system based on microblog text data, which aims to identify the extreme high-temperature events experienced by the public and their broad influence, and to overcome the inability of existing high-temperature event detection techniques to reflect differences in public experience.
The related documents are:
[1] Barriopedro, D., Fischer, E. M., Luterbacher, J., Trigo, R. M., & García-Herrera, R. (2011). The hot summer of 2010: Redrawing the temperature record map of Europe. Science.
[2] Garcia-Herrera, R., Díaz, J., Trigo, R. M., Luterbacher, J., & Fischer, E. M. (2010). A review of the European summer heat wave of 2003. Critical Reviews in Environmental Science and Technology.
[3] Gasparrini, A., Guo, Y., Hashizume, M., Lavigne, E., Zanobetti, A., Schwartz, J., et al. (2015). Mortality risk attributable to high and low ambient temperature: A multicountry observational study. The Lancet.
[4] Chen, Y., & Li, Y. (2017). An Inter-comparison of Three Heat Wave Types in China during 1961-2010: Observed Basic Features and Linear Trends. Scientific Reports.
[5] Perkins, S. E., & Alexander, L. V. (2013). On the measurement of heat waves. Journal of Climate.
[6] Xu, Z., FitzGerald, G., Guo, Y., Jalaludin, B., & Tong, S. (2016). Impact of heatwave on mortality under different heatwave definitions: A systematic review and meta-analysis. Environment International.
[7] Ding, T., Qian, W., & Yan, Z. (2010). Changes in hot days and heat waves in China during 1961-2007. International Journal of Climatology.
[8] Robinson, P. J. (2001). On the definition of a heat wave. Journal of Applied Meteorology.
[9] Steadman, R. G. (1979). The assessment of sultriness. Part II: effects of wind, extra radiation and barometric pressure on apparent temperature. Journal of Applied Meteorology.
[10] Zhai, P., & Pan, X. (2003). Trends in temperature extremes during 1951-1999 in China. Geophysical Research Letters.
[11] Grasso, V., Crisci, A., Morabito, M., Nesi, P., & Pantaleo, G. (2017). Public crowdsensing of heat waves by social media data. Advances in Science and Research.
[12] Jung, J., & Uejio, C. K. (2017). Social media responses to heat waves. International Journal of Biometeorology.
Disclosure of Invention
The invention mines the public's high-temperature feedback from microblog text data and identifies the extreme high-temperature events experienced by the public and their broad influence, thereby addressing the low detection accuracy that results from existing high-temperature event detection techniques being unable to reflect differences in public experience.
The technical scheme of the invention provides a high-temperature event detection method based on microblog text data, comprising the following steps:
Step 1, extracting microblogs containing high-temperature feedback information: historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information, denoted MHF, are identified and screened through keyword matching;
Step 2, identifying microblog anomalies caused by high temperature: the daily proportion of MHFs among all microblogs and the daily proportion of users posting MHFs among all posting users are counted, and the statistics are used to judge whether high temperature on the day caused an abnormal increase in the number of MHFs and of MHF-posting users, forming a high-temperature-induced microblog anomaly; HMA denotes a day on which such an anomaly is formed, and non-HMA denotes a day on which it is not;
Step 3, identifying the publicly perceived high-temperature threshold and high-temperature events: the publicly perceived high-temperature threshold is calculated from the temperature index during HMA and non-HMA periods, and a publicly perceived high-temperature event is judged to occur on a day if the observed temperature index for that day exceeds the threshold.
Furthermore, step 1 comprises the following sub-steps:
Step 1.1, extracting microblogs containing the word 'hot';
Step 1.2, performing word segmentation on the text of each microblog and extracting the segmented words containing 'hot';
Step 1.3, screening, from the result of step 1.2, the 'hot'-containing words related to high-temperature weather and adding them to a valid keyword library;
Step 1.4, extracting the microblogs containing valid keywords as MHF through segmented-word matching against the valid keyword library.
Furthermore, step 2 comprises the following sub-steps:
Step 2.1, calculating the ratio of the daily number of MHFs to the total number of microblogs posted that day, denoted p1;
Step 2.2, calculating the ratio of the daily number of users posting MHFs to the total number of users posting microblogs that day, denoted p2;
Step 2.3, calculating the anomaly critical value $P_{crit}$ for p1 and for p2 respectively;
Step 2.4, designating each day on which both p1 and p2 exceed their respective anomaly critical values as HMA, and every other day as non-HMA.
Furthermore, in step 2.3, the anomaly critical value $P_{crit}$ is calculated as
$P_{crit} = Q_3 + 1.5 \times IQR$,
where $Q_3$ and $IQR$ are respectively the upper quartile and the interquartile range obtained from long-term statistics of the characteristic index, the characteristic index being p1 or p2.
Furthermore, step 3 comprises the following sub-steps:
Step 3.1, collecting the values of the temperature index $T$ during HMA periods and taking their 5th percentile as a high-temperature threshold candidate, denoted $P_{HMA}(T, 5\%)$;
Step 3.2, collecting the values of the temperature index $T$ during non-HMA periods and taking their 95th percentile as a high-temperature threshold candidate, denoted $P_{\sim HMA}(T, 95\%)$;
Step 3.3, taking the maximum of $P_{HMA}(T, 5\%)$ and $P_{\sim HMA}(T, 95\%)$ as the publicly perceived high-temperature threshold $T_{crit}$; when the daily observed value of the temperature index $T$ exceeds $T_{crit}$, a publicly perceived high-temperature event is judged to occur on that day.
Moreover, the temperature index $T$ is the daily maximum temperature.
Moreover, after a publicly perceived high-temperature event is judged to occur on the current day, an automatic alarm notification is sent.
The invention also provides a high-temperature event detection system based on microblog text data, which is used to implement the high-temperature event detection method based on microblog text data described above.
Moreover, the system comprises the following modules:
a first module, used for extracting microblogs containing high-temperature feedback information: historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information, denoted MHF, are identified and screened through keyword matching;
a second module, used for identifying microblog anomalies caused by high temperature: the daily proportion of MHFs among all microblogs and the daily proportion of users posting MHFs among all posting users are counted, and the statistics are used to judge whether high temperature on the day caused an abnormal increase in the number of MHFs and of MHF-posting users, forming a high-temperature-induced microblog anomaly; HMA denotes a day on which such an anomaly is formed, and non-HMA denotes a day on which it is not;
a third module, used for identifying the publicly perceived high-temperature threshold and high-temperature events: the publicly perceived high-temperature threshold is calculated from the temperature index during HMA and non-HMA periods, and a publicly perceived high-temperature event is judged to occur on a day if the observed temperature index for that day exceeds the threshold.
The invention identifies the high-temperature events that affect the public by extracting microblogs containing high-temperature feedback information, monitoring their number, and calculating the public's perceived high-temperature threshold. It helps establish local high-temperature early-warning standards adapted to local conditions, improves emergency preparedness for extreme high-temperature disasters, and reduces the associated loss of manpower and material resources through automatic event detection and alarm.
Drawings
Fig. 1 is a schematic diagram of the system structure according to an embodiment of the present invention.
Fig. 2 is a diagram of the publicly perceived high-temperature threshold and the temperature variation in Beijing in an application example of an embodiment of the present invention.
Detailed Description
To make the present invention easier to understand, its technical solutions are described in detail below with reference to the accompanying drawings and embodiments.
In this embodiment, sampled microblog data of the Beijing area in 2017 are used as input, and, following the technical scheme of the invention, the publicly perceived high-temperature threshold for Beijing and the publicly perceived high-temperature events of 2017 are identified.
Referring to Fig. 1, the embodiment provides a high-temperature event detection method based on microblog text data, implemented as follows:
Step 1, extracting microblogs containing high-temperature feedback: historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information (MHF for short), that is, microblogs whose text expresses that the weather is hot that day, are identified and screened through keyword matching. In the following text, MHF denotes a microblog containing high-temperature feedback information.
In the embodiment, step 1 is implemented through the following sub-steps:
Step 1.1, extracting the microblogs containing the word 'hot'.
In the embodiment, about 8 million microblogs containing the word 'hot' are extracted.
Step 1.2, performing word segmentation on the text of each microblog obtained in step 1.1 and extracting the segmented words containing 'hot'.
In the embodiment, the text of each microblog is segmented, and more than 2 thousand 'hot'-containing segmented words are extracted.
Step 1.3, screening, from the result of step 1.2, the 'hot'-containing words related to high-temperature weather, adding them to a valid keyword library, and removing the 'hot'-containing words unrelated to high temperature.
In the embodiment, 28 'hot'-containing words related to high-temperature weather, such as 'too hot' and 'muggy', are selected and added to the valid keyword library, and the remaining 'hot'-containing words unrelated to high temperature, such as 'hot' and 'love', are removed.
Step 1.4, extracting the MHFs through keyword matching; in the embodiment, 6 thousand MHFs are extracted from the microblogs containing the word 'hot'.
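The keyword screening of steps 1.1 to 1.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each microblog is assumed to have been segmented into tokens already by some word segmenter, and the function names, the sample posts and the Chinese tokens (assumed back-translations of the examples quoted above, with 热 standing for 'hot') are hypothetical.

```python
from collections import Counter

HOT_CHAR = "热"  # assumed original of the translated word 'hot'

def candidate_tokens(microblogs, marker=HOT_CHAR):
    """Steps 1.1-1.2: count the segmented words that contain the marker."""
    counts = Counter()
    for post in microblogs:
        counts.update(tok for tok in post["tokens"] if marker in tok)
    return counts

def extract_mhf(microblogs, valid_keywords):
    """Step 1.4: keep posts with at least one token in the keyword library."""
    keywords = set(valid_keywords)
    return [post for post in microblogs if keywords.intersection(post["tokens"])]

# Hypothetical, pre-segmented sample posts.
posts = [
    {"id": 1, "tokens": ["今天", "太热", "了"]},  # "too hot today"
    {"id": 2, "tokens": ["我", "热爱", "音乐"]},  # "I love music"
    {"id": 3, "tokens": ["天气", "闷热"]},        # "muggy weather"
    {"id": 4, "tokens": ["去", "上班"]},          # no marker at all
]

candidates = candidate_tokens(posts)
# Step 1.3 is a manual screening step; here the library is written by hand.
valid_keywords = {"太热", "闷热"}
mhf = extract_mhf(posts, valid_keywords)
print(sorted(candidates))
print([post["id"] for post in mhf])
```

Matching on whole tokens rather than raw substrings is what lets a word such as 热爱 ('love') be excluded even though it contains the marker character.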
Step 2, identifying microblog anomalies caused by high temperature: the daily proportion of MHFs and the daily proportion of users posting MHFs are calculated, and the critical threshold of each of the two characteristic indexes is calculated with the Tukey's fences outlier criterion. On a day when both indexes exceed their respective critical values, high temperature is judged to have caused an increase in the number of MHFs and of MHF-posting users, forming a high-temperature-induced microblog anomaly (HMA for short). In the following text, HMA denotes a day on which a high-temperature-induced microblog anomaly is formed, and non-HMA denotes a day on which no such anomaly is formed.
In the embodiment, step 2 is implemented through the following sub-steps:
Step 2.1, calculating the ratio of the daily number of MHFs to the total number of microblogs posted that day, denoted p1.
In the embodiment, the ratio p1 of the daily number of MHFs in Beijing to the daily number of microblogs is calculated; the daily average of p1 in 2017 is 0.001.
Step 2.2, calculating the ratio of the daily number of users posting MHFs to the total number of users posting microblogs that day, denoted p2.
In the embodiment, the ratio p2 of the daily number of users posting MHFs in Beijing to the total number of users posting microblogs that day is calculated; the daily average of p2 in 2017 is 0.0013.
In the embodiment, 'the daily number of MHFs' represents the number of microblogs related to high-temperature feedback each day, and 'the daily number of users posting MHFs' represents the number of users who post microblogs related to high temperature each day.
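The two daily ratios can be illustrated with a short Python sketch. It assumes, hypothetically, that each post is available as a (day, user id, MHF flag) record; the record layout, the function name and the sample data are illustrative and do not come from the patent.

```python
from collections import defaultdict

def daily_ratios(posts):
    """Return {day: (p1, p2)} from (day, user_id, is_mhf) records.

    p1 = MHF posts / all posts of the day
    p2 = users posting MHF / all users posting that day
    """
    total_posts = defaultdict(int)
    mhf_posts = defaultdict(int)
    all_users = defaultdict(set)
    mhf_users = defaultdict(set)
    for day, user_id, is_mhf in posts:
        total_posts[day] += 1
        all_users[day].add(user_id)
        if is_mhf:
            mhf_posts[day] += 1
            mhf_users[day].add(user_id)
    return {
        day: (mhf_posts[day] / total_posts[day],
              len(mhf_users[day]) / len(all_users[day]))
        for day in total_posts
    }

# Hypothetical records: user "a" posts two MHFs, "b" and "c" post other content.
records = [
    ("2017-07-10", "a", True),
    ("2017-07-10", "a", True),
    ("2017-07-10", "b", False),
    ("2017-07-10", "c", False),
]
ratios = daily_ratios(records)
p1, p2 = ratios["2017-07-10"]
print(p1, p2)
```

Counting users as well as posts keeps a single very active account from producing an apparent anomaly on its own, which is presumably why both indexes are required to exceed their thresholds.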
Step 2.3, the embodiment preferably uses the Tukey's fences outlier criterion to calculate the anomaly critical value $P_{crit}$ for p1 and for p2 respectively, as follows:
$P_{crit} = Q_3 + 1.5 \times IQR$,
where $Q_3$ and $IQR$ are respectively the upper quartile and the interquartile range obtained from long-term statistics (for example, one year in the embodiment) of the characteristic index (p1 or p2). The upper quartile is the value at the 75% position after all values in the sample are sorted in ascending order; the interquartile range is the difference between the upper and lower quartiles.
In the embodiment, the anomaly critical values of p1 and p2 calculated from the 2017 data are 0.0067 and 0.0086, respectively.
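A small Python sketch of the upper Tukey fence follows. The patent does not state which quartile convention is used, so the choice of `statistics.quantiles` with `method="inclusive"` is an assumption, as are the function name and the toy sample.

```python
import statistics

def tukey_upper_fence(values, k=1.5):
    """Return Q3 + k * IQR for a sample of a characteristic index (p1 or p2).

    Quartiles use linear interpolation over the sorted sample
    (the 'inclusive' method); other conventions give slightly different fences.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)

# Toy sample standing in for a long series of daily ratios.
sample = [1, 2, 3, 4, 5]
fence = tukey_upper_fence(sample)
print(fence)  # Q1 = 2, Q3 = 4, IQR = 2, so the fence is 4 + 1.5 * 2 = 7.0
```

In use, the function would be applied once to the yearly series of p1 and once to the yearly series of p2, yielding one critical value per index.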
Step 2.4, designating each time period (in days) in which both p1 and p2 exceed their respective anomaly critical values as HMA, and every other period as non-HMA.
In the embodiment, HMA was identified 47 times in Beijing during April to October, most frequently in July.
The invention computes the p1 and p2 indexes for each day; in the subsequent steps, $Q_3$, $IQR$ and the thresholds are computed from long-term historical data (for example, one year).
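The decision rule of step 2.4 can be sketched as below. The default critical values are the ones reported for the embodiment (0.0067 for p1 and 0.0086 for p2); the daily ratios and the function name are hypothetical.

```python
def flag_hma(ratios, p1_crit=0.0067, p2_crit=0.0086):
    """Mark a day as HMA only if p1 and p2 both exceed their critical values."""
    return {
        day: (p1 > p1_crit and p2 > p2_crit)
        for day, (p1, p2) in ratios.items()
    }

# Hypothetical daily (p1, p2) pairs.
daily = {
    "2017-07-10": (0.0102, 0.0125),  # both indexes above their fences
    "2017-07-11": (0.0070, 0.0080),  # p1 above, p2 below
    "2017-03-01": (0.0008, 0.0010),  # both below
}
flags = flag_hma(daily)
hma_days = [day for day, is_hma in flags.items() if is_hma]
print(hma_days)
```

Requiring both conditions makes the rule conservative: a day on which only the post count or only the user count is unusual is treated as non-HMA.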
Step 3, identifying the publicly perceived high-temperature threshold and high-temperature events: the publicly perceived high-temperature threshold is calculated from the temperature index during HMA and non-HMA periods, and a day is judged to be a publicly perceived high-temperature event if the observed temperature index for that day exceeds the threshold.
In the embodiment, step 3 is implemented through the following sub-steps:
Step 3.1, collecting the values of the temperature index $T$ (e.g., the daily maximum temperature) during HMA periods and taking their 5th percentile as a high-temperature threshold candidate, denoted $P_{HMA}(T, 5\%)$. When the temperature index exceeds this candidate, an HMA event is likely to occur on that day.
In the embodiment, the 5th percentile of the daily maximum temperature during HMA periods is $P_{HMA}(T, 5\%) = 30.8$ °C.
Step 3.2, collecting the values of the temperature index $T$ (e.g., the daily maximum temperature) during non-HMA periods and taking their 95th percentile as a high-temperature threshold candidate, denoted $P_{\sim HMA}(T, 95\%)$. When the temperature index exceeds this candidate, a non-HMA day is a low-probability event; conversely, an HMA event is likely to occur on that day.
In the embodiment, the 95th percentile of the daily maximum temperature during non-HMA periods is $P_{\sim HMA}(T, 95\%) = 32.6$ °C.
Step 3.3, taking the maximum of $P_{HMA}(T, 5\%)$ and $P_{\sim HMA}(T, 95\%)$ as the publicly perceived high-temperature threshold $T_{crit}$; adopting the higher of the two candidates reduces the risk of misjudging a high-temperature event. When the daily observed value of the temperature index $T$ exceeds $T_{crit}$, a publicly perceived high-temperature event is judged to occur on that day.
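Steps 3.1 to 3.3 can be sketched in Python as follows. The percentile convention (linear interpolation via `statistics.quantiles` with `method="inclusive"`) is an assumption, since the patent does not specify one, and the temperatures and function names are made up for illustration.

```python
import statistics

def percentile(values, pct):
    """pct-th percentile (1..99) by linear interpolation over the sorted sample."""
    return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]

def perceived_threshold(hma_temps, non_hma_temps):
    """Return (T_crit, P_HMA(T, 5%), P_~HMA(T, 95%))."""
    p_hma = percentile(hma_temps, 5)
    p_non_hma = percentile(non_hma_temps, 95)
    return max(p_hma, p_non_hma), p_hma, p_non_hma

def is_high_temp_event(daily_max_temp, t_crit):
    """A day counts as a publicly perceived high-temperature event if T > T_crit."""
    return daily_max_temp > t_crit

# Made-up daily maximum temperatures in degrees Celsius.
hma_temps = list(range(30, 41))      # days flagged as HMA: 30..40
non_hma_temps = list(range(20, 31))  # days flagged as non-HMA: 20..30
t_crit, p_hma, p_non_hma = perceived_threshold(hma_temps, non_hma_temps)
print(t_crit, p_hma, p_non_hma)
```

With the two candidates reported for the embodiment, the same rule gives max(30.8, 32.6) = 32.6 °C.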
In the embodiment, the publicly perceived high-temperature threshold $T_{crit}$ for Beijing is the maximum of $P_{HMA}(T, 5\%)$ and $P_{\sim HMA}(T, 95\%)$, namely 32.6 °C. As shown in Fig. 2, the daily maximum temperature in Beijing exceeded $T_{crit}$ on 46 days during April to October, forming 17 publicly perceived high-temperature events, the longest of which lasted 7 days, such as the continuous high temperature from 14 to 20 June. The high-temperature threshold actually perceived by the public is therefore lower than the threshold of the high-temperature early-warning information issued by the Beijing meteorological department (a daily maximum temperature of 35 °C), which indicates that the current early-warning standard is too high and carries a risk of underestimating the impact of high temperature. With the threshold and event determination method provided by this patent, the judgment can be made automatically from mass data collected from social platforms such as microblogs: when the daily maximum temperature in Beijing exceeds 32.6 °C, the public forms perceptual feedback on the high temperature, and an automatic alarm notification can be sent. In specific implementation, the alarm notification can be pushed in a preset manner to the computers, mobile phones and other devices used by the relevant users. After receiving the alarm, the relevant departments can consider reminding the public to guard against high temperature, reducing its impact on public health. The patented technique can also be applied to other cities, where it provides decision support for setting, more scientifically, high-temperature early-warning standards and management methods suited to each city's local climate, safeguards public safety, and avoids the associated loss of manpower and material resources.
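The day and event counts reported above suggest that consecutive days above $T_{crit}$ are merged into a single event. The text does not spell out this grouping rule, so the following Python sketch is an assumption; the temperatures and the function name are likewise hypothetical.

```python
from datetime import date, timedelta

def group_events(daily_max, t_crit):
    """Merge consecutive days with T > t_crit into (start, end) events."""
    hot_days = sorted(day for day, temp in daily_max.items() if temp > t_crit)
    events = []
    for day in hot_days:
        if events and day - events[-1][1] == timedelta(days=1):
            events[-1] = (events[-1][0], day)  # extend the running event
        else:
            events.append((day, day))          # start a new event
    return events

# Made-up daily maximum temperatures around mid-June 2017.
temps = {date(2017, 6, 13): 31.0, date(2017, 6, 21): 30.0, date(2017, 6, 22): 34.0}
temps.update({date(2017, 6, d): 33.0 for d in range(14, 21)})

events = group_events(temps, t_crit=32.6)
durations = [(end - start).days + 1 for start, end in events]
print(events)
print(durations)
```

Under this reading, the number of events is at most the number of exceedance days, and the longest run gives the maximum event duration.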
In specific implementation, the method can be run automatically using computer software technology, and a corresponding system device implementing the method flow also falls within the protection scope of the invention.
In some possible embodiments, a high-temperature event detection system based on microblog text data is provided, comprising the following modules:
a first module, used for extracting microblogs containing high-temperature feedback information: historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information, denoted MHF, are identified and screened through keyword matching;
a second module, used for identifying microblog anomalies caused by high temperature: the daily proportion of MHFs among all microblogs and the daily proportion of users posting MHFs among all posting users are counted, and the statistics are used to judge whether high temperature on the day caused an abnormal increase in the number of MHFs and of MHF-posting users, forming a high-temperature-induced microblog anomaly; HMA denotes a day on which such an anomaly is formed, and non-HMA denotes a day on which it is not;
a third module, used for identifying the publicly perceived high-temperature threshold and high-temperature events: the publicly perceived high-temperature threshold is calculated from the temperature index during HMA and non-HMA periods, and a publicly perceived high-temperature event is judged to occur on a day if the observed temperature index for that day exceeds the threshold.
In specific implementation, the system can also be divided, according to the direction of data flow, into an input layer, a function module layer and an output layer. Referring to Fig. 1, microblog text data and temperature observation data are fed to the input layer and processed by the steps of the method described above, after which the resulting MHFs and HMAs and the identified high-temperature events are output.
In some possible embodiments, a microblog text data-based high-temperature event detection system is provided, which includes a processor and a memory, where the memory is used for storing program instructions, and the processor is used for calling the instructions stored in the memory to execute the microblog text data-based high-temperature event detection method described above.
In some possible embodiments, a microblog text data-based high-temperature event detection system is provided, which includes a readable storage medium on which a computer program is stored; when the computer program is executed, the microblog text data-based high-temperature event detection method described above is implemented.
It should be understood that the above embodiments are described in some detail and are not intended to limit the scope of the invention; those skilled in the art may make alterations and modifications without departing from the scope of the invention as defined by the appended claims.

Claims (9)

1. A high-temperature event detection method based on microblog text data, characterized by comprising the following steps:
step 1, extracting microblogs containing high-temperature feedback information, wherein historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information are identified and screened through keyword matching and denoted MHF;
step 2, identifying microblog anomalies caused by high temperature, including counting, for each day, the proportion of MHF among the microblogs and the proportion of users posting MHF among the microblog users, and judging from the statistics whether high temperature on that day caused an abnormal increase in the number of MHF and of the users posting them, thereby forming a high-temperature-triggered microblog anomaly; HMA denotes a day on which a high-temperature-triggered microblog anomaly is formed, and non-HMA denotes a day on which none is formed;
step 3, identifying the publicly perceived high-temperature threshold and high-temperature events, including calculating the publicly perceived high-temperature threshold from the temperature index during HMA periods and non-HMA periods, and judging that a publicly perceived high-temperature event occurs on a day if the observed value of the temperature index on that day exceeds the high-temperature threshold.
2. The microblog text data-based high-temperature event detection method according to claim 1, wherein: step 1 comprises the following sub-steps,
step 1.1, extracting microblogs containing the word 'hot';
step 1.2, performing word segmentation on the text of each microblog, and extracting the segmented words containing 'hot';
step 1.3, screening, from the result of step 1.2, the segmented words containing 'hot' that relate to high-temperature weather, and adding them to an effective keyword library;
step 1.4, extracting, through word-segmentation matching against the effective keyword library, the microblogs containing effective keywords as MHF.
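Sub-steps 1.1 to 1.4 can be sketched as follows. This is a minimal illustration and not the patented implementation: the 'hot' character is assumed to be "热", the `segment` callable is a hypothetical stand-in for a real Chinese word segmenter (here plain whitespace splitting on pre-segmented text), and the screening of step 1.3 is simulated by dropping one unrelated word by hand.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set

HOT = "热"  # assumed 'hot' character; the claim only says 'hot'

def candidate_hot_words(posts: Iterable[str],
                        segment: Callable[[str], List[str]]) -> Counter:
    """Steps 1.1-1.2: keep posts containing HOT, segment them,
    and count the segmented words that contain HOT."""
    counts: Counter = Counter()
    for text in posts:
        if HOT in text:
            counts.update(tok for tok in segment(text) if HOT in tok)
    return counts

def extract_mhf(posts: Iterable[str],
                segment: Callable[[str], List[str]],
                keywords: Set[str]) -> List[str]:
    """Step 1.4: a post counts as MHF if any segmented word
    is in the effective keyword library."""
    return [t for t in posts if any(tok in keywords for tok in segment(t))]

# Toy data, already segmented with spaces (an assumption for the demo).
posts = ["今天 好热 啊", "这个 话题 很 热门", "天气 晴朗", "热死了 受不了"]
segment = str.split

candidates = candidate_hot_words(posts, segment)
# Step 1.3 (manual screening in this sketch): '热门' means 'popular',
# not hot weather, so it is left out of the keyword library.
keywords = {w for w in candidates if w != "热门"}
mhf = extract_mhf(posts, segment, keywords)
```

Matching on segmented words rather than raw substrings is what keeps a post such as "这个 话题 很 热门" out of the MHF set even though it contains the 'hot' character.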
3. The microblog text data-based high-temperature event detection method according to claim 1, wherein: step 2 comprises the following sub-steps,
step 2.1, calculating, for each day, the ratio of the number of MHF to the total number of microblogs on that day, denoted p1;
step 2.2, calculating, for each day, the ratio of the number of users posting MHF to the total number of users posting microblogs on that day, denoted p2;
step 2.3, calculating the abnormal critical value P_crit for p1 and for p2 respectively;
step 2.4, determining a day on which both p1 and p2 exceed their abnormal critical values to be HMA, and any other day to be non-HMA.
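The daily ratios and the HMA decision of sub-steps 2.1, 2.2 and 2.4 can be sketched as below. The record layout `(day, user_id, is_mhf)` and the function names are assumptions made for illustration; the critical values are passed in as plain numbers.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Record = Tuple[Hashable, Hashable, bool]  # (day, user_id, is_mhf), hypothetical layout

def daily_ratios(records: Iterable[Record]) -> Dict[Hashable, Tuple[float, float]]:
    """Return {day: (p1, p2)} where
    p1 = MHF posts / all posts, p2 = users posting MHF / all posting users."""
    posts = defaultdict(int)
    mhf_posts = defaultdict(int)
    users = defaultdict(set)
    mhf_users = defaultdict(set)
    for day, user, is_mhf in records:
        posts[day] += 1
        users[day].add(user)
        if is_mhf:
            mhf_posts[day] += 1
            mhf_users[day].add(user)
    return {d: (mhf_posts[d] / posts[d], len(mhf_users[d]) / len(users[d]))
            for d in posts}

def flag_hma(ratios: Dict[Hashable, Tuple[float, float]],
             p1_crit: float, p2_crit: float) -> Dict[Hashable, bool]:
    """Step 2.4: a day is HMA only if BOTH ratios exceed their critical values."""
    return {d: (p1 > p1_crit and p2 > p2_crit) for d, (p1, p2) in ratios.items()}

records = [
    ("d1", "a", True), ("d1", "a", True), ("d1", "b", False), ("d1", "c", True),
    ("d2", "a", False), ("d2", "b", False), ("d2", "c", True), ("d2", "d", False),
]
ratios = daily_ratios(records)
hma = flag_hma(ratios, p1_crit=0.5, p2_crit=0.5)  # illustrative critical values
```

Requiring both ratios to exceed their critical values means that one very active user posting many MHF on a single day raises p1 but not p2, and so does not trigger an HMA day on their own.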
4. The microblog text data-based high-temperature event detection method according to claim 3, wherein: in step 2.3, the abnormal critical value P_crit is calculated as follows,
P_crit = Q3 + 1.5 × IQR,
where Q3 and IQR are respectively the upper quartile and the interquartile range obtained from long-term statistics of the characteristic index, the characteristic index being p1 or p2.
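As an illustration of the formula, the sketch below computes P_crit from a history of daily ratios using the Python standard library. The claim does not fix a quartile convention; the "inclusive" interpolation method used here is an assumption, and other conventions give slightly different values.

```python
import statistics
from typing import Sequence

def abnormal_critical_value(history: Sequence[float]) -> float:
    """P_crit = Q3 + 1.5 * IQR over a long-term series of p1 (or p2).

    Quartiles use statistics.quantiles with method="inclusive";
    this convention is an assumption, not something the claim specifies.
    """
    q1, _median, q3 = statistics.quantiles(history, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

# Hypothetical long-term series of the daily ratio p1.
p1_history = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09]
p_crit = abnormal_critical_value(p1_history)  # Q1 = 0.03, Q3 = 0.07

is_abnormal = 0.20 > p_crit  # a day with p1 = 0.20 lies above the fence
```

This is the same upper fence used to mark outliers in a box plot, so a day is treated as abnormal only when its ratio lies well above the bulk of the long-term distribution.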
5. The microblog text data-based high-temperature event detection method according to claim 1, wherein: step 3 comprises the following sub-steps,
step 3.1, collecting the values of the temperature index T during HMA periods, and taking their 5th percentile as a high-temperature threshold candidate, denoted P_HMA(T, 5%);
step 3.2, collecting the values of the temperature index T during non-HMA periods, and taking their 95th percentile as a high-temperature threshold candidate, denoted P_~HMA(T, 95%);
step 3.3, determining the publicly perceived high-temperature threshold T_crit from P_HMA(T, 5%) and P_~HMA(T, 95%); when the daily observed value of the temperature index T exceeds T_crit, it is determined that a publicly perceived high-temperature event occurs on that day.
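The threshold computation of sub-steps 3.1 to 3.3 can be sketched as below. Two points are assumptions rather than statements of the claim: the percentile uses linear interpolation between order statistics, and the two candidates are combined by taking their mean, because the claim text as extracted does not state how they are combined. The `combine` parameter is hypothetical and can be replaced with another rule.

```python
from typing import Callable, Sequence

def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of an empty sequence")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def perceived_threshold(
    temps: Sequence[float],
    hma: Sequence[bool],
    combine: Callable[[float, float], float] = lambda a, b: (a + b) / 2,
) -> float:
    """T_crit from P_HMA(T, 5%) and P_~HMA(T, 95%).

    ASSUMPTION: the default `combine` averages the two candidates;
    the source text does not specify the combination rule.
    """
    t_hma = [t for t, h in zip(temps, hma) if h]
    t_non = [t for t, h in zip(temps, hma) if not h]
    return combine(percentile(t_hma, 5), percentile(t_non, 95))

# Hypothetical daily maximum temperatures (deg C) with HMA flags.
temps = [26, 28, 30, 32, 34, 34, 35, 36, 37, 38]
hma = [False] * 5 + [True] * 5
t_crit = perceived_threshold(temps, hma)  # candidates 34.2 and 33.6

def is_public_heat_event(t_today: float) -> bool:
    """Step 3.3: an event is declared when the daily value exceeds T_crit."""
    return t_today > t_crit
```

The two candidates approach the threshold from opposite sides: the 5th percentile of HMA days is close to the coolest temperature that still triggered an anomaly, while the 95th percentile of non-HMA days is close to the warmest temperature that did not.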
6. The microblog text data-based high-temperature event detection method according to claim 5, wherein: the temperature index T is the daily maximum temperature.
7. The microblog text data-based high-temperature event detection method according to claim 1, 2, 3, 4, 5 or 6, wherein: after it is judged that a publicly perceived high-temperature event occurs on the current day, an automatic alarm notification is sent.
8. A high-temperature event detection system based on microblog text data, characterized in that: the system is used for implementing the microblog text data-based high-temperature event detection method according to any one of claims 1 to 7.
9. The microblog text data-based high-temperature event detection system according to claim 8, wherein: the system comprises the following modules,
a first module, used for extracting microblogs containing high-temperature feedback information, wherein historical or real-time microblog data are taken as input, and microblogs containing high-temperature feedback information are identified and screened through keyword matching and denoted MHF;
a second module, used for identifying microblog anomalies caused by high temperature, including counting, for each day, the proportion of MHF among the microblogs and the proportion of users posting MHF among the microblog users, and judging from the statistics whether high temperature on that day caused an abnormal increase in the number of MHF and of the users posting them, thereby forming a high-temperature-triggered microblog anomaly; HMA denotes a day on which a high-temperature-triggered microblog anomaly is formed, and non-HMA denotes a day on which none is formed;
a third module, used for identifying the publicly perceived high-temperature threshold and high-temperature events, including calculating the publicly perceived high-temperature threshold from the temperature index during HMA periods and non-HMA periods, and judging that a publicly perceived high-temperature event occurs on a day if the observed value of the temperature index on that day exceeds the high-temperature threshold.
CN202010943807.4A 2020-09-10 2020-09-10 High-temperature event detection method and system based on microblog text data Active CN111813922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010943807.4A CN111813922B (en) 2020-09-10 2020-09-10 High-temperature event detection method and system based on microblog text data

Publications (2)

Publication Number Publication Date
CN111813922A true CN111813922A (en) 2020-10-23
CN111813922B CN111813922B (en) 2021-01-05

Family

ID=72860149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010943807.4A Active CN111813922B (en) 2020-09-10 2020-09-10 High-temperature event detection method and system based on microblog text data

Country Status (1)

Country Link
CN (1) CN111813922B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883717A (en) * 2021-04-27 2021-06-01 北京嘉和海森健康科技有限公司 Wrongly written character detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140085328A1 (en) * 2012-09-24 2014-03-27 International Business Machines Corporation Social Media Event Detection and Content-Based Retrieval
CN103955505A (en) * 2014-04-24 2014-07-30 中国科学院信息工程研究所 Micro-blog-based real-time event monitoring method and system
CN104102681A (en) * 2013-04-15 2014-10-15 腾讯科技(深圳)有限公司 Microblog key event acquiring method and device
CN108595582A (en) * 2018-04-17 2018-09-28 北京理工大学 A kind of disastrous meteorological focus incident recognition methods based on social signal
CN111079031A (en) * 2019-12-27 2020-04-28 北京工业大学 Bowen disaster information importance weighting classification method based on deep learning and XGboost algorithm

Also Published As

Publication number Publication date
CN111813922B (en) 2021-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant