CN114708485A - Method for acquiring flood disaster information from social media - Google Patents


Info

Publication number
CN114708485A
Authority
CN
China
Prior art keywords
key
picture
information
flood
social media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210306767.1A
Other languages
Chinese (zh)
Inventor
张凌嘉
梁汉远
顾海挺
江衍铭
许月萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210306767.1A
Publication of CN114708485A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention relates to a method for acquiring flood disaster information from social media, implemented with the Selenium automation tool and a YOLO v5 convolutional neural network. The method comprises the following steps: (1) using the Selenium automation tool to simulate user login and obtain text and picture data posted when a flood disaster occurs; (2) identifying key objects and key parts in the acquired flood pictures with a YOLO v5 convolutional neural network; (3) converting the flood text information and the picture recognition results into water level depth according to a preset standard. The method combines the ability of Selenium's automated browser simulation to get past anti-crawler measures with the efficiency and accuracy of YOLO v5 recognition, and obtains flood data beyond what conventional hydrological stations provide from social media platforms such as microblogs, so that a more accurate hydrological model can be established.

Description

Method for acquiring flood disaster information from social media
Technical Field
The invention belongs to the field of intelligent water conservancy and relates to a new method for acquiring flood disaster information from social media, in particular to a social media flood information acquisition method based on Selenium automation and a YOLO neural network.
Background
Hydrological models for flood forecasting are usually built on data from traditional hydrological observation stations. Because of their limited spatial layout, however, these stations cannot observe water levels in densely populated areas such as urban districts. Meanwhile, the large volume of real-time images, audio, video, text, and numerical information about flood disasters shared on social media by people affected or by bystanders can provide useful disaster information beyond what the traditional stations record, and using these data allows a flood forecasting model to be verified more thoroughly and its accuracy improved. Web crawler technology can collect flood disaster data from social network platforms efficiently, but the data are typically very large in volume and highly repetitive, suffer from exaggeration, delay, and falsehood, and cannot be used directly. Machine learning techniques can then identify and make full use of the valid information collected from social media.
A web crawler based on the Selenium automation tool runs a test script that directly simulates user operations in a browser, as is done in end-to-end testing of an application. By simulating a logged-in user in a real browser, it overcomes the limitation that the Requests library cannot execute JavaScript code.
Machine learning gives a machine a human-like ability to analyze and learn, and to recognize data such as text, images, and sound. When machine learning is applied to real tasks, the features that describe a sample typically have to be designed by human experts, which is called "feature engineering". The quality of the features has a crucial influence on generalization performance, and designing good features is not easy even for an expert. The introduction of deep-learning-based convolutional neural networks (CNNs) removed the need for sliding-window selection and manual feature extraction, and greatly improved the real-time performance and accuracy of target detection. YOLO v1 was proposed in 2015; its core idea is to take the whole picture as the network input and to regress the bounding box positions and classes directly at the output layer. The YOLO v5 convolutional neural network is more lightweight still, with faster training and recognition.
Disclosure of Invention
In order to solve the defects of the prior art, the invention aims to provide a method for acquiring flood disaster information from social media based on the Selenium automation and the YOLO neural network, so as to effectively utilize flood disaster related data on the social media.
In order to achieve the above object, the technical scheme adopted by the invention is as follows:
a method for acquiring flood disaster information from social media is characterized by comprising the following steps:
(1) simulating user login with the Selenium automation tool to obtain, from social media, text and picture information posted when a flood disaster occurs;
(2) using the obtained flood-related picture data to train YOLO v5 convolutional neural network models that identify key objects and key parts in the flood;
(3) converting the flood text information and the picture recognition results into water level data by using a preset key part height standard.
In the above technical solution, further, the social media in the step (1) is a microblog.
Further, the step (1) comprises:
determining keywords and classification places of flood disaster information to be acquired;
simulating user login, page click, scrolling, and input operations with the Selenium automation tool, and acquiring text and picture data according to the keywords;
calling a microblog API to acquire the release time and place, and comparing the release time with the occurrence time of the flood disaster to eliminate irrelevant data; de-duplicating the stored text information, keeping among repeated items the one with the earliest release time; when storing picture information, classifying and storing the pictures by place if a classified place is included; and resampling the acquired pictures, calculating a hash value for each, computing the Hamming distance between hash values, and deleting duplicate pictures.
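The time filtering and text de-duplication described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the post fields `text` and `time`, the explicit `start`/`end` window, and the whitespace normalization used to decide that two posts are "repeated" are all hypothetical choices, since the text does not specify them.

```python
from datetime import datetime


def filter_and_dedupe(posts, start, end):
    """Keep posts released within [start, end]; among repeated texts keep the earliest.

    `posts` is a list of dicts with hypothetical keys "text" and "time".
    """
    earliest = {}
    for post in posts:
        # Reject posts released too far from the flood event (assumed window).
        if not (start <= post["time"] <= end):
            continue
        # Treat texts that differ only in whitespace as repeats (assumption).
        key = " ".join(post["text"].split())
        kept = earliest.get(key)
        if kept is None or post["time"] < kept["time"]:
            earliest[key] = post
    return sorted(earliest.values(), key=lambda p: p["time"])
```

A real pipeline would probably need a fuzzier notion of repetition (for example, reposts with added comments), which is left out here.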
Further, in step (2), a YOLO v5 convolutional neural network is used for image recognition. Two models are involved: a key object recognition model and a key object part recognition model. Pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the key object parts and their serial numbers are used as input to train the key object part recognition model; after training, the models are used to recognize the picture information obtained in step (1).
Further, the picture recognition result in step (3) is a picture with object identification frames, so that the result can be inspected directly; the recognition result data are also stored, including the picture name, the serial number of each identified part, the center point position of the identification frame, and the length and width of the identification frame.
Further, the standard in step (3) is specifically as follows: the height represented by each part of a specified key object is determined by consulting the relevant manufacturing standards for that object, thereby forming the key part height standard.
Several key parts are set for each key object. The height corresponding to the lowest key part identified in the image is taken as the water level; if no key part can be identified, the water depth is considered to have reached the height of the highest key part. This gives the water level information for that key object. When several key objects are identified in a picture, the water level values for the individual objects are compared, abnormal values are removed, and the remaining values are averaged to obtain the water level at that place. The more key parts are set, the higher the detection precision.
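The conversion rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the heights are those listed in Table 2 of the embodiment, the part labels follow the label names used there, and the outlier rule (discard values farther than a fixed tolerance from the median) is an assumption, because the text does not say how abnormal values are detected.

```python
from statistics import median

# Heights in millimetres, copied from Table 2 of the embodiment.
PART_HEIGHT_MM = {
    "car": {"back_light": 300, "car_door": 100, "tire": 50},
    "bus": {"back_light": 400, "car_door": 200, "tire": 70},
    "truck": {"back_light": 500, "car_door": 300, "tire": 100},
}


def object_water_level(kind, visible_parts):
    """Water level (mm) implied by one key object and its identified parts."""
    heights = PART_HEIGHT_MM[kind]
    seen = [heights[p] for p in visible_parts if p in heights]
    # Lowest identified part, or the highest part if none was identified.
    return min(seen) if seen else max(heights.values())


def site_water_level(levels, tolerance_mm=150):
    """Average the per-object levels after dropping outliers.

    Values farther than `tolerance_mm` from the median are treated as
    abnormal; both the rule and the default tolerance are assumptions.
    `levels` must contain at least one value.
    """
    centre = median(levels)
    kept = [v for v in levels if abs(v - centre) <= tolerance_mm]
    return sum(kept) / len(kept)
```

For example, a car whose door and tire are both identified yields 50 mm, while a bus with no identifiable part yields 400 mm.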
The invention has the beneficial effects that:
The method combines the ability of Selenium's automated browser simulation to get past anti-crawler measures with the efficiency and accuracy of YOLO v5 recognition, and obtains flood data beyond what conventional hydrological stations provide from social media platforms such as microblogs, so that a more accurate hydrological model can be established.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of the present invention.
Fig. 2 is an example of a key object recognition result for the acquired picture information.
Fig. 3 is a first example of a key object part recognition result for the acquired picture information.
Fig. 4 is a second example of a key object part recognition result for the acquired picture information.
Fig. 5 is an example of the picture information analysis process.
Detailed Description
The technical scheme of the invention is further explained in detail by the examples and the accompanying drawings.
The whole information acquisition process is shown in fig. 1.
A method for acquiring flood disaster information from social media comprises the following steps:
(1) Simulating user login with the Selenium automation tool to obtain, from social media, text and picture information posted when a flood disaster occurs. The social media can be microblogs or another platform. The text and picture information are defined as follows:
1) the text information comprises the de-duplicated microblog text content containing the keywords, the user name, the user level, the release time, the numbers of comments, likes, and reposts, the keywords, and the name, discussion count, and popularity of the topic;
2) the picture information comprises the pictures attached to the microblog post and the time and place information acquired by calling a microblog API.
The specific process of obtaining information from social media is as follows:
1) Determining the keywords and classification places of the flood disaster information to be acquired.
2) Simulating user login, page click, scrolling, and input operations with the Selenium automation tool so that more data can be acquired.
3) Searching the topic column for topics related to the hydrological disaster keywords, and storing the URLs of those topics.
4) Acquiring the page text content and other data from the stored topic URLs and saving them: text information is stored as an Excel spreadsheet and picture information in jpg format. A microblog API is called to obtain the release time and place, the release time is compared with the occurrence time of the flood disaster, and data too far from it in time are rejected. The stored text information is de-duplicated, keeping among repeated items the one with the earliest release time. When picture information is obtained, the microblog API is likewise called to obtain the release time and place, and if a classified place is included, the pictures are classified and stored by place.
5) Resampling each obtained picture to 8 × 8 pixels, calculating a hash value of the resampled picture, and de-duplicating the pictures by computing the Hamming distance between hash values.
In the above process, care should be taken to remove pictures consisting largely of official text announcements and other irrelevant pictures, to screen out duplicate pictures, and to remove pictures in which key objects or key information cannot be identified (for example, when a key object is identified but none of its parts can be, it is impossible to tell whether the object was misidentified or is completely submerged).
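The picture de-duplication in item 5) can be illustrated with a small, dependency-free Python sketch. The text does not name the hash function, so an average hash (one bit per pixel, set when the pixel is brighter than the mean) is assumed here; the block-average resampling, the grayscale list-of-rows input, and the `max_distance` threshold are likewise illustrative assumptions.

```python
def resample_8x8(gray):
    """Block-average a grayscale image (list of rows, at least 8x8) to 8x8."""
    h, w = len(gray), len(gray[0])
    small = []
    for by in range(8):
        y0, y1 = by * h // 8, (by + 1) * h // 8
        row = []
        for bx in range(8):
            x0, x1 = bx * w // 8, (bx + 1) * w // 8
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray):
    """64-bit hash: one bit per resampled pixel, 1 if brighter than the mean."""
    pixels = [p for row in resample_8x8(gray) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedupe(images, max_distance=5):
    """Keep the first picture of every group of near-identical pictures.

    `images` is a list of (name, grayscale) pairs; `max_distance` is an
    assumed threshold below which two pictures count as duplicates.
    """
    kept = []
    for name, gray in images:
        h = average_hash(gray)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((name, h))
    return [name for name, _ in kept]
```

In practice the grayscale arrays would come from an image library; the comparison against every kept hash is quadratic, which is acceptable for a few thousand pictures.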
(2) Using the obtained flood-related picture data to train YOLO v5 convolutional neural network models that identify key objects and key parts in the flood.
Two models are involved: a key object recognition model and a key object part recognition model. Pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the parts of the relevant key objects and their serial numbers are used as input to train the key object part recognition model; after training, the models are used to recognize the picture information obtained in step (1). The key object recognition result determines which part recognition model is applied next, and the key object part recognition result directly determines the water depth represented by the picture. In testing, six hundred pictures were fed to the trained part recognition model, and the part recognition accuracy reached 0.84.
(3) Converting the flood text information and the picture recognition results into water level data by using a preset key part height standard. The picture recognition result is a picture with object identification frames, so that the result can be inspected directly; the recognition result data are also stored, including the picture name, the serial number of each identified part, the center point position of the identification frame, and the length and width of the identification frame. When several key objects are identified in a picture, the water level values for the individual objects are compared, abnormal values are removed, and the remaining values are averaged to obtain the water level at that place.
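As an illustration of the stored recognition result data, the sketch below parses detections written in the plain-text label layout commonly produced by YOLO v5 tooling, where each line holds a class index followed by the normalized box center and size. That this layout is what the authors stored is an assumption; the `Detection` record and `parse_label_text` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    picture: str      # picture name
    part_id: int      # serial number of the identified part
    x_center: float   # box center, normalized to [0, 1]
    y_center: float
    width: float      # box size, normalized to [0, 1]
    height: float


def parse_label_text(picture, text):
    """Parse lines of the form 'class x_center y_center width height [conf]'."""
    detections = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        x, y, w, h = (float(v) for v in fields[1:5])
        detections.append(Detection(picture, int(fields[0]), x, y, w, h))
    return detections
```

The part serial numbers can then be mapped to part names and looked up in the key part height standard.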
For the text information, the acquired microblog text content is automatically compared against a compiled list of common water level description keywords. If the text contains one of these keywords, it is kept as water level information for that place; otherwise the text is discarded.
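The keyword comparison can be sketched as a simple substring filter. The keyword list below is purely hypothetical (the source does not list its keywords, and real ones would be Chinese phrases compiled from the collected posts), as are the post field names.

```python
# Hypothetical water level description keywords, for illustration only.
WATER_LEVEL_KEYWORDS = ("ankle-deep", "knee-deep", "waist-deep")


def keep_water_level_posts(posts, keywords=WATER_LEVEL_KEYWORDS):
    """Keep posts whose text mentions a water level keyword.

    Each kept post is returned as a copy with an added "matched" list.
    """
    kept = []
    for post in posts:
        hits = [k for k in keywords if k in post["text"]]
        if hits:
            kept.append({**post, "matched": hits})
    return kept
```

Turning a matched phrase into a depth in millimetres would need a phrase-to-height table, which the text does not provide and is therefore not sketched here.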
The standard is specifically as follows: the height represented by each part of a specified key object is determined by consulting the relevant manufacturing standards for that object, thereby forming the key part height standard. If a key part is identified in the image, the water is considered not to have submerged that part, so the standard gives the depth that the water level has not reached. The specified key objects can be various vehicle types, people, and so on. The basic dimensions and part heights of the selected key objects are determined by reference to the relevant domestic manufacturing standards and biological data and are arranged in order from high to low; when a part is identified, the corresponding height is returned. Heights in the standard are given in millimeters.
The method is described below with a specific example. First, the flood disaster of interest is determined, here the flooding brought by Typhoon Lekima ("liqima") in 2019, which caused heavy economic losses and casualties. The keyword "liqima" is selected, and four sites A, B, C, and D that lack observation data are chosen by reference to the existing hydrological stations.
The keyword is passed to the search bar by the Selenium automation tool to obtain the topic URLs, and the microblog data posted in those topics from August 10 to August 16, 2019 are then collected. After repeated and irrelevant data are removed, about three thousand text items and five thousand pictures are obtained. Table 1 shows an example of microblog text information captured by the Selenium automation tool.
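The page-loading part of this step might be sketched as below. The function takes an already logged-in browser driver and only relies on `get`, `execute_script`, and `page_source`, which Selenium WebDriver objects provide; the function name, the number of scroll rounds, and the pause length are assumptions, and login, keyword entry, and HTML parsing are deliberately left out because the text gives no page details.

```python
import time


def load_topic_page(driver, url, scroll_rounds=3, pause_s=1.0):
    """Open a topic page and scroll down so lazily loaded posts are rendered.

    `driver` is expected to behave like a Selenium WebDriver, for example
    one created with `selenium.webdriver.Chrome()` after a manual login.
    """
    driver.get(url)
    for _ in range(scroll_rounds):
        # Scroll to the bottom to trigger loading of further posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)  # give the page time to fetch new content
    return driver.page_source
```

Because the driver is passed in, the same function can be exercised with a stand-in object in tests and with a real browser in production.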
The key object selected for picture analysis is the "vehicle", which comprises "car", "bus", and "truck"; the key parts of a vehicle are the "rear light (back_light)", "door (car_door)", and "tire (tire)". Both key objects and key parts are identified with the YOLO v5 convolutional neural network, and Table 2 gives the standard used in this example. Applying the water level recognition model to the pictures yields the water level at each place for which pictures are available, as shown in Table 3. The picture data analysis process is shown in Fig. 5; the resulting data can support model verification for urban flood forecasting and warning.
According to the statistics, the water level recognition accuracy in this example reaches 0.97. The errors arise where the terrain at the shooting site is strongly uneven and a key object happens to stand in a pool of accumulated water, which leads to a wrong water level.
Table 1. Example of microblog text information captured by the Selenium automation tool
[Table 1 is provided as an image in the original publication.]
Table 2. Height conversion standard for the key objects and their parts
Category    Rear light    Door      Tire
Car         300 mm        100 mm    50 mm
Bus         400 mm        200 mm    70 mm
Truck       500 mm        300 mm    100 mm
Table 3. Water level depth (mm) identified for each picture sample
[Table 3 is provided as an image in the original publication.]
The foregoing description is only exemplary of the implementation of the present invention and is not intended to limit the invention thereto. The selection of the key objects and parts to be treated and the establishment of the standard can be specifically established according to different research problems. Various modifications and alterations of this invention will occur to those skilled in the art. All changes, equivalents, modifications and the like which come within the scope of the invention as defined by the appended claims are intended to be embraced therein.

Claims (7)

1. A method for acquiring flood disaster information from social media is characterized by comprising the following steps:
(1) simulating user login with the Selenium automation tool to obtain, from social media, text and picture information posted when a flood disaster occurs;
(2) using the obtained flood-related picture data to train YOLO v5 convolutional neural network models that identify key objects and key parts in the flood;
(3) converting the flood text information and the picture recognition results into water level data by using a preset key part height standard.
2. The method for acquiring flood disaster information from social media according to claim 1, wherein the social media in step (1) is microblog.
3. The method of claim 2, wherein the step (1) comprises:
determining keywords and classification places of flood disaster information to be acquired;
simulating user login, page click, scrolling, and input operations with the Selenium automation tool, and acquiring text and picture data according to the keywords;
calling a microblog API to acquire the release time and place, and comparing the release time with the occurrence time of the flood disaster to eliminate irrelevant data; de-duplicating the stored text information, keeping among repeated items the one with the earliest release time; when storing picture information, classifying and storing the pictures by place if a classified place is included; and resampling the acquired pictures, calculating a hash value for each, computing the Hamming distance between hash values, and deleting duplicate pictures.
4. The method of claim 1, wherein a YOLO v5 convolutional neural network is used for image recognition in step (2), specifically comprising two models, namely a key object recognition model and a key object part recognition model; pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the key object parts and their serial numbers are used as input to train the key object part recognition model; after training, the models are used to recognize the picture information obtained in step (1).
5. The method for acquiring flood disaster information from social media according to claim 1, wherein the picture recognition result in the step (3) is a picture with an object identification frame, so that the picture recognition result can be directly observed; and storing the recognition result data, including: the picture name, the part serial number obtained by identification, the central point position of the identification frame and the length and width of the identification frame.
6. The method for acquiring flood disaster information from social media according to claim 1, wherein the criteria in step (3) are specifically: the key-part height criterion is obtained by determining the height represented by the designated key-object part by referring to the associated manufacturing criteria for the designated key-object.
7. The method of claim 1, wherein a plurality of key parts are set for each key object, the height corresponding to the lowest key part identified in the image is taken as the water level information, and if no key part can be identified, the water level depth is considered to have reached the height corresponding to the highest key part, thereby obtaining the water level information corresponding to the key object; when a plurality of key objects are identified in the picture, the water level information corresponding to each key object is compared, abnormal values are removed, and the remaining values are averaged to obtain the water level information of the place.
CN202210306767.1A 2022-03-25 2022-03-25 Method for acquiring flood disaster information from social media Pending CN114708485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210306767.1A CN114708485A (en) 2022-03-25 2022-03-25 Method for acquiring flood disaster information from social media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210306767.1A CN114708485A (en) 2022-03-25 2022-03-25 Method for acquiring flood disaster information from social media

Publications (1)

Publication Number Publication Date
CN114708485A true CN114708485A (en) 2022-07-05

Family

ID=82170045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210306767.1A Pending CN114708485A (en) 2022-03-25 2022-03-25 Method for acquiring flood disaster information from social media

Country Status (1)

Country Link
CN (1) CN114708485A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170800A (en) * 2022-07-15 2022-10-11 浙江大学 Urban waterlogging deep recognition method based on social media and deep learning


Similar Documents

Publication Publication Date Title
CN111460247B (en) Automatic detection method for network picture sensitive characters
CN105608456B (en) A kind of multi-direction Method for text detection based on full convolutional network
CN107729363B (en) Bird population identification analysis method based on GoogLeNet network model
CN112668375B (en) Tourist distribution analysis system and method in scenic spot
CN109447069A (en) Collecting vehicle information recognition methods and system towards intelligent terminal
CN105608454A (en) Text structure part detection neural network based text detection method and system
CN109918648B (en) Rumor depth detection method based on dynamic sliding window feature score
CN111428558A (en) Vehicle detection method based on improved YOLOv3 method
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
CN108446312A (en) Remote sensing image search method based on depth convolution semantic net
CN112560895A (en) Bridge crack detection method based on improved PSPNet network
CN111046213B (en) Knowledge base construction method based on image recognition
CN114708485A (en) Method for acquiring flood disaster information from social media
CN109597926A (en) A kind of information acquisition method and system based on social media emergency event
CN116597270A (en) Road damage target detection method based on attention mechanism integrated learning network
CN116206112A (en) Remote sensing image semantic segmentation method based on multi-scale feature fusion and SAM
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN101615255B (en) Video text multi-frame interfusion method
CN113256978A (en) Method and system for diagnosing urban congestion area and storage medium
CN117455237A (en) Road traffic accident risk prediction method based on multi-source data
CN116630683A (en) Road damage detection method based on diffusion self-adaptive feature extraction
CN112015937B (en) Picture geographic positioning method and system
CN116012709A (en) High-resolution remote sensing image building extraction method and system
Wang et al. Instance segmentation of soft‐story buildings from street‐view images with semiautomatic annotation
CN111678531B (en) Subway path planning method based on LightGBM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination