CN114708485A - Method for acquiring flood disaster information from social media

Info

Publication number: CN114708485A
Application number: CN202210306767.1A
Authority: CN
Inventors: 张凌嘉; 梁汉远; 顾海挺; 江衍铭; 许月萍
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-03-25
Filing date: 2022-03-25
Publication date: 2022-07-05

Abstract

The invention relates to a method for acquiring flood disaster information from social media, implemented with the Selenium automation tool and the YOLO v5 convolutional neural network, and comprising the following steps: (1) using the Selenium automation tool to simulate login and obtain text and picture data posted when a flood disaster occurs; (2) based on the acquired flood picture data, identifying key objects and key parts in the flood with a YOLO v5 convolutional neural network; (3) converting the flood text information and the image recognition results into water level depth according to a preset standard. The method combines the ability of Selenium's automated web-page simulation to withstand anti-crawler measures with the efficiency and accuracy of YOLO v5 recognition, and obtains flood data beyond that of conventional hydrological stations from social media platforms such as microblogs, so that an accurate hydrological model can be better established.

Description

Method for acquiring flood disaster information from social media

Technical Field

The invention belongs to the field of intelligent water conservancy and relates to a new method for acquiring flood disaster information from social media, in particular to a social media flood information acquisition method based on Selenium automation and a YOLO neural network.

Background

Hydrological models for flood forecasting are usually built on data from traditional hydrological observation stations. Because of the limits of their spatial layout, however, these stations cannot observe water levels in densely populated areas such as urban districts. Meanwhile, the large amount of real-time images, audio, video, text, figures and other information about flood disasters shared on social media by relatives or bystanders can provide effective disaster information beyond what the traditional stations offer, and using these data allows the flood forecasting model to be better verified, which improves forecast accuracy. Flood disaster data on social network platforms can be collected efficiently with web crawler technology, but the data are often extremely large in volume and highly repetitive, suffer from exaggeration, delay and falsehood, and cannot be used directly as obtained. Machine learning techniques can then identify and make use of the valid information collected from social media.

A web crawler based on the Selenium automation tool runs a test script that directly simulates user operations in a browser, as is done in end-to-end testing of an application; by simulating user login and browsing, it overcomes the limitation that Requests cannot execute JavaScript code.
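As a rough illustration of this browser-driving approach, the sketch below scrolls a dynamically loaded page and gathers the text of matching elements. It assumes that `driver` is a Selenium WebDriver that has already logged in and opened the page; the function name `scroll_and_collect`, the CSS selector passed to it and the pause length are hypothetical choices, not part of the invention.

```python
import time


def scroll_and_collect(driver, css_selector, max_scrolls=5, pause_s=1.0):
    """Scroll a dynamically loaded page and collect the text of matching elements.

    `driver` is assumed to be a Selenium WebDriver (for example the object
    returned by `selenium.webdriver.Chrome()`) with the target page open.
    `css_selector` is a hypothetical selector for the post bodies.
    """
    seen = []

    def collect():
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        for element in driver.find_elements("css selector", css_selector):
            text = element.text.strip()
            if text and text not in seen:
                seen.append(text)

    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        collect()
        # Scrolling to the bottom triggers the JavaScript that loads more posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_s)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    collect()  # pick up anything loaded by the final scroll
    return seen
```

Because the page is rendered by a real browser, content produced by JavaScript is visible to the script, which a plain HTTP client would not see.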

Machine learning enables a machine to have an analysis and learning ability like a human being and to recognize data such as characters, images, and sounds. When machine learning is used for realistic tasks, the features describing the sample typically need to be designed by human experts, which is called "Feature Engineering". The quality of the features has a crucial influence on the generalization performance, and it is not easy for a human expert to design good features. The introduction of deep-learning-based Convolutional Neural Networks (CNNs) removed the need for sliding-window selection and manual feature extraction in machine learning, and greatly improved the real-time performance and accuracy of target detection. YOLO v1 was proposed in 2015; its core idea is to take the whole picture as the input of the network and to regress the bounding box positions and classes directly at the output layer. The YOLO v5 convolutional neural network is more lightweight still, with faster training and recognition.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention aims to provide a method for acquiring flood disaster information from social media based on Selenium automation and the YOLO neural network, so as to make effective use of flood-related data on social media.

In order to achieve the above object, the technical scheme adopted by the invention is as follows:

a method for acquiring flood disaster information from social media is characterized by comprising the following steps:

(1) simulating user login by using a Selenium automation tool to obtain text and picture information when a flood disaster happens from a social media;

(2) identifying key objects and key parts in the flood from the obtained flood-related picture data, using a model trained with the YOLO v5 convolutional neural network;

(3) converting the flood text information and the picture recognition results into water level data by using a preset key part height standard.

In the above technical solution, further, the social media in the step (1) is a microblog.

Further, the step (1) comprises:

determining keywords and classification places of flood disaster information to be acquired;

simulating user login, page click, scrolling and input operations by adopting a Selenium automation tool, and acquiring text and picture data according to the keywords;

calling a microblog API to acquire the release time and place, comparing them with the occurrence time of the flood disaster, and eliminating irrelevant data; deduplicating the stored text information, keeping for each set of duplicates the entry with the earliest release time; when storing picture information, classifying and storing the pictures by place if classification places are included; and resampling the acquired pictures, calculating a hash value for each, computing the Hamming distance between hash values, and deleting duplicate pictures.
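The time comparison and text deduplication in this step can be sketched as follows. This is a minimal sketch under stated assumptions: each post is a dict with hypothetical `text` and `time` fields, duplicates are detected by exact match after whitespace normalization, and the one-day `slack` around the event window is an illustrative threshold, since the source gives no numeric limit.

```python
from datetime import datetime, timedelta


def filter_and_dedupe(posts, event_start, event_end, slack=timedelta(days=1)):
    """Drop posts far from the flood period, then keep the earliest duplicate.

    `posts` is a list of dicts with 'text' (str) and 'time' (datetime).
    `slack` is an assumed tolerance around the event window.
    """
    earliest = {}
    for post in posts:
        # Reject data whose release time is too far from the flood event.
        if not (event_start - slack <= post["time"] <= event_end + slack):
            continue
        # Treat texts that differ only in whitespace as duplicates (assumption).
        key = " ".join(post["text"].split())
        kept = earliest.get(key)
        if kept is None or post["time"] < kept["time"]:
            earliest[key] = post
    return sorted(earliest.values(), key=lambda p: p["time"])
```

A real crawl would probably need fuzzier duplicate detection (reposts with added comments, for instance), which is outside the scope of this sketch.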

Further, in the step (2), a YOLO v5 convolutional neural network is used for image recognition, specifically comprising a key object recognition model and a key object part recognition model; pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the relevant key object parts and their serial numbers are used as input to train the key object part recognition model; after training, the models identify the picture information obtained in the step (1).

Further, the picture recognition result in the step (3) is a picture with object identification frames, so that the result can be inspected directly; the stored recognition result data include: the picture name, the serial number of the identified part, the position of the center point of the identification frame, and the length and width of the identification frame.

Further, the standard in the step (3) is specifically: determining the height represented by each part of the specified key object by consulting the relevant manufacturing standard of the specified key object, thereby forming a key part height standard;

setting a plurality of key parts for a given key object, and taking the height corresponding to the lowest key part identified in the image as the water level information; if no key part can be identified, the water level depth is considered to have reached the height corresponding to the highest key part, thereby obtaining the water level information corresponding to the key object; when a plurality of key objects are identified in the picture, the water level information corresponding to each key object is compared, abnormal values are removed, and the remaining values are averaged to obtain the water level information of the place. The more key parts are set, the higher the detection precision.
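The conversion rule above can be sketched as follows. The part heights are copied from Table 2 of the example and the part labels from the example's class names; the function names and the outlier rule (discard values more than `tol_mm` from the median) are assumptions made for illustration, since the source does not say how abnormal values are detected.

```python
from statistics import median

# Part heights in mm for each key object, taken from Table 2 of the example.
PART_HEIGHT_MM = {
    "car": {"back_light": 300, "car_door": 100, "tire": 50},
    "bus": {"back_light": 400, "car_door": 200, "tire": 70},
    "truck": {"back_light": 500, "car_door": 300, "tire": 100},
}


def object_level(kind, detected_parts):
    """Water level (mm) implied by one key object.

    The lowest identified part gives the level; if no part is identified,
    the level is taken to have reached the highest part.
    """
    heights = PART_HEIGHT_MM[kind]
    found = [heights[p] for p in detected_parts if p in heights]
    if not found:
        return max(heights.values())
    return min(found)


def site_level(levels, tol_mm=100):
    """Average the per-object levels after dropping abnormal values.

    Assumed rule: a value is abnormal if it lies more than `tol_mm`
    from the median of all values.
    """
    med = median(levels)
    kept = [v for v in levels if abs(v - med) <= tol_mm]
    if not kept:  # possible when the two middle values are far apart
        return float(med)
    return sum(kept) / len(kept)
```

For instance, a car with only its door and rear light visible yields 100 mm, and a site with per-object levels of 100, 100, 70 and 500 mm averages to 90 mm once the 500 mm value is discarded.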

The invention has the beneficial effects that:

The method combines the ability of Selenium's automated web-page simulation to withstand anti-crawler measures with the efficiency and accuracy of YOLO v5 recognition, and obtains flood data beyond that of conventional hydrological stations from social media platforms such as microblogs, so that an accurate hydrological model can be better established.

Drawings

Fig. 1 is a schematic flow chart of an embodiment of the present invention.

Fig. 2 is an example of a key object recognition result for the acquired picture information.

Fig. 3 is example 1 of a key object part recognition result for the acquired picture information.

Fig. 4 is example 2 of a key object part recognition result for the acquired picture information.

Fig. 5 is an example of the picture information analysis process.

Detailed Description

The technical scheme of the invention is further explained in detail by the examples and the accompanying drawings.

The whole information acquisition process is shown in fig. 1.

A method for acquiring flood disaster information from social media comprises the following steps:

(1) simulating user login by using a Selenium automation tool to obtain text and picture information when a flood disaster happens from a social media; the social media can be microblogs or other social media; the specific definition of the text and picture information is as follows:

1) the text information comprises the re-processed microblog text content containing the keywords, the user name, the user level, the release time, the numbers of comments, likes and reposts, the keywords, and the name, discussion count and popularity of the topic;

2) the picture information comprises the pictures attached to the microblog, together with the time and place information acquired by calling a microblog API.

The specific process of obtaining information from social media is as follows:

1) determining keywords and classification places of the flood disaster information to be acquired;

2) simulating user login, page click, scrolling and input operations by using a Selenium automation tool so as to acquire more data;

3) searching the topic column for topics related to the flood disaster keywords, and storing the websites of those topics;

4) acquiring the page text content and other data from the stored topic websites and storing them, with text information stored as an Excel table and picture information stored in jpg format; calling a microblog API (application program interface) to acquire the release time and place, comparing the release time with the occurrence time of the flood disaster, and rejecting data whose time gap is too large; deduplicating the stored text information, keeping for each set of duplicates the entry with the earliest release time; when picture information is obtained, calling a microblog API to acquire the release time and place, and, if classification places are included, classifying and storing the pictures by place;

5) resampling each obtained picture to 8 pixels by 8 pixels, calculating a hash value of the resampled picture, and deduplicating the pictures by computing the Hamming distance between hash values.

In the above process, care should be taken to remove pictures consisting largely of official text announcements and other irrelevant pictures, to screen out duplicate pictures, and to remove pictures in which key objects or key information cannot be identified (for example, when a key object is identified but none of its parts can be identified, it cannot be determined whether the key object was misidentified or is completely submerged).
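The picture deduplication in item 5) can be sketched as below. The source does not name the hash function, so this sketch assumes an average hash: each picture is box-averaged down to 8 by 8 pixels and every pixel contributes one bit according to whether it is brighter than the mean. Pictures are represented as 2D lists of grayscale values at least 8 pixels on each side, and the Hamming distance threshold of 5 is an illustrative choice.

```python
def resample_8x8(gray):
    """Box-average a 2D grayscale image (list of rows) down to 8x8 pixels."""
    h, w = len(gray), len(gray[0])
    small = []
    for by in range(8):
        y0, y1 = by * h // 8, (by + 1) * h // 8
        row = []
        for bx in range(8):
            x0, x1 = bx * w // 8, (bx + 1) * w // 8
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray):
    """64-bit hash: one bit per resampled pixel, set if brighter than the mean."""
    pixels = [p for row in resample_8x8(gray) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")


def dedupe(images, max_distance=5):
    """Return the names of pictures kept after near-duplicate removal.

    `images` maps a picture name to its grayscale pixel grid; a picture is
    dropped if its hash is within `max_distance` bits of a kept picture.
    """
    kept = {}
    for name, gray in images.items():
        value = average_hash(gray)
        if all(hamming(value, other) > max_distance for other in kept.values()):
            kept[name] = value
    return list(kept)
```

A small Hamming distance means two pictures have nearly the same coarse brightness pattern, which is why recompressed or slightly resized copies of the same photo tend to collapse to one entry.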

(2) Identifying key objects and key parts in the flood from the obtained flood-related picture data, using a model trained with the YOLO v5 convolutional neural network. The model specifically comprises a key object recognition model and a key object part recognition model: pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the relevant key object parts and their serial numbers are used as input to train the key object part recognition model; after training, the models identify the picture information obtained in step (1). The recognition result for the key object determines which part recognition model is used subsequently, and the recognition result for the key object part directly determines the water level depth represented by the picture. In testing, six hundred pictures were input into the trained part recognition model, and the accuracy of part recognition reached 0.84.

(3) Converting the flood text information and the picture recognition results into water level data by using a preset key part height standard. The picture recognition result is a picture with object identification frames, so that the result can be inspected directly; the stored recognition result data include: the picture name, the serial number of the identified part, the position of the center point of the identification frame, and the length and width of the identification frame. When a plurality of key objects are identified in the picture, the water level information corresponding to each key object is compared, abnormal values are removed, and the remaining values are averaged to obtain the water level information of the place.
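One way to hold the stored recognition result data is sketched below. It assumes the detections are available as text in the label format that YOLO v5 writes when label saving is enabled, that is, one line per frame with a class index followed by the normalized center x, center y, width and height; the `PartDetection` record and the `parse_label_text` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartDetection:
    picture: str   # picture name
    part_id: int   # serial number of the identified part
    cx: float      # center point of the identification frame, in pixels
    cy: float
    width: float   # size of the identification frame, in pixels
    height: float


def parse_label_text(picture, text, img_w, img_h):
    """Convert YOLO-style label lines into pixel-based detection records."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        part_id = int(fields[0])
        # Any extra field (such as a confidence score) is ignored here.
        x, y, w, h = (float(v) for v in fields[1:5])
        records.append(
            PartDetection(picture, part_id, x * img_w, y * img_h, w * img_w, h * img_h)
        )
    return records
```

Keeping the frame geometry alongside the part serial number makes it possible to re-draw or audit the frames later without re-running the model.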

For the text information, the acquired microblog text content is automatically compared with a compiled list of common water level description keywords; if such keywords appear in the text content, the water level information of the place is kept; if not, the text information is removed.
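A minimal sketch of this keyword comparison follows. The keyword list is hypothetical, since the source does not enumerate its keywords, and each post is assumed to be a dict with `place` and `text` fields.

```python
# Hypothetical keyword list; the source does not enumerate its keywords.
LEVEL_KEYWORDS = ("ankle-deep", "knee-deep", "waist-deep")


def filter_by_level_keywords(posts, keywords=LEVEL_KEYWORDS):
    """Keep only posts whose text mentions a water level keyword.

    Each kept post is returned with the list of keywords found in it.
    """
    kept = []
    for post in posts:
        text = post["text"].lower()
        hits = [kw for kw in keywords if kw in text]
        if hits:
            kept.append({**post, "keywords": hits})
    return kept
```

How a matched keyword maps to a depth value is left open here, because the source describes that conversion only for the picture results.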

The standard is specifically as follows: the height represented by each part of the specified key object is determined by consulting the relevant manufacturing standard of the specified key object, thereby forming a key part height standard; if a key part is identified in the image, the water is considered not to have submerged that part, so the depth that the water level has not reached is determined according to the standard. The specified key objects can be various vehicle types, people and the like; the basic dimensions and part heights of the selected key objects are determined by referring to relevant domestic manufacturing standards and biological data and are arranged in order from high to low. When a part is identified, a height value is returned; heights in the standard are in millimeters.

The method is described below with a specific example. The flood disaster to be studied is determined, here the flood brought by Typhoon Lekima ("liqima") in 2019, which caused huge economic losses and casualties; the keyword "liqima" is selected, and four places A, B, C and D that lack data are selected with reference to the existing hydrological stations.

The keyword is passed to the search bar through the Selenium automation tool to obtain the topic websites, and the microblog data in those topics from August 10, 2019 to August 16, 2019 are then obtained; after duplicate and irrelevant data are removed, three thousand pieces of text information and five thousand pieces of picture information are obtained. Table 1 shows an example of microblog text information captured by the Selenium automation tool.

The key object selected for the picture analysis is the "vehicle", which comprises "car (car)", "bus (bus)" and "truck (truck)"; the key parts of a vehicle are "rear light (back_light)", "door (car_door)" and "tire (tire)". Both key objects and key parts are identified with the YOLO v5 convolutional neural network, and Table 2 is the standard used in this example. The pictures are identified with the water level identification model, giving the water level information of each place for which pictures are available, as shown in Table 3. The picture data analysis process is shown in Fig. 5, and the data can assist model verification for urban flood forecasting and warning.

According to statistics, the accuracy of water level identification in this example reaches 0.97; the errors arise where the terrain at the shooting location is markedly uneven and the key object happens to stand in ponded water, which causes the water level to be misidentified.

TABLE 1 Example of microblog text information captured by the Selenium automation tool

TABLE 2 Height conversion standard for key objects and their parts

Category	Rear light	Door	Tire
Car	300 mm	100 mm	50 mm
Bus	400 mm	200 mm	70 mm
Truck	500 mm	300 mm	100 mm

TABLE 3 Water level depth (mm) for identification of each picture sample

The foregoing description is only exemplary of the implementation of the present invention and is not intended to limit the invention thereto. The selection of the key objects and parts to be treated and the establishment of the standard can be specifically established according to different research problems. Various modifications and alterations of this invention will occur to those skilled in the art. All changes, equivalents, modifications and the like which come within the scope of the invention as defined by the appended claims are intended to be embraced therein.

Claims

1. A method for acquiring flood disaster information from social media is characterized by comprising the following steps:

2. The method for acquiring flood disaster information from social media according to claim 1, wherein the social media in step (1) is a microblog.

3. The method of claim 2, wherein the step (1) comprises:

4. The method of claim 1, wherein the YOLO v5 convolutional neural network is used for image recognition in step (2), and specifically includes two parts, namely a key object recognition model and a key object part recognition model; pictures containing the specified key objects are used as input to train the key object recognition model, and pictures annotated with the key object parts and their serial numbers are used as input to train the key object part recognition model; after training, the models identify the picture information obtained in step (1).

5. The method for acquiring flood disaster information from social media according to claim 1, wherein the picture recognition result in the step (3) is a picture with an object identification frame, so that the picture recognition result can be directly observed; and storing the recognition result data, including: the picture name, the part serial number obtained by identification, the central point position of the identification frame and the length and width of the identification frame.

6. The method for acquiring flood disaster information from social media according to claim 1, wherein the standard in step (3) is specifically: the key part height standard is obtained by determining the height represented by each part of the specified key object with reference to the relevant manufacturing standard of the specified key object.

7. The method of claim 1, wherein for a given key object, a plurality of key parts are set, the height corresponding to the lowest key part identified in the image is taken as the water level information, and if no key part can be identified, the water level depth is considered to have reached the height corresponding to the highest key part, so as to obtain the water level information corresponding to the key object; when a plurality of key objects are identified in the picture, the water level information corresponding to each key object is compared, abnormal values are removed, and the remaining values are averaged to obtain the water level information of the place.