CN115511280A - Urban flood toughness evaluation method based on multi-mode data fusion - Google Patents


Publication number
CN115511280A
Authority
CN
China
Prior art keywords
flood
toughness
data
text
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211139339.0A
Other languages
Chinese (zh)
Inventor
冯天
张微
胡晨璐
尤宁宁
沈骏翱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202211139339.0A
Publication of CN115511280A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/063114 Status monitoring or status determination for a person or group
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • G06Q50/265 Personal security, identity or safety
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The invention discloses an urban flood toughness evaluation method based on multi-modal data fusion. In the urban flood toughness evaluation task, urban data in two modalities and the corresponding flood toughness evaluation labels are first preprocessed; the feature information of the two modalities is then fused by a data feature fusion module; finally, an urban flood toughness evaluation module classifies the urban flood toughness as good or poor based on the fused features. Compared with traditional urban flood toughness evaluation methods, the method introduces the idea of multi-modal fusion into the urban flood toughness evaluation task for the first time, and its effectiveness has been verified.

Description

Urban flood toughness evaluation method based on multi-mode data fusion
Technical Field
The invention relates to the fields of deep learning, computer vision and natural language processing, and in particular to an urban flood toughness evaluation method based on multi-modal data fusion.
Background
Flooding is one of the most common natural disasters. Because cities concentrate dense populations and support industry, commerce and many other economic and social activities, flooding poses a direct and serious threat to urban development and to the lives and property of urban residents. Many cities in different countries and regions suffer flood losses. Currently, "building resilient cities" has become a development goal and a focus of public attention, with the aim of improving a city's ability to keep operating normally and to cope actively with various disasters. Evaluating urban flood toughness by learning from historical disaster data can therefore help improve a city's ability to adapt to disasters.
Existing urban flood toughness (resilience) evaluation methods mainly include the coastal community toughness index, quick risk evaluation and urban water toughness analysis. The Coastal Community Resilience Index (CCRI), developed by the U.S. National Oceanic and Atmospheric Administration in 2015, predicts the loss of function and short-term recovery of coastal urban communities under hurricanes, storm surges, rainfall and similar conditions, and provides a basis for analysis, planning and decision-making. Quick Risk Evaluation (QRE) is an evaluation tool developed by the United Nations disaster reduction agency; it builds a disaster risk matrix from the severity and likelihood of 86 kinds of disasters, such as floods, earthquakes and nuclear explosions, to identify and understand risks and output a risk evaluation result. The City Water Resilience Approach (CWRA) was proposed in 2019 by the Swedish CWRA Steering Group; it proceeds in five steps (understanding the system, evaluating the toughness of the city water system, making a plan, implementing the plan, and evaluating, learning and adapting), scoring the individual indicators and summing them to obtain an overall urban water toughness evaluation.
These methods rely on scoring and questionnaire-based evaluation models and are mainly carried out manually. They require the participation of planners, engineers and decision-makers with professional knowledge and experience, and have obvious limitations such as poor real-time performance, strong sensitivity to region and high cost. The rapid development of artificial intelligence, in particular the wide application of deep learning in computer vision and natural language processing, offers a new approach to evaluating urban flood toughness. Specifically, by exploiting the feature-mining capability of deep neural networks and fusing the feature information in data of different modalities, the urban flood toughness evaluation problem can be addressed in a more advanced, faster and more timely way.
Disclosure of Invention
The invention aims to solve the technical problem of how to fully and effectively fuse characteristic information in image and text modal data by using the technology in the fields of deep learning, computer vision and natural language processing, and provides a city flood toughness evaluation method based on multi-modal data fusion.
The invention adopts the following specific technical scheme:
An urban flood toughness evaluation method based on multi-modal data fusion comprises the following steps:
S1. Dividing a target city into a series of irregular areas according to its road network; setting a plurality of sampling points in each irregular area and acquiring, for each sampling point, multi-modal data consisting of image data and text data, wherein the image data is a street view image at the sampling point and the text data is flood-related social network text published by users within a neighborhood centered on the sampling point; calculating a flood toughness value for each irregular area from a surface water change dataset of the target city, and assigning each irregular area a binary label indicating good or poor flood toughness, using the mean of the flood toughness values of all irregular areas as the threshold; the flood toughness label of each sampling point being the binary label of the irregular area in which it is located;
S2. Training an urban flood toughness evaluation model, composed of a data feature fusion module and an urban flood toughness evaluation module, on the multi-modal data of all sampling points of the target city under the supervision of their flood toughness labels; in the model, the data feature fusion module first extracts features from the image data and the text data separately: shallow and deep features of the street view image are extracted by an image feature extraction network and fused into an image feature, and a text feature is extracted from the text data by an attention-based pre-trained language model; the image feature and the text feature are concatenated into a fusion feature, which is input into the urban flood toughness evaluation module to evaluate the flood toughness of each sampling point and output its predicted flood toughness label;
S3. Acquiring the multi-modal data, consisting of image data and text data, for any point to be evaluated, inputting it into the trained urban flood toughness evaluation model, and outputting the flood toughness label of that point.
Preferably, the street view image at each sampling point is preprocessed as follows: the street view images are first resized (Resize) to the same size, then center-cropped, and then normalized (Normalize) to obtain the corrected street view image data.
Preferably, the social network text at each sampling point is preprocessed as follows: stop words and numbers are first deleted from the text; regular expressions are then used to remove URLs, English words beginning with @ (user mentions), Chinese characters and non-alphabetic characters; finally, the text is lemmatized to obtain the cleaned Twitter text data.
Preferably, the target city is divided into a series of irregular areas according to the road network as follows: the original vector road network of the target city is obtained first; a dilation operation is then applied to eliminate noise and smooth boundaries, followed by an erosion operation to remove unnecessary detail from the road network; finally, each closed irregular area is stored as a vector polygon layer together with the longitude and latitude coordinates of its boundary.
Preferably, a specific method of assigning a binarized label representing the superiority and inferiority of the flood toughness to each irregular area is as follows:
acquiring a surface water quantity change data set of a target city and denoising the surface water quantity change data set; constructing a Mask (Mask) matrix based on the boundary longitude and latitude coordinate information of each irregular area, and extracting time sequence data of flood generation periods in each irregular area from the denoised data set by using the Mask matrix;
for each irregular area, calculating the toughness value of each flood occurrence period in the irregular area according to the following formula:
[Formula image BDA0003852814400000031 is not reproduced in this text.]
where t_0 is the recorded time at which the flood area in the irregular area first exceeds the area threshold during the current flood occurrence period; Q(t) is the fitted curve of flood area against time in the irregular area during the current flood, with its values normalized to the range [0, 100]; and t_1 is the time, calculated from the fitted curve Q(t), at which the flood area in the irregular area falls back below the area threshold during the current flood occurrence period;
aiming at each irregular area, averaging the toughness values of all rounds of flood generation periods in the irregular area, and taking the obtained first average value as the flood toughness value of the irregular area; averaging the flood toughness values of all irregular areas in the target city, and taking an obtained second average value as a toughness threshold value;
finally, endowing each irregular area with a binary label representing the toughness of the flood, if the toughness value of the flood in one irregular area is higher than the toughness threshold value, endowing a first flood toughness label, and if not, endowing a second flood toughness label; wherein the flood toughness of the irregular area having the first flood toughness label is superior to the irregular area having the second flood toughness label.
Preferably, the image feature extraction network in the data feature fusion module adopts a ResNet50 network model, the street view image is input into the ResNet50 network model, a shallow feature is obtained from a first residual block of the ResNet50, a deep feature is obtained from a fourth residual block of the ResNet50, and the shallow feature and the deep feature are fused to obtain a final image feature of the street view image.
Preferably, the attention-based pre-trained language model in the data feature fusion module is a BERT (Bidirectional Encoder Representations from Transformers) model; each social network text is input into the BERT model, which outputs a text feature.
Preferably, when the image features and the text features are fused in the data feature fusion module, all text features corresponding to the same sampling point are first fused by weighting to give a final text feature, the weight of each text feature being negatively correlated with the distance between the corresponding social network user and the sampling point; the image feature and the final text feature are then fused by concatenation (Concat) to obtain the final fusion feature combining text and image information.
Preferably, the urban flood toughness evaluation module is implemented as a linear classifier; the fusion feature is input into the linear classifier for binary classification to obtain the predicted flood toughness label.
Preferably, in each irregular area, sampling points need to be arranged along an internal road network; the street view image at each sampling point comprises street views in four directions, namely front, back, left and right, and the radius of the neighborhood space range of each sampling point is 0.5-1.5 km.
Compared with the prior art, the invention has the following beneficial effects:
The method introduces an urban flood toughness evaluation module based on multi-modal data fusion and applies an advanced multi-modal fusion technique to the field of urban toughness, which can markedly improve the efficiency of urban flood toughness evaluation while avoiding high labor costs. Compared with existing manual evaluation methods, the method avoids problems such as information omission and poor real-time performance that may arise during data processing.
Drawings
FIG. 1 is a schematic structural diagram of an urban flood toughness evaluation model;
FIG. 2 is a training flow chart of the urban flood toughness evaluation model.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
In order that the above objects, features and advantages of the present invention may be more readily understood, embodiments are described in detail below with reference to the drawings. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many forms other than those described here, and those skilled in the art can make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
In describing the present invention, it should be understood that an accurate and theoretically grounded definition of toughness (resilience) is the basis of the entire experiment. Toughness is defined here as the ability to recover quickly from a disaster, a concept that depends on both time and space.
Most existing flood toughness evaluation methods, such as the coastal community toughness index and urban water toughness analysis, rely on manual evaluation: the initial toughness evaluation of a city is completed from the information already available in the system together with expert experience. These methods clearly suffer from high labor cost, long turnaround, low evaluation efficiency and poor real-time performance. The recent rise of neural networks offers a new approach to the toughness evaluation problem: by applying multi-modal data fusion and exploiting the learning capability of neural networks, the urban flood toughness evaluation problem can be addressed with an advanced, fast, real-time method that is robust and fault-tolerant.
Accordingly, the urban flood toughness evaluation method based on multi-modal data fusion provided by the invention proceeds as follows: street view images, text data and water change data are collected; a data preprocessing module preprocesses the data of the two modalities and the water change data to obtain corrected street view images, cleaned text data and toughness label data; the street view images are input into the shallow and deep layers of ResNet50 to obtain shallow and deep features, which are fused into image features; the text data are passed through BERT to obtain text features; the image features and text features are fused across modalities; and finally the urban flood toughness evaluation module outputs the classification result for urban toughness.
As a better implementation mode of the invention, the invention provides an urban flood toughness evaluation method based on multi-mode data fusion, which comprises the following steps:
S1. Dividing a target city into a series of irregular areas according to its road network; setting a plurality of sampling points in each irregular area and acquiring, for each sampling point, multi-modal data consisting of image data and text data, wherein the image data is a street view image at the sampling point and the text data is flood-related social network text published by users within a neighborhood centered on the sampling point; calculating a flood toughness value for each irregular area from a surface water change dataset of the target city, and assigning each irregular area a binary label indicating good or poor flood toughness, using the mean of the flood toughness values of all irregular areas as the threshold; the flood toughness label of each sampling point is the binary label of the irregular area in which it is located.
As a preferred implementation of the present invention, the target city is divided into a series of irregular areas according to the road network as follows: the original vector road network of the target city is obtained first; a dilation operation is then applied to eliminate noise and smooth boundaries, followed by an erosion operation to remove unnecessary detail from the road network; finally, each closed irregular area is stored as a vector polygon layer together with the longitude and latitude coordinates of its boundary.
As a preferred implementation of the present invention, the street view image at each sampling point is preprocessed as follows: the street view images are first resized (Resize) to the same size, then center-cropped, and then normalized (Normalize) to obtain the corrected street view image data.
As a preferred implementation of the present invention, the social network text at each sampling point is preprocessed as follows: stop words and numbers are first deleted from the text; regular expressions are then used to remove URLs, English words beginning with @ (user mentions), Chinese characters and non-alphabetic characters; finally, the text is lemmatized to obtain the cleaned Twitter text data.
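The text cleaning steps above can be illustrated with a minimal sketch using only the Python standard library. The stop-word list, the regular expressions and the function name `clean_tweet` are hypothetical choices made for illustration and are not taken from the patent. Stop words are filtered after the regular-expression pass here, which is a simplification of this sketch, and lemmatization is omitted because it needs an external language resource.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "to", "of", "and"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web addresses
MENTION_RE = re.compile(r"@\w+")                # words beginning with @
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]")       # digits, Chinese characters, punctuation


def clean_tweet(text: str) -> str:
    """Remove URLs, @-mentions, non-alphabetic characters and stop words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = [tok for tok in text.lower().split() if tok not in STOP_WORDS]
    return " ".join(tokens)


example = "@SES_NSW Roads flooded in 2022 洪水 https://t.co/abc123 stay safe!"
print(clean_tweet(example))  # roads flooded stay safe
```

In a fuller pipeline a lemmatizer would be applied to the returned tokens as the last step.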
As a better implementation manner of the invention, in each irregular area, sampling point locations need to be arranged along an internal road network; the street view image at each sampling point comprises street views in four directions, namely front, back, left and right, and the radius of the neighborhood space range of each sampling point is preferably 0.5-1.5 km.
As a preferred implementation of the present invention, a specific method of assigning a binarization label representing the quality of the toughness of the flood to each irregular area is as follows:
acquiring a surface water quantity change data set of a target city and denoising the surface water quantity change data set; constructing a Mask (Mask) matrix based on the boundary longitude and latitude coordinate information of each irregular area, and extracting time sequence data of flood occurrence periods in each irregular area from the data set subjected to denoising treatment by using the Mask matrix;
for each irregular area, calculating the toughness value of each flood occurrence period in the irregular area according to the following formula:
[Formula image BDA0003852814400000061 is not reproduced in this text.]
where t_0 is the recorded time at which the flood area in the irregular area first exceeds the area threshold during the current flood occurrence period; Q(t) is the fitted curve of flood area against time in the irregular area during the current flood, with its values normalized to the range [0, 100]; and t_1 is the time, calculated from the fitted curve Q(t), at which the flood area in the irregular area falls back below the area threshold during the current flood occurrence period;
aiming at each irregular area, averaging the toughness values of flood generation periods of all rounds in the irregular area, and taking the obtained first average value as the flood toughness value of the irregular area; averaging flood toughness values of all irregular areas in the target city to obtain a second average value serving as a toughness threshold;
finally, endowing each irregular area with a binary label representing the toughness of the flood, if the toughness value of the flood in one irregular area is higher than the toughness threshold value, endowing a first flood toughness label, and if not, endowing a second flood toughness label; wherein the flood toughness of the irregular area having the first flood toughness label is superior to the irregular area having the second flood toughness label.
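The flood window used in the toughness calculation, from t_0 to t_1, can be sketched as follows. This is a rough illustration under stated assumptions: the flood-area series is min-max scaled to [0, 100], the area threshold is applied to the scaled values, and piecewise-linear interpolation between samples stands in for the fitted curve Q(t), whose form the text does not specify. The toughness formula itself is not reproduced in the source, so it is not implemented here; all function names are hypothetical.

```python
from typing import List, Optional, Tuple


def normalize_0_100(values: List[float]) -> List[float]:
    """Min-max scale a flood-area series to the range [0, 100]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def flood_window(times: List[float], q: List[float],
                 threshold: float) -> Optional[Tuple[float, float]]:
    """Return (t0, t1) for one flood occurrence period, or None if incomplete.

    t0 is the first recorded time at which q exceeds the threshold.
    t1 is where q falls back to the threshold, estimated by linear
    interpolation (an assumed stand-in for the fitted curve Q(t)).
    """
    start = next((i for i, v in enumerate(q) if v > threshold), None)
    if start is None:
        return None
    t0 = times[start]
    for i in range(start + 1, len(q)):
        if q[i] <= threshold:
            # q[i - 1] is above the threshold here, so the denominator is positive.
            frac = (q[i - 1] - threshold) / (q[i - 1] - q[i])
            return t0, times[i - 1] + frac * (times[i] - times[i - 1])
    return None


times = [0, 1, 2, 3, 4, 5]
q = normalize_0_100([0, 10, 40, 30, 5, 0])
print(flood_window(times, q, threshold=50.0))
```

With these sample values the window opens at the third observation and closes between the fourth and fifth.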
It should be noted that, in the training phase, the flood toughness labels are assigned in units of irregular areas, and the labels of all sampling points in each irregular area are directly the labels corresponding to the irregular area. But in the inference stage, the flood toughness labels are generated by taking each sampling point position as a unit.
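The labeling rule described above (per-area mean, city-wide mean as threshold, and sampling points inheriting the label of their area) can be sketched as follows. The area identifiers, the sample toughness values and the 1/0 coding of the first and second labels are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def region_labels(event_values: Dict[str, List[float]]) -> Dict[str, int]:
    """Assign 1 (first label, better toughness) or 0 (second label) per area."""
    # First average: toughness values of all flood occurrence periods in an area.
    region_value = {region: mean(vals) for region, vals in event_values.items()}
    # Second average: mean over all areas, used as the toughness threshold.
    threshold = mean(list(region_value.values()))
    return {region: int(value > threshold) for region, value in region_value.items()}


def point_labels(point_region: Dict[str, str], labels: Dict[str, int]) -> Dict[str, int]:
    """Each sampling point inherits the label of the area that contains it."""
    return {point: labels[region] for point, region in point_region.items()}


# Hypothetical per-event toughness values for three areas.
events = {"A": [80.0, 60.0], "B": [30.0, 50.0], "C": [40.0]}
labels = region_labels(events)
print(labels)  # {'A': 1, 'B': 0, 'C': 0}
print(point_labels({"p1": "A", "p2": "A", "p3": "C"}, labels))
```

Because the threshold is the mean over areas, the split between the two labels depends on the distribution of area values and is not necessarily balanced.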
S2. Training an urban flood toughness evaluation model, composed of a data feature fusion module and an urban flood toughness evaluation module, on the multi-modal data of all sampling points of the target city under the supervision of their flood toughness labels; in the model, the data feature fusion module first extracts features from the image data and the text data separately: shallow and deep features of the street view image are extracted by an image feature extraction network and fused into an image feature, and a text feature is extracted from the text data by an attention-based pre-trained language model; the image feature and the text feature are concatenated into a fusion feature, which is input into the urban flood toughness evaluation module to evaluate the flood toughness of each sampling point and output its predicted flood toughness label.
As a preferred implementation of the present invention, the image feature extraction network in the data feature fusion module is a ResNet50 network model; the street view image is input into the ResNet50 network model, shallow features are taken from the first residual block of ResNet50 and deep features from the fourth residual block, and the shallow and deep features are fused to obtain the final image feature of the street view image.
As a preferred implementation of the present invention, the attention-based pre-trained language model in the data feature fusion module is a BERT (Bidirectional Encoder Representations from Transformers) model; each social network text is input into the BERT model, which outputs a text feature. The pre-trained BERT model takes the cleaned text data from the data preprocessing module as input and first performs tokenization, padding, truncation and conversion. The BERT tokenizer (Tokenizer) maps each word to its index. Padding and truncation depend on sentence length and serve to make all input sentences the same length: if a sentence is longer than the maximum sentence length (max_length), the excess is cut off; if it is shorter, it is padded. Conversion turns each sentence into a sequence of word indices, appends the encoded text to a list, and converts the list into a vector. The vectors are then passed into the pre-trained BERT model for fine-tuning. During training, BERT is followed by a linear classification layer for supervised training; after training, the BERT parameters are saved, a fresh BERT model is loaded with those parameters, the linear classification layer is removed, and only the word vectors produced by the BERT encoder layers are used as the text feature.
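The padding and truncation step can be illustrated independently of any particular tokenizer library. The sketch below assumes the token indices have already been produced by a tokenizer and that index 0 is the padding token; the function name and the attention-mask output are illustrative assumptions rather than details given in the patent.

```python
from typing import List, Sequence, Tuple


def pad_or_truncate(token_ids: Sequence[int], max_length: int,
                    pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Make a token-index sequence exactly max_length long.

    Returns the padded or truncated indices and a mask with 1 for real
    tokens and 0 for padding positions.
    """
    ids = list(token_ids[:max_length])          # cut off the excess, if any
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)               # 0 when the sequence was truncated
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


# Hypothetical token indices for a short and a long sentence.
print(pad_or_truncate([101, 7, 8, 102], max_length=6))
print(pad_or_truncate([1, 2, 3, 4, 5], max_length=3))
```

Tokenizer libraries usually perform this step internally; the sketch only makes the equal-length requirement explicit.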
As a preferred implementation of the present invention, when the image features and the text features are fused in the data feature fusion module, all text features corresponding to the same sampling point are first fused by weighting to give a final text feature. The weight of each text feature is negatively correlated with the distance between the corresponding social network user and the sampling point: the closer the user who published a text is to the sampling point, the larger the weight of the corresponding text feature, and the farther away the user is, the smaller the weight. The image feature and the final text feature are then fused by concatenation (Concat) to obtain the final fusion feature combining text and image information.
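A minimal sketch of the distance-weighted text fusion followed by concatenation is given below. The text only requires that the weights decrease with distance; the inverse-distance form 1 / (1 + d), the normalization of the weights to sum to one, and the function names are assumptions of this sketch.

```python
from typing import List, Sequence


def fuse_text_features(features: Sequence[Sequence[float]],
                       distances_km: Sequence[float]) -> List[float]:
    """Weighted average of text features; closer users get larger weights."""
    raw = [1.0 / (1.0 + d) for d in distances_km]   # assumed weighting scheme
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


def concat(image_feature: Sequence[float], text_feature: Sequence[float]) -> List[float]:
    """Concat fusion: the image feature followed by the final text feature."""
    return list(image_feature) + list(text_feature)


# Two hypothetical 2-dimensional text features, posted 0 km and 1 km away.
text = fuse_text_features([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])
fused = concat([0.5], text)
print(fused)
```

Any other monotonically decreasing function of distance, such as a Gaussian kernel, would satisfy the stated requirement equally well.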
As a preferred implementation of the present invention, the urban flood toughness evaluation module is implemented as a linear classifier; the fusion feature is input into the linear classifier for binary classification to obtain the predicted flood toughness label.
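The binary classification step can be sketched as a single linear layer followed by a sigmoid. The sigmoid, the 0.5 decision boundary, the weight values and the function name are assumptions for illustration; the text states only that a linear classifier is used.

```python
import math
from typing import Sequence, Tuple


def predict_label(fused: Sequence[float], weights: Sequence[float],
                  bias: float) -> Tuple[int, float]:
    """Linear score followed by a sigmoid; returns (label, probability)."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    prob = 1.0 / (1.0 + math.exp(-logit))
    return int(prob >= 0.5), prob


# Hypothetical fused feature and already-trained parameters.
label, prob = predict_label([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(label, round(prob, 3))
```

In training, the weights and bias would be learned jointly with the feature extractors under the supervision of the flood toughness labels.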
And S3, acquiring multi-mode data which is composed of image data and text data and corresponds to any point location to be evaluated, inputting the multi-mode data into the trained urban flood toughness evaluation model, and outputting a flood toughness label of the point location to be evaluated.
Examples
In the embodiment, a method for evaluating urban flood toughness based on multi-modal data fusion is provided, which specifically comprises the following steps: inputting the street view image and the twitter text into a data preprocessing module, and acquiring preprocessed image data and preprocessed text data; inputting the image data and the text data into a data feature fusion module, extracting features of the two modal data through an image feature extraction model and a text feature extraction model respectively, and fusing the features of the two modal data by using a connection (hereinafter, referred to as Concat) method to obtain a fusion feature; and inputting the fusion characteristics into an urban flood toughness evaluation module, and outputting a classification result of urban flood toughness.
The data preprocessing module in this embodiment comprises a street view image preprocessing submodule, a Twitter text preprocessing submodule, and a label data preprocessing submodule. The street view image preprocessing submodule takes as input the street view images of different areas downloaded through an Application Programming Interface (API) and preprocesses them. The Twitter text preprocessing submodule takes as input the crawled Twitter texts that contain flood keywords and fall within a fixed longitude and latitude range, and preprocesses them.
Overall, this is a supervised binary classification task, so label data that determines toughness is required. The label data preprocessing submodule takes surface water volume data of the irregular regions as input. The data set used in this embodiment is the Geoscience Australia Landsat Water Observations data set (https://explorer.sandbox.dea.ga.gov.au/products/ga_ls_wo_3), and regions are delimited according to the geographic position, that is, the longitude and latitude, of the data. The core of this task is to divide the regions appropriately and to match the images, texts, and labels exactly. In this embodiment the regions are divided according to the road network, which splits an area into a number of irregular regions. The open-source tool OSMnx is used to request street network patterns and configurations for regions anywhere in the world from the map database, turning complex cities into computable data and allowing urban form to be analyzed both quantitatively and qualitatively.
The data set used in this embodiment is described in detail below. The target city is Sydney, and the data set comprises street view images downloaded through the Google Street View API, Twitter texts crawled according to flood keywords, and a water volume change data set. It is essential that the data of both modalities and the labels be precisely aligned by region.
The regions must first be defined and divided. Regions may be divided by administrative district or zip code, by equidistant straight lines into rectangles, or by the road network; this embodiment uses road network division. Because OSM data is open source, the initial road network data for the Sydney area can be downloaded directly from the website. The general procedure for road network division is: obtain the vector road network from OSM; simplify it a first time according to road level; convert the vector map into a raster map; simplify it a second time through dilation and erosion; and vectorize the twice-simplified raster map to obtain the boundary coordinates of each region. The first simplification is carried out in ArcGIS: low-level roads are removed and high-level roads are retained. In this embodiment, roads of the top three levels serve as main roads for region division. Each resulting region contains several lower-level roads, and street view sampling points are taken on the roads of the lower four levels within each region. The second simplification is needed because the road information stored in the vector map is segmented, and the segmentation remains after centerline extraction. Finding the boundary of each closed region directly would require assigning every small road segment to its region and merging the segments into a polygon, which is complicated. Instead, the road network map is converted into a raster map, dilation and erosion are applied to it, and the centerline is then extracted and converted back into a vector map, giving the boundary longitude and latitude coordinates of each closed region.
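The dilation-then-erosion step can be illustrated with a small pure-Python sketch. It assumes a binary raster in which 1 marks a road pixel and a 3x3 neighborhood that is clipped at the raster border; the function names and the toy grid are illustrative only, since the source does not state the kernel size or the tool used for this step.

```python
from typing import List

Grid = List[List[int]]


def _morph(grid: Grid, use_any: bool) -> Grid:
    """Apply a 3x3 neighborhood test to every pixel (window clipped at borders)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            # Dilation: any road pixel nearby. Erosion: all nearby pixels are road.
            out[r][c] = int(any(window)) if use_any else int(all(window))
    return out


def dilate(grid: Grid) -> Grid:
    return _morph(grid, use_any=True)


def erode(grid: Grid) -> Grid:
    return _morph(grid, use_any=False)


def close_gaps(grid: Grid) -> Grid:
    """Dilation followed by erosion, which bridges small breaks in road lines."""
    return erode(dilate(grid))


# Toy raster: one horizontal road with a one-pixel break in the middle.
road = [[0] * 7 for _ in range(5)]
road[2] = [1, 1, 1, 0, 1, 1, 1]
closed = close_gaps(road)
```

In this toy case the break in the middle row is filled while the rows above and below stay empty, which is the property that lets closed region boundaries be traced afterwards.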
The raster map is converted into a vector map, in which each closed region is represented as a Polygon, that is, a series of longitude and latitude coordinate points.
Street view images must also be acquired. The OSMnx library is used to obtain the vector road network of the region from OSM, the roads of the lower four levels are traversed, and one or two sampling points are taken on each road. Because the road network may contain many short road sections (such as branches, side roads, and loop roads), road length thresholds of 50 m and 100 m are set: road sections shorter than 50 m are not sampled; for sections longer than 50 m and shorter than 100 m, the midpoint of the section is the sampling point; and for sections longer than 100 m, the two trisection points of the section are the sampling points. At each sampling point, images are taken in four directions, namely front, back, left, and right, where front is the direction of travel along the road.
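The length-threshold rule can be sketched as follows. This is a simplified illustration that assumes each road section is a straight segment given by two endpoints in a planar coordinate system and a known length in metres; real road geometries are polylines. The function names are hypothetical, and treating lengths of exactly 50 m or 100 m as belonging to the longer class is an assumption, because the source does not specify the boundary cases.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sampling_points(start: Point, end: Point, length_m: float,
                    short_m: float = 50.0, long_m: float = 100.0) -> List[Point]:
    """Return the street view sampling points for one straight road section."""
    def at(fraction: float) -> Point:
        # Linear interpolation between the two endpoints.
        return (start[0] + fraction * (end[0] - start[0]),
                start[1] + fraction * (end[1] - start[1]))

    if length_m < short_m:
        return []                       # too short: not sampled
    if length_m < long_m:
        return [at(0.5)]                # midpoint only
    return [at(1 / 3), at(2 / 3)]       # the two trisection points


def view_headings(road_bearing_deg: float) -> List[float]:
    """Camera headings in the order front, back, left, right (degrees)."""
    return [(road_bearing_deg + offset) % 360 for offset in (0, 180, 270, 90)]
```

For a 120 m section this returns two points, each of which would then be photographed at the four headings derived from the road bearing.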
For the text data, this embodiment crawls tweets posted within a given longitude and latitude range and filters them by English keywords related to flooding. The keywords include "rain", "flood", "storm", "infrastructure", "drainage", and the like.
The water volume change data comes from the Geoscience Australia Landsat Water Observations data set, which is provided in GeoTIFF format. GeoTIFF is used in geographic information systems, photogrammetry, remote sensing, and related fields, and carries georeferencing information such as the coordinate system of the image, the scale, the coordinates of points on the image, longitude and latitude, length units, and angle units. The data corresponding to the images of the data set is stored in yaml files. The data is hosted on AWS, so an AWS account must be registered and an access key obtained, and the data can only be retrieved once the terminal used for Python is configured correctly. The AWS CLI is configured in that terminal: the aws configure command opens the configuration prompts, where the AWS access key ID and the secret access key are entered, the default region name is set to ap-southeast-2, and the default output format is set to json. Once the AWS connection works, the data can be imported into the Datacube. Finally, the water volume change of an area can be queried from the Datacube by time range and longitude and latitude range; after preliminary processing, it is used to calculate the toughness labels.
The data preprocessing module is described in detail below. It includes preprocessing of the street view images, preprocessing of the Twitter texts, and preprocessing of the toughness labels.
The collected street view images undergo normalization preprocessing: all images are resized to the same size, the mean and variance of the image data set are computed, and normalization is performed with the transforms module of Torchvision.
For the Twitter text, data cleaning is required before the text is passed to the model. Stop words and numbers are deleted with the nltk toolkit; URLs beginning with http, English words beginning with @, and garbled Chinese characters are matched with regular expressions and deleted; and different inflected forms of the same word are reduced to its base form by lemmatization.
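A minimal sketch of the regular-expression part of this cleaning is shown below. It uses only the standard library: the tiny stop-word set stands in for the nltk stop-word list, the patterns are illustrative choices rather than the ones used in the embodiment, and the lemmatization step is deliberately left out.

```python
import re

# Small illustrative stop-word set; the embodiment uses the nltk stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "in", "on", "at", "of", "and", "to"}

URL_RE = re.compile(r"http\S+")              # links beginning with http
MENTION_RE = re.compile(r"@\w+")             # words beginning with @
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")   # digits, punctuation, non-Latin characters


def clean_tweet(text: str) -> str:
    """Remove URLs, mentions, non-letter characters, and stop words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok.lower() not in STOP_WORDS]
    return " ".join(tokens)


example = clean_tweet("@bob Flooding in the CBD 2022 http://t.co/x 洪水 drainage blocked!")
```

The order matters: URLs and mentions are removed before the generic non-letter filter, otherwise fragments such as "http" or "bob" would survive as ordinary words.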
For the calculation of the toughness label, images of the extracted water at different times are obtained first. As described above, the Sydney region is divided according to the road network, and each resulting irregular region is treated as a community. All irregular regions are exported from ArcGIS as a shapefile for later use, and the longitude and latitude of each community's bounding box can be read directly from the shapefile. Water volume change information is then acquired for each irregular region according to its longitude and latitude information, mainly through the Datacube interface of the water observation (WO) product: given the bounding box coordinates and a time range, the water information inside the bounding box is returned.
Label data preprocessing builds a mask matrix from the bounding box of each irregular region and the positions of the points on the road network. The points on the road network are connected into lines to enclose a closed region; points inside the region and on its boundary are marked 1, and points outside the boundary are marked 0. The water distribution data matrix is then multiplied element-wise by the mask matrix so that only the water information inside the irregular region is kept. In the resulting matrix, water is marked 1 and land is marked 0. Next, for every coordinate whose value in the mask matrix is 0, the value at that position in the resulting matrix is set to 0.5. Finally, an image is generated from the matrix, and the name of each image includes the time and geographic information.
For this embodiment, spatial coordinate registration is extremely important. The default coordinate system is EPSG:4326, but the water observation data is loaded in the EPSG:3577 coordinate system. To make generating the mask matrix easier, the coordinate system is converted to EPSG:3577 directly in ArcGIS, which also simplifies subsequent operations. Because the goal is to obtain the water volume change of an irregular region, the mask method first finds the bounding rectangle of the irregular region and then labels the rectangle pixel by pixel: pixels inside the irregular region are marked 1 and pixels outside it are marked 0, so that the water information of the irregular region can be extracted. Finally, the values at the coordinates (x, y) outside the boundary of the irregular region in the water matrix are set to 0.5, and an image is generated from the matrix, where (x, y) denotes a position whose value in the mask matrix is 0.
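The masking step can be sketched on plain nested lists. The sketch assumes the water matrix (1 for water, 0 for land) and the mask matrix (1 inside the irregular region, 0 outside) are already aligned on the same pixel grid of the bounding rectangle; building the mask from the region polygon is not shown, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def apply_region_mask(water: Matrix, mask: Matrix, outside_value: float = 0.5) -> Matrix:
    """Keep water/land values inside the region and mark everything outside as 0.5."""
    result = []
    for water_row, mask_row in zip(water, mask):
        row = []
        for w, m in zip(water_row, mask_row):
            # Inside the region: product w * m (1 = water, 0 = land).
            # Outside the region (mask value 0): fixed marker value.
            row.append(outside_value if m == 0 else float(w * m))
        result.append(row)
    return result


def water_pixel_count(masked: Matrix) -> int:
    """Number of water pixels inside the region, a proxy for flooded area."""
    return sum(1 for row in masked for value in row if value == 1.0)


water = [[1, 0, 1],
         [1, 1, 0]]
mask = [[1, 1, 0],
        [0, 1, 1]]
masked = apply_region_mask(water, mask)
```

The 0.5 marker keeps pixels outside the region distinguishable from land pixels inside it, so that counting water pixels per time step is not distorted by the rectangular bounding box.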
Toughness is defined as the ability of a city to resist disasters through its own capacity, reduce disaster losses, and recover quickly from disasters; here it is characterized by the change of water volume over time from the beginning to the end of a flood. For each irregular region, a toughness value is calculated for each flood occurrence period in that region, with the calculation formula defined as:
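[Equation image BDA0003852814400000121: integral-form formula for the toughness value in terms of Q(t), t_0, and t_1; not reproduced in this text]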
In the formula: t_0 is the recording time at which the flood area in the irregular region first exceeds the area threshold during the current flood occurrence period (the threshold can be tuned to actual conditions); Q(t) is the fitted curve of flood area versus time in the irregular region during the current flood occurrence period, with its range normalized to [0, 100]; t_1 is the time at which the flood area in the irregular region falls back below the area threshold during the current flood occurrence period, calculated from the fitted curve Q(t).
Time series data for the flood occurrence periods in each irregular region can be extracted from the data set using the mask matrix. After preliminary processing, the data yields the number of water pixels (that is, the area covered by water) in each region at each time point. Because toughness is defined by how quickly a region recovers, only data from continuous time spans in which flooding occurs, that is, flood occurrence periods, needs to be extracted, and noise data must be removed.
Note that, under the definition of the toughness value above, Q(t) must be fitted in advance, and t_0 and t_1 must be determined for each irregular region and each flood occurrence period. t_0 can be read directly from the data set. However, because the acquisition period of the source satellite is too long, the specific time t_1 cannot be read directly and must be calculated: the fitted curve Q(t) is computed first, and the flood end time t_1 is then derived from it.
In this example, Q(t) is fitted as a fifth-degree polynomial. In addition, the data on the curve must be normalized as required by the definition of the toughness value. Min-max normalization is used, so the minimum water volume y_min and the maximum water volume y_max must be fixed, and the range of Q(t) is normalized to [0, 100].
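The normalization and the derivation of t_1 can be sketched with the standard library alone. The sketch assumes the polynomial coefficients are already available (for example from a least-squares fit such as numpy.polyfit with degree 5), that Q is above the threshold at the start of the search interval and at or below it at the end, and that bisection is an acceptable way to locate the crossing; the source does not say which root-finding method is used, and all function names are illustrative.

```python
from typing import Callable, List


def min_max_normalize(values: List[float], y_min: float, y_max: float) -> List[float]:
    """Scale values into [0, 100] using the fixed bounds y_min and y_max."""
    span = y_max - y_min
    return [100.0 * (v - y_min) / span for v in values]


def poly_eval(coeffs: List[float], t: float) -> float:
    """Evaluate a polynomial with highest-degree-first coefficients (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * t + c
    return result


def find_t1(q: Callable[[float], float], threshold: float,
            t_lo: float, t_hi: float, tol: float = 1e-6) -> float:
    """Bisection for the time in [t_lo, t_hi] at which q falls back to the threshold."""
    if not (q(t_lo) > threshold >= q(t_hi)):
        raise ValueError("threshold crossing is not bracketed by [t_lo, t_hi]")
    while t_hi - t_lo > tol:
        mid = (t_lo + t_hi) / 2
        if q(mid) > threshold:
            t_lo = mid      # still flooded at mid: crossing lies later
        else:
            t_hi = mid      # already below threshold: crossing lies earlier
    return (t_lo + t_hi) / 2


# Toy curve: the normalized flood area falls linearly from 100 to 0 over 10 days.
t1 = find_t1(lambda t: poly_eval([-10.0, 100.0], t), threshold=20.0, t_lo=0.0, t_hi=10.0)
```

With the toy linear curve the crossing of the threshold 20 lies at day 8; a fitted fifth-degree polynomial would be passed to find_t1 in the same way, with the search interval chosen after the flood peak.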
The toughness value of each flood occurrence period in each irregular region is calculated with the integral-form formula above. The toughness values of all flood occurrence periods in an irregular region are then averaged, and the resulting first average is taken as the flood toughness value of that region. The flood toughness values of all irregular regions in the target city are averaged to obtain a second average, which serves as the toughness threshold. Finally, each irregular region is given a binary label representing its flood toughness: if the flood toughness value of a region is higher than the toughness threshold, it receives the first flood toughness label, and otherwise the second flood toughness label, where a region with the first label has better flood toughness than a region with the second label. In this embodiment, irregular regions whose toughness is greater than the average are recorded as 1, indicating better flood toughness, and irregular regions whose toughness is less than the average are recorded as 0, indicating worse flood toughness. This gives the final output of the label data preprocessing submodule.
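The two-stage averaging and thresholding can be written in a few lines. The sketch assumes the per-period toughness values have already been computed; the dictionary layout and the function name are illustrative, and a region whose value equals the threshold receives label 0, following the "higher than the threshold, otherwise" wording.

```python
from statistics import mean
from typing import Dict, List


def flood_toughness_labels(period_values: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each region to 1 (better flood toughness) or 0 (worse flood toughness)."""
    # First average: one flood toughness value per region over all of its flood periods.
    region_value = {region: mean(values) for region, values in period_values.items()}
    # Second average: the city-wide toughness threshold.
    threshold = mean(list(region_value.values()))
    return {region: int(value > threshold) for region, value in region_value.items()}


labels = flood_toughness_labels({"A": [80, 60], "B": [20, 40], "C": [50]})
```

Here the region averages are 70, 30, and 50, the threshold is 50, and only region A is labelled 1. Every sampling point then inherits the label of the region that contains it.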
The data produced by the label data preprocessing submodule can be used for supervised training of the urban flood toughness evaluation model. The model consists of a data feature fusion module and an urban flood toughness evaluation module. Fig. 1 shows the overall structure of the urban flood toughness evaluation model. The whole model is built with the PyTorch framework.
The specific structure of the data feature fusion module of this embodiment is described in detail below, and the data feature fusion module includes two parts, namely, multi-modal feature extraction and feature fusion.
Multi-modal feature extraction comprises feature extraction from the street view images and feature extraction from the Twitter texts. For the street view images, each image is normalized and then input into a ResNet50 network. As the network deepens, some of the information in the low-level feature maps is lost in the high-level feature maps. So that features at different scales carry rich semantic information, shallow features and deep features of the street view image are obtained from a shallow part (the first residual block of ResNet50) and a deep part (the fourth residual block of ResNet50) of the network, respectively, and the two are fused to form the image feature input to the multi-modal model. For the Twitter texts, a pre-trained BERT model is used. Before a text is input into BERT, it is preprocessed by tokenization, padding, truncation, and conversion. The BERT Tokenizer splits the text into tokens and yields the index of each token, and the set of token indices of a sentence is passed to BERT. To make all sentences the same length, a sentence longer than max_length is truncated and a sentence shorter than max_length is padded. The encoded texts are added to a list and converted into vectors. To adapt the internal parameters of the pre-trained BERT model, BERT is connected to a linear classification layer and fine-tuned (Fine-tune) during training. After training, the parameters are retained, the linear classification layer is removed, and only the word vectors produced by the BERT encoder layers are used as the text features.
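The padding and truncation rule can be illustrated independently of any tokenizer library. The sketch below operates on a list of token indices; the padding index 0 and the accompanying attention mask are assumptions that follow common BERT practice and are not stated in the source, and in practice the tokenizer would normally perform this step itself.

```python
from typing import List, Tuple


def pad_or_truncate(token_ids: List[int], max_length: int,
                    pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both of length max_length."""
    ids = token_ids[:max_length]             # cut off the surplus part
    attention_mask = [1] * len(ids)          # 1 marks real tokens
    padding = max_length - len(ids)
    # Fill short sequences with the padding index; padded positions get mask 0.
    return ids + [pad_id] * padding, attention_mask + [0] * padding


short_ids, short_mask = pad_or_truncate([5, 6, 7], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6], max_length=4)
```

Fixing the length in this way lets all encoded tweets be stacked into one batch, while the mask tells the encoder which positions to ignore.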
Feature fusion in the data feature fusion module combines the acquired image features and text features into new multi-modal fusion features. In this embodiment, suitable text information must be matched to each street view image according to its longitude and latitude before fusion. Both the Twitter text data and the street view image data contain longitude and latitude information, and, following the idea of Inverse Distance Weighting (IDW), nearer data has greater influence. Each street view image is therefore matched with the text data posted by Twitter users within a radius of 1 km of the image. Because several users may have produced text features for the same image, all text features corresponding to the same sampling point are fused by weighting to produce a final text feature. For the weighted fusion, the text features are sorted in increasing order of distance from the image, and their weights are calculated following the IDW idea, so that the weight of each text feature is negatively correlated with the distance between the corresponding social network user and the sampling point. The image feature and the final text feature are then fused by concatenation (Concat) to obtain the final multi-modal fusion feature, which combines the text information and the image information.
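One possible reading of the IDW-based fusion is sketched below on plain Python lists. The source only states that the weights are negatively correlated with distance, so the specific choice of weights proportional to 1/distance, the small eps that guards against a zero distance, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence


def idw_weights(distances_m: Sequence[float], power: float = 1.0,
                eps: float = 1e-6) -> List[float]:
    """Normalized inverse-distance weights: nearer texts get larger weights."""
    raw = [1.0 / (d + eps) ** power for d in distances_m]
    total = sum(raw)
    return [r / total for r in raw]


def fuse_text_features(features: Sequence[Sequence[float]],
                       distances_m: Sequence[float]) -> List[float]:
    """Weighted sum of the text feature vectors matched to one sampling point."""
    weights = idw_weights(distances_m)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def concat_features(image_feature: Sequence[float],
                    text_feature: Sequence[float]) -> List[float]:
    """Concat fusion: image feature followed by the final text feature."""
    return list(image_feature) + list(text_feature)


weights = idw_weights([100.0, 300.0])
final_text = fuse_text_features([[1.0, 0.0], [0.0, 1.0]], [100.0, 300.0])
fused = concat_features([9.0], final_text)
```

A tweet posted 100 m from the sampling point receives roughly three times the weight of one posted 300 m away, and the fused vector has the combined length of the image and text features.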
Finally, the multi-modal fusion feature vectors are input into the urban flood toughness evaluation module, which evaluates the flood toughness of each sampling point and outputs its predicted flood toughness label. In this embodiment the urban flood toughness evaluation module is implemented as a linear classifier. For training and testing, the data is divided into a training set and a test set at a ratio of 8:2, and the learning rate, the number of training epochs, the optimizer, and some model parameters are tuned to reach the highest accuracy possible. The linear classifier outputs the probabilities of classes 0 and 1 in the binary classification, and the class with the higher probability is taken as the final urban toughness prediction. The training process of the urban flood toughness evaluation model is shown in Fig. 2.
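The 8:2 split and the decision rule of the linear classifier can be sketched without a deep learning framework. The weights below are made-up numbers rather than trained parameters, the shuffling seed and function names are illustrative, and the use of softmax to turn the two linear outputs into probabilities is an assumption about how the probabilities are produced.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_8_2(samples: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split the samples into 80% training and 20% test data."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * 0.8)
    return items[:cut], items[cut:]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(fused: Sequence[float], weights: Sequence[Sequence[float]],
                  bias: Sequence[float]) -> Tuple[int, List[float]]:
    """Linear layer with one row of weights per class, then pick the likelier class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs


train, test = split_8_2(range(10))
label, probs = predict_label([1.0, 2.0], [[0.0, 0.0], [1.0, 1.0]], [0.0, 0.0])
```

In this toy call the score of class 1 is higher, so the sampling point would be predicted to have the better flood toughness label.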
After the urban flood toughness evaluation model is trained, it can be used for actual urban flood toughness evaluation. For any point location to be evaluated in the target city, the corresponding multi-modal data composed of image data and text data is acquired and input into the trained urban flood toughness evaluation model, which outputs the flood toughness label of that point location.
The above-described embodiments are merely preferred embodiments of the present invention, and are not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical solutions obtained by means of equivalent substitution or equivalent transformation all fall within the protection scope of the present invention.

Claims (10)

1. An urban flood toughness evaluation method based on multi-modal data fusion, characterized by comprising the following steps:
S1. Dividing a target city into regions according to its road network to form a series of irregular regions; setting a plurality of sampling points in each irregular region, and acquiring, for each sampling point, corresponding multi-modal data consisting of image data and text data, wherein the image data is a street view image of the sampling point, and the text data is flood-related social network text published on a social network by users within a neighborhood centered on the sampling point; calculating a flood toughness value for each irregular region based on a surface water volume change data set of the target city, and assigning each irregular region a binary label representing better or worse flood toughness, using the average of the flood toughness values of all irregular regions as the threshold; wherein the flood toughness label of each sampling point is the binary label of the irregular region in which the sampling point is located;
S2. Under the supervision of the flood toughness labels of all sampling points, training an urban flood toughness evaluation model, which consists of a data feature fusion module and an urban flood toughness evaluation module, using the multi-modal data of all sampling points of the target city; wherein, in the urban flood toughness evaluation model, the data feature fusion module first extracts features from the image data and the text data of the multi-modal data separately: shallow features and deep features of the street view image are extracted in an image feature extraction network and fused to obtain the image features, and text features of the text data are extracted by a pre-trained language model based on an attention mechanism; the image features and the text features are concatenated to form the fusion features; and the fusion features are input into the urban flood toughness evaluation module, which evaluates the flood toughness of each sampling point and outputs the predicted flood toughness label of each sampling point;
S3. Acquiring, for any point location to be evaluated, the corresponding multi-modal data composed of image data and text data, inputting the multi-modal data into the trained urban flood toughness evaluation model, and outputting the flood toughness label of the point location to be evaluated.
2. The urban flood toughness evaluation method based on multi-modal data fusion as claimed in claim 1, wherein the street view image at each sampling point is preprocessed as follows: the street view image is first resized (Resize) to a uniform size, then center-cropped, and then normalized (Normalize), finally yielding the corrected street view image data.
3. The urban flood toughness evaluation method based on multi-modal data fusion as claimed in claim 1, wherein the social network text at each sampling point is preprocessed as follows: first, stop words and numbers in the text are deleted; then URLs, English words beginning with @, Chinese characters, and non-alphabetic characters are removed based on regular expressions; and finally, lemmatization is performed on the text data to obtain the corrected Twitter text data.
4. The urban flood toughness evaluation method based on multi-modal data fusion as claimed in claim 1, wherein the target city is divided into regions according to the road network to form a series of irregular regions as follows: first, an initial vector road network of the target city is obtained; dilation is then performed to eliminate noise and smooth boundaries; erosion is then performed to remove unnecessary details in the road network; and finally, the longitude and latitude coordinate information of each closed irregular region and the boundary longitude and latitude coordinate information of the irregular region are stored in the form of a vector polygon map.
5. The urban flood toughness evaluation method based on multi-modal data fusion as claimed in claim 1, wherein the specific method of assigning a binary label representing the superiority and inferiority of flood toughness to each irregular area is as follows:
acquiring a surface water quantity change data set of a target city and denoising the surface water quantity change data set; constructing a Mask (Mask) matrix based on the boundary longitude and latitude coordinate information of each irregular area, and extracting time sequence data of flood generation periods in each irregular area from the denoised data set by using the Mask matrix;
aiming at each irregular area, calculating the toughness value of each flood occurrence period in the irregular area, wherein the calculation formula is as follows:
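[Equation image FDA0003852814390000021: integral-form formula for the toughness value in terms of Q(t), t_0, and t_1; not reproduced in this text]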
in the formula: t_0 is the recording time at which the flood area in the irregular area first exceeds the area threshold during the current flood occurrence period; Q(t) is the fitted curve of flood area versus time in the irregular area during the current flood occurrence period, with its range normalized to [0, 100]; t_1 is the time at which the flood area in the irregular area falls back below the area threshold during the current flood occurrence period, calculated from the fitted curve Q(t);
aiming at each irregular area, averaging the toughness values of all rounds of flood generation periods in the irregular area, and taking the obtained first average value as the flood toughness value of the irregular area; averaging flood toughness values of all irregular areas in the target city to obtain a second average value serving as a toughness threshold;
finally, endowing each irregular area with a binary label representing the toughness of the flood, if the toughness value of the flood in one irregular area is higher than the toughness threshold value, endowing a first flood toughness label, and if not, endowing a second flood toughness label; wherein the flood toughness of the irregular area having the first flood toughness label is superior to the irregular area having the second flood toughness label.
6. The method as claimed in claim 1, wherein the image feature extraction network in the data feature fusion module adopts a ResNet50 network model, street view images are input into the ResNet50 network model, shallow features are obtained from a first residual block of the ResNet50, deep features are obtained from a fourth residual block of the ResNet50, and the shallow features and the deep features are fused to obtain final image features of the street view images.
7. The method as claimed in claim 1, wherein the pre-trained language model based on an attention mechanism in the data feature fusion module is a BERT (Bidirectional Encoder Representations from Transformers) model, and each social network text is input into the BERT model, which outputs one text feature.
8. The method for evaluating the toughness of the urban flood based on the multi-modal data fusion of claim 1, wherein when the image features and the text features are fused in the data feature fusion module, all the text features corresponding to the same sampling point are weighted and fused, a final text feature is output, and the weight of each text feature is negatively correlated with the distance between the corresponding social network user and the sampling point when the weighted fusion is performed; and then fusing the image features and the final text features in a splicing (Concat) mode to obtain the final fusion features of the fusion text information and the image information.
9. The method of claim 1, wherein the urban flood toughness evaluation module is implemented as a linear classifier, and the fusion features are input into the linear classifier for binary classification to obtain the predicted flood toughness label.
10. The urban flood toughness evaluation method based on multi-modal data fusion as claimed in claim 1, wherein in each irregular area, sampling points are required to be arranged along an internal road network; the street view image at each sampling point comprises street views in four directions, namely front, back, left and right, and the radius of the neighborhood space range of each sampling point is 0.5-1.5 km.
CN202211139339.0A 2022-09-19 2022-09-19 Urban flood toughness evaluation method based on multi-mode data fusion Pending CN115511280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211139339.0A CN115511280A (en) 2022-09-19 2022-09-19 Urban flood toughness evaluation method based on multi-mode data fusion

Publications (1)

Publication Number Publication Date
CN115511280A true CN115511280A (en) 2022-12-23

Family

ID=84504481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211139339.0A Pending CN115511280A (en) 2022-09-19 2022-09-19 Urban flood toughness evaluation method based on multi-mode data fusion

Country Status (1)

Country Link
CN (1) CN115511280A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116307260A (en) * 2023-05-11 2023-06-23 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) Urban road network toughness optimization method and system for disturbance of defective road sections
CN116307260B (en) * 2023-05-11 2023-08-08 深圳市规划和自然资源数据管理中心(深圳市空间地理信息中心) Urban road network toughness optimization method and system for disturbance of defective road sections

Similar Documents

Publication Publication Date Title
Hao et al. Leveraging multimodal social media data for rapid disaster damage assessment
CN112001385B (en) Target cross-domain detection and understanding method, system, equipment and storage medium
CN113780296B (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN109448361B (en) Resident traffic travel flow prediction system and prediction method thereof
Gupta et al. Deep learning-based aerial image segmentation with open data for disaster impact assessment
Jafari et al. Real-time water level monitoring using live cameras and computer vision techniques
CN113256649B (en) Remote sensing image station selection and line selection semantic segmentation method based on deep learning
CN112149547A (en) Remote sensing image water body identification based on image pyramid guidance and pixel pair matching
CN104142995A (en) Social event recognition method based on visual attributes
CN112668375B (en) Tourist distribution analysis system and method in scenic spot
KR20220125719A (en) Method and equipment for training target detection model, method and equipment for detection of target object, electronic equipment, storage medium and computer program
Gazzea et al. Automated satellite-based assessment of hurricane impacts on roadways
Zhang et al. Social media meets big urban data: A case study of urban waterlogging analysis
Lin et al. Rapid urban flood risk mapping for data-scarce environments using social sensing and region-stable deep neural network
CN116662468A (en) Urban functional area identification method and system based on geographic object space mode characteristics
CN115511280A (en) Urban flood toughness evaluation method based on multi-mode data fusion
Liu et al. Cloud detection using super pixel classification and semantic segmentation
Kaur et al. A review on natural disaster detection in social media and satellite imagery using machine learning and deep learning
Mohan et al. A brief review of recent developments in the integration of deep learning with GIS
Fu et al. Extracting historical flood locations from news media data by the named entity recognition (NER) model to assess urban flood susceptibility
CN117556197A (en) Typhoon vortex initialization method based on artificial intelligence
CN116384844B (en) Decision method and device based on geographic information cloud platform
Gupta et al. CNN-based semantic change detection in satellite imagery
Liu et al. Landslide susceptibility mapping with the fusion of multi-feature SVM model based FCM sampling strategy: A case study from Shaanxi Province
Hao et al. Hurricane Damage Assessment with Multi-, Crowd-Sourced Image Data: A Case Study of Hurricane Irma in the City of Miami.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination