CN117292230A - Building earthquake damage intelligent assessment method based on multi-mode large model - Google Patents

Info

Publication number
CN117292230A
Authority
CN
China
Prior art keywords
damage
building
mode
earthquake
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311278623.0A
Other languages
Chinese (zh)
Inventor
王健泽
江永清
戴靠山
徐军
丁焕龙
郁文海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Xunhui Technology Co ltd
Sichuan University
Original Assignee
Chengdu Xunhui Technology Co ltd
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Xunhui Technology Co ltd, Sichuan University
Priority to CN202311278623.0A
Publication of CN117292230A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Remote Sensing (AREA)

Abstract

The invention discloses an intelligent assessment method for building earthquake damage based on a multi-mode large model, belonging to the technical field of earthquake engineering. The method comprises: acquiring damage pictures to construct a multi-mode earthquake damage assessment dataset; constructing a multi-mode earthquake damage assessment large model, optimizing it with an image-text contrast loss function, and outputting a new multi-mode vector representation through a multi-mode encoder; training the model on the dataset and integrating the trained model into an unmanned aerial vehicle or inspection vehicle; and assessing the earthquake damage of each building at the disaster site in real time to generate a regional building damage assessment report. By combining multi-mode damage pictures with damage language descriptions as complementary information sources and applying deep learning, the method judges the degree of building damage more professionally and accurately, and can provide important technical support for earthquake emergency rescue work.

Description

Building earthquake damage intelligent assessment method based on multi-mode large model
Technical Field
The invention belongs to the technical field of earthquake engineering, and particularly relates to an intelligent assessment method for building earthquake damage based on a multi-mode large model.
Background
When natural disasters such as earthquakes occur, buildings are often damaged to different extents. Traditional building damage assessment relies mainly on field observation and the experience-based judgment of civil engineering practitioners, which demands a high level of expertise, is inefficient in the field, and exposes personnel to uncontrollable safety risks.
In addition, most existing automatic earthquake damage assessment techniques based on deep learning are damage image classification tasks built on convolutional neural network models. For example, classification and detection tasks have been developed for regional earthquake damage satellite images, images of damaged individual buildings, and their image feature parameters, using convolutional neural networks or other machine learning models. A limitation of these methods is that they classify or locate damage using only "damage picture" or "digitized feature parameter" datasets and do not use other types of information, such as textual descriptions or historical records of the damage. Methods that rely only on images and digitized features may therefore be limited by the amount and quality of the data and may not provide a comprehensive assessment that combines numerical and textual information.
Therefore, to accurately assess the degree of earthquake damage to buildings and to provide a scientific basis for emergency management and post-disaster reconstruction, a building earthquake damage assessment system based on a multi-mode large model is urgently needed, which can effectively raise the level of intelligence and automation of building earthquake damage assessment.
Disclosure of Invention
The purpose of the invention is to provide an intelligent assessment method for building earthquake damage based on a multi-mode large model that solves the above problems in the prior art. By combining multi-mode damage pictures with damage language descriptions as information sources and applying deep learning, the method judges the degree of building damage more professionally and accurately, and can provide important technical support for earthquake emergency rescue work.
To achieve this purpose, the invention adopts the following technical scheme: an intelligent assessment method for building earthquake damage based on a multi-mode large model, comprising the following steps:
S1. Acquire earthquake damage pictures of buildings and construct a dataset:
Damage pictures covering different regions, building types and earthquake levels are collected through web crawlers, field investigation, post-disaster images and investigation reports; the collected pictures are resized to the same size; and every picture is labeled with a text description according to the building earthquake damage grade classification standard, to construct a multi-mode earthquake damage assessment dataset;
S2. Construct the multi-mode earthquake damage assessment large model:
S21. The multi-mode earthquake damage assessment large model is built from two image encoders and two text encoders. The inputs to the four encoders are, respectively, thermal imaging images, RGB earthquake damage pictures, damage text descriptions and laser radar data. Inside each encoder, a self-attention mechanism and forward propagation progressively extract and encode the features of the corresponding modality. Each encoder produces a fixed-length feature representation of its input that captures the important features of, and the correlations among, the different modalities, and parameters are shared between adjacent encoders;
S22. The multi-mode features are optimized by minimizing an image-text contrast loss function, which measures the similarity or correlation between images and text and consists of three parts: an image-text matching loss, which measures how well an image matches its text description; a text generation loss, which measures the similarity between the text description generated by the model and the real text description; and a damage information classification loss, which measures the accuracy of the damage information predicted by the model against the real damage information;
S23. A multi-mode encoder is built from a self-attention mechanism, forward propagation, a multi-head self-attention mechanism and a multi-head cross-modal attention mechanism, and outputs a new multi-mode vector representation in which each vector fuses visual, text and position information;
S3. Train the multi-mode earthquake damage assessment large model:
With the dataset of step S1 prepared and the model of step S2 constructed, the model is trained to obtain its training weights. Training comprises two stages, pre-training and fine-tuning. In the pre-training stage, a large-scale dataset of building damage pictures crawled from the Internet, together with text descriptions of the corresponding damage phenomena, is used to train the model's general understanding of picture features. In the fine-tuning stage, the more finely curated dataset selected in step S1 is used to improve the accuracy and generalization ability of the model;
In practical application, the model loads the pre-trained weights and, after building damage pictures are acquired in real time at a given site, provides damage category classification, damage grade assessment and guidance on post-earthquake repair and reinforcement for a variety of scenes;
S4. Integrate the multi-mode earthquake damage assessment large model algorithm into an unmanned aerial vehicle or inspection vehicle:
The model loaded with the pre-trained weights in step S3 is integrated into an unmanned aerial vehicle or inspection vehicle system. In an actual disaster scene, the inspection vehicle or unmanned aerial vehicle carrying the model photographs the earthquake damage site in real time, records the geographic position of each picture, and analyzes the earthquake damage of the photographed buildings with the model;
S5. Assess the earthquake damage of each building in the target disaster area:
When earthquake damage is investigated for a regional group of buildings, the unmanned aerial vehicle or inspection vehicle carrying the model plans its route in advance and determines the order in which the buildings in the target area are to be scanned. Cameras, thermal imagers and laser radar equipment carried by the vehicle collect image, temperature and geometric dimension data for each building in the disaster area. The collected building earthquake damage image data are input into the model constructed and trained in steps S2 and S3, which performs visual feature extraction, text encoding, multi-mode encoding and loss function optimization to assess the degree of earthquake damage of each building;
S6. Generate a building earthquake damage assessment report for the target disaster area:
Based on the building earthquake damage assessment results obtained in step S5 by the unmanned aerial vehicle and inspection vehicle system, a graph neural network model is used to assess the damage of similar buildings in the neighborhood of a given geographic position. The target disaster area is divided into grids, a damage grade is assigned to each grid, and the earthquake economic loss of each grid is calculated. The disaster area is then assessed as a whole according to the damage grade and loss amount indicators, a standardized earthquake damage assessment report with geographic position information is compiled, the analysis results are transmitted to a local interface in real time, and the detailed distribution of earthquake damage and losses is displayed on a map.
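The grid division and per-grid loss calculation described in step S6 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method's actual implementation: the cell size, the loss ratios in `GRADE_LOSS_RATIO`, the rule that a grid takes the worst grade among its buildings, and all field names are hypothetical, because the text does not specify them.

```python
from collections import defaultdict

# Hypothetical loss ratios per damage grade (1 = substantially intact ... 5 = destroyed).
GRADE_LOSS_RATIO = {1: 0.0, 2: 0.1, 3: 0.3, 4: 0.7, 5: 1.0}


def grid_cell(x_m, y_m, cell_size_m):
    """Map a planar position in metres to the (column, row) index of its grid cell."""
    return (int(x_m // cell_size_m), int(y_m // cell_size_m))


def summarize_by_grid(buildings, cell_size_m=100.0):
    """Aggregate per-building assessments into a grade and a loss amount per grid cell."""
    cells = defaultdict(lambda: {"grade": 1, "loss": 0.0, "count": 0})
    for b in buildings:
        cell = cells[grid_cell(b["x"], b["y"], cell_size_m)]
        # Assumption: a cell takes the most severe grade among its buildings.
        cell["grade"] = max(cell["grade"], b["grade"])
        # Assumption: loss = building value x loss ratio of the building's grade.
        cell["loss"] += b["value"] * GRADE_LOSS_RATIO[b["grade"]]
        cell["count"] += 1
    return dict(cells)


# Three made-up buildings: two fall in cell (0, 0), one in cell (1, 0).
buildings = [
    {"x": 20.0, "y": 30.0, "grade": 2, "value": 100.0},
    {"x": 80.0, "y": 90.0, "grade": 4, "value": 200.0},
    {"x": 150.0, "y": 40.0, "grade": 1, "value": 300.0},
]
report = summarize_by_grid(buildings)
for cell, summary in sorted(report.items()):
    print(cell, summary)
```

Keying the summary by cell index keeps the result easy to join back to map coordinates when the loss distribution is displayed.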
In step S1, the building types include masonry houses, reinforced concrete frame structures and wood-structure houses; the building damage grades are substantially intact, slightly damaged, moderately damaged, severely damaged and destroyed; and the damage pictures are resized to a uniform 1024 x 1024.
In step S2, the image encoder processes data as follows: the visual feature extraction end of the image encoder divides the damage picture into smaller, non-overlapping blocks; a linear projection layer is applied to each block to generate a flattened vector; a position code is added to each flattened vector so that the self-attention mechanism can distinguish blocks at different positions; finally, the flattened vectors serve as the query, key and value vectors of the attention mechanism, a multi-head self-attention mechanism computes the correlation between each block and the other blocks, and a new vector representation is output in which each vector contains the visual and spatial information of the damage picture.
In step S2, the text encoder processes data as follows: the text descriptions and laser radar data for building earthquake damage assessment are tokenized and converted into a token sequence; a fixed-size vector representation is generated for each token; a position code is added to each vector so that the self-attention mechanism can preserve the context and order of the input text; finally, these vector representations serve as the query, key and value vectors of the attention mechanism, a multi-head self-attention mechanism computes the correlation between each token and the other tokens, and a new vector representation is output in which each vector contains the text and position information of the text descriptions and laser radar data.
In step S2, given an input damage picture I and its corresponding text description T, the degree of matching between the image and the text is measured by cosine similarity, and the matching loss is expressed as:
$L_{match} = 1 - \mathrm{Sim}(I, T)$,
where Sim denotes the cosine similarity between the image and the text, and $L_{match}$ denotes the image-text matching loss;
letting the text description generated by the model be T', the similarity between the generated text description and the real text description is measured by a cross-entropy loss, expressed as:
$L_{generate} = -\sum \left( T \log T' + (1 - T) \log(1 - T') \right)$,
where T denotes the real text description, T' denotes the text description generated by the model, and $L_{generate}$ denotes the text generation loss;
letting the model's damage classification prediction be P and the real damage information be D, the accuracy of the predicted damage information against the real damage information is measured by a cross-entropy loss, expressed as:
$L_{classification} = -\sum \left( D \log P + (1 - D) \log(1 - P) \right)$,
where D denotes the one-hot vector of the real damage information, P denotes the prediction of the model, and $L_{classification}$ denotes the damage information classification loss.
In step S6, the assessment using the graph neural network comprises the following steps:
S61. When adjacent buildings are similar in structural system, geometric dimensions and dynamic characteristics, the degree of earthquake damage of similar buildings after the same earthquake can be inferred from the building similarity relation. Therefore, representative individual buildings at several positions are first assessed with the multi-mode earthquake damage assessment large model, yielding the damage images, text descriptions and damage information of the buildings at each position scene;
S62. Each position scene is represented as a node in the graph of the target assessment area according to its geographic position, structural system, construction year and geometric dimension attributes, and edges are connected between nodes according to their similarity or distance to build the graph structure;
S63. The graph structure is used to pass and update information among the position scenes in the area, so that each building node that has not been inspected and assessed can learn about damage conditions from adjacent nodes that carry earthquake damage assessment results, and update its own damage information;
S64. Based on the outputs of the multi-mode earthquake damage assessment large model and the graph neural network, the economic loss caused by the damage to each building in the target disaster area is assessed, and a regional earthquake damage and loss assessment report is generated.
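Steps S62 and S63 can be illustrated with a small graph-propagation sketch. It is deliberately simplified and is not a trained graph neural network: uninspected nodes simply take a distance-weighted average of their neighbors' damage grades. The edge rule (same structural system and within `max_distance_m`), the weight `1 / (1 + distance)`, and the node fields are assumptions made for illustration only.

```python
import math


def build_edges(nodes, max_distance_m=150.0):
    """Connect buildings that share a structural system and lie close together."""
    edges = {name: [] for name in nodes}
    names = list(nodes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distance = math.dist(nodes[a]["xy"], nodes[b]["xy"])
            if nodes[a]["system"] == nodes[b]["system"] and distance <= max_distance_m:
                weight = 1.0 / (1.0 + distance)  # hypothetical similarity weight
                edges[a].append((b, weight))
                edges[b].append((a, weight))
    return edges


def propagate(nodes, edges, assessed, rounds=3):
    """Pass damage grades from assessed nodes to uninspected neighbors."""
    estimates = dict(assessed)
    for _ in range(rounds):
        updated = dict(estimates)
        for name in nodes:
            if name in assessed:
                continue  # inspected buildings keep their assessed grade
            messages = [(estimates[n], w) for n, w in edges[name] if n in estimates]
            if messages:
                total_weight = sum(w for _, w in messages)
                updated[name] = sum(g * w for g, w in messages) / total_weight
        estimates = updated
    return estimates


nodes = {
    "A": {"xy": (0.0, 0.0), "system": "masonry"},
    "B": {"xy": (50.0, 0.0), "system": "masonry"},
    "C": {"xy": (100.0, 0.0), "system": "masonry"},
    "D": {"xy": (60.0, 10.0), "system": "rc_frame"},
}
assessed = {"A": 4, "C": 2}  # grades from the multi-mode model (made-up values)
estimates = propagate(nodes, build_edges(nodes), assessed)
print(estimates)
```

In this toy graph, building B sits midway between two assessed masonry buildings and receives the average of their grades, while building D has no similar neighbor and is left without an estimate, which flags it for direct inspection.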
The beneficial effects of the invention are as follows:
1) By combining multi-mode damage pictures with damage language descriptions as information sources and applying deep learning, the assessment method judges the degree of building damage more professionally and accurately. The damage language description provides both overall and local descriptions of the earthquake damage image, which helps classify and locate the type and degree of building damage more accurately. The method is therefore more scientific and accurate, and can provide important technical support for earthquake emergency management and post-disaster rescue work.
2) Through information fusion and analysis, the assessment method draws on damage language descriptions and images as distinct information sources that complement each other, so that earthquake damage is described and understood more comprehensively: the language description provides intuitive semantic information, while the image provides more specific, fine-grained visual information. In addition, by combining data with text descriptions, natural language processing can be used to analyze the text semantically and fuse it with image features, and high-resolution images and multiple data modalities (such as RGB, laser radar and thermal imaging) can be processed to improve the accuracy of damage assessment. The nature and extent of damage can thus be understood in more detail and more accurately than with conventional inspection methods.
3) The assessment method is trained on building earthquake damage pictures together with their text descriptions, which provides richer and more comprehensive information and helps assess the type, degree and position of earthquake damage, so that a more accurate and comprehensive earthquake damage assessment is achieved.
4) By integrating the multi-mode large model into an unmanned aerial vehicle or inspection vehicle, the assessment method can cover a large area effectively, so that building earthquake damage assessment and decision-making are completed faster and over a wider area. The unmanned aerial vehicle or inspection vehicle can collect building earthquake damage image data in difficult terrain while rapidly completing damage identification and assessment, reducing the need for manual inspection in dangerous areas and thereby minimizing the risk to the lives of field professionals.
5) The assessment method processes building earthquake damage data in real time and delivers analysis results quickly, improving the timeliness of disaster relief work. It can be integrated seamlessly with GIS tools to provide visualization and spatial analysis, supporting effective decision-making and resource prioritization in emergencies. Compared with traditional methods, it can markedly improve the speed, accuracy and safety of earthquake loss assessment, and provides targeted guidance for emergency rescue and repair work.
Drawings
FIG. 1 is a schematic flow chart of the assessment method of the invention;
FIG. 2 is a schematic diagram of the multi-mode earthquake damage assessment large model constructed in the assessment method of the invention;
FIG. 3 shows the damage assessment and localization results obtained in the embodiment of the invention.
Detailed Description
The invention is further illustrated by the following description in conjunction with the accompanying drawings and specific embodiments.
Embodiment: as shown in FIGS. 1-3, the invention provides an intelligent assessment method for building earthquake damage based on a multi-mode large model, comprising the following steps:
S1. Acquire earthquake damage pictures of buildings and construct a dataset:
Damage pictures covering different regions, building types and earthquake levels are collected through web crawlers, site investigation, post-disaster images and expert reports; the collected pictures are resized to a uniform 1024 x 1024; and every picture is labeled with a text description according to the building earthquake damage grade classification standard, to construct a multi-mode earthquake damage assessment dataset.
Building types include masonry houses, reinforced concrete frame structures and wood-structure houses; building damage grades are substantially intact, slightly damaged, moderately damaged, severely damaged and destroyed.
The labeling format and specification are as follows: {"picture sequence number": "001", "text description": building type (masonry house, reinforced concrete frame structure, wood-structure house, etc.) and building damage grade (I-V: substantially intact, slightly damaged, moderately damaged, severely damaged, destroyed), "damage location": spatial location of the damage in the picture, "repair opinion": detailed repair guidance for the damage shown in each picture}
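One way to make the labeling format above machine-checkable is sketched below. The field names, the bounding-box form of the damage location, and the example values are assumptions introduced for illustration; the text only specifies which kinds of information each label carries.

```python
from dataclasses import dataclass

# The five damage grades listed in the text, keyed by their Roman numeral.
DAMAGE_GRADES = {
    "I": "substantially intact",
    "II": "slightly damaged",
    "III": "moderately damaged",
    "IV": "severely damaged",
    "V": "destroyed",
}


@dataclass
class DamageAnnotation:
    picture_id: str
    building_type: str
    damage_grade: str        # Roman numeral I-V
    damage_location: tuple   # hypothetical (x_min, y_min, x_max, y_max) in pixels
    repair_opinion: str

    def validate(self, image_size=1024):
        """Reject labels with an unknown grade or a box outside the resized picture."""
        if self.damage_grade not in DAMAGE_GRADES:
            raise ValueError(f"unknown damage grade: {self.damage_grade}")
        x0, y0, x1, y1 = self.damage_location
        if not (0 <= x0 < x1 <= image_size and 0 <= y0 < y1 <= image_size):
            raise ValueError("damage location falls outside the picture")

    def text_description(self):
        """Compose the text label paired with the picture during training."""
        grade = DAMAGE_GRADES[self.damage_grade]
        return f"{self.building_type}, grade {self.damage_grade} ({grade})"


record = DamageAnnotation(
    picture_id="001",
    building_type="masonry house",
    damage_grade="III",
    damage_location=(120, 340, 560, 900),
    repair_opinion="Example placeholder: repair diagonal wall cracks.",
)
record.validate()
print(record.text_description())
```

Building types are not validated against a closed set here, since the text leaves that list open ("etc.").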
S2, constructing a multi-mode earthquake damage evaluation large model:
The constructed multi-mode earthquake damage assessment large model is shown in FIG. 2. Here "multi-mode" refers to a model that can process and understand several types of input data, such as text, images and sound, so that it can learn and reason across multiple perceptual modalities; "large model" refers to a deep learning model with a large number of parameters, which typically requires a large amount of data to train and has strong representation learning and generalization capabilities.
The specific steps are as follows: S21. The multi-mode earthquake damage assessment large model is built from two image encoders and two text encoders. The inputs to the four encoders are, respectively, thermal imaging images, RGB earthquake damage pictures, damage text descriptions and laser radar data. Inside each encoder, a self-attention mechanism and forward propagation progressively extract and encode the features of the corresponding modality. Each encoder produces a fixed-length feature representation of its input that captures the important features of, and the correlations among, the different modalities, and parameters are shared between adjacent encoders.
For an image encoder, the internal data processing is as follows: the visual feature extraction end of the image encoder divides the damage picture into smaller, non-overlapping blocks, for example a 3x3 grid giving 9 blocks; a linear projection layer is applied to each block to generate a flattened vector, for example of length 64; a position code is added to each flattened vector so that the self-attention mechanism can distinguish blocks at different positions; finally, the flattened vectors serve as the query, key and value vectors of the attention mechanism, and a multi-head self-attention mechanism computes the correlation between each block and the other blocks and outputs a new vector representation. For example, with 8 heads, each head outputs a vector of length 8, and the head outputs are concatenated into a vector of length 64 as the final output. This yields 9 new vector representations, each containing the visual and spatial information of the damage picture.
For a text encoder, the internal data processing is as follows: the text descriptions and laser radar data for building earthquake damage assessment are tokenized and converted into a token sequence; for example, Byte Pair Encoding (BPE) is used to split the text description into subword units, and the laser radar data are converted into numeric tokens; a fixed-size vector representation is generated for each token, and a position code is added to each vector so that the self-attention mechanism can capture the context and order of the input text; finally, these vector representations serve as the query, key and value vectors of the attention mechanism, a multi-head self-attention mechanism computes the correlation between each token and the other tokens, and a new vector representation is output in which each vector contains the text and position information of the text descriptions and laser radar data.
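The image-encoder example (a 3x3 grid, length-64 vectors, 8 heads of length 8) can be traced with the following pure-Python sketch. It rests on several assumptions: a 12x12 random array stands in for the damage picture so that the blocks divide evenly, random numbers stand in for learned projection weights and position codes, each head uses its slice of the vector directly as query, key and value with no separate learned projections, and the scores are scaled by the square root of the head size as in standard scaled dot-product attention, which the text does not state.

```python
import math
import random


def split_into_blocks(image, grid=3):
    """Split a square image (list of pixel rows) into grid x grid non-overlapping
    blocks, each returned as a flat list of pixel values."""
    side = len(image) // grid
    blocks = []
    for block_row in range(grid):
        for block_col in range(grid):
            blocks.append([
                image[block_row * side + r][block_col * side + c]
                for r in range(side) for c in range(side)
            ])
    return blocks


def linear_projection(vector, weights):
    """Project a vector with a weight matrix stored as one row per output feature."""
    return [sum(x * w for x, w in zip(vector, row)) for row in weights]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens, num_heads=8):
    """Each head attends over its own slice of the token vectors, which serve
    directly as query, key and value; the head outputs are concatenated."""
    head_dim = len(tokens[0]) // num_heads
    outputs = [[] for _ in tokens]
    for h in range(num_heads):
        heads = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        for i, query in enumerate(heads):
            scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
                      for key in heads]
            weights = softmax(scores)
            outputs[i].extend(
                sum(w * value[d] for w, value in zip(weights, heads))
                for d in range(head_dim)
            )
    return outputs


rng = random.Random(0)
image = [[rng.random() for _ in range(12)] for _ in range(12)]   # stand-in picture
blocks = split_into_blocks(image)                                 # 9 blocks of 16 pixels
projection = [[rng.gauss(0, 0.1) for _ in range(16)] for _ in range(64)]
position_codes = [[rng.gauss(0, 0.1) for _ in range(64)] for _ in blocks]
tokens = [
    [p + e for p, e in zip(linear_projection(block, projection), code)]
    for block, code in zip(blocks, position_codes)
]
encoded = multi_head_self_attention(tokens)
print(len(encoded), len(encoded[0]))  # 9 vectors of length 64
```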
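The tokenization step can be sketched with a toy Byte Pair Encoding learner. The tiny corpus, the number of merges, and the end-of-word marker `</w>` are illustrative choices. The text does not say how laser radar readings become numeric tokens, so the fixed-width binning in `quantize_lidar` is purely a hypothetical stand-in.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(words, num_merges):
    """Learn merge rules by repeatedly merging the most frequent adjacent symbol pair."""
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = merged_vocab
    return merges


def encode(word, merges):
    """Split a word into subword units by applying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def quantize_lidar(ranges_m, bin_size_m=0.5):
    """Hypothetical: turn range readings (metres) into integer tokens by binning."""
    return [int(r // bin_size_m) for r in ranges_m]


corpus = "diagonal cracks in masonry wall cracked column cracking in wall".split()
merges = learn_bpe(corpus, num_merges=10)
tokens = encode("cracked", merges)
lidar_tokens = quantize_lidar([0.2, 1.3, 7.9])
print(tokens, lidar_tokens)
```

Each resulting token would then be mapped to a fixed-size vector and given a position code before entering the self-attention layers, as the paragraph describes.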
S22. The multi-mode features are optimized by minimizing an image-text contrast loss function, which measures the similarity or correlation between images and text and consists of three parts: an image-text matching loss, which measures how well an image matches its text description; a text generation loss, which measures the similarity between the text description generated by the model and the real text description; and a damage information classification loss, which measures the accuracy of the damage information predicted by the model against the real damage information.
Assuming that a damage picture I and its corresponding text description T are input, the degree of matching between the image and the text is measured by cosine similarity, and the image-text matching loss is expressed as:
$L_{match} = 1 - \mathrm{Sim}(I, T)$,
where Sim denotes the cosine similarity between the image and the text, and $L_{match}$ denotes the image-text matching loss;
assuming that the text description generated by the model is T′, the cross-entropy loss is used to measure the similarity between the generated text description and the real text description, expressed as:
$L_{generate} = -\sum \left( T \log(T') + (1 - T) \log(1 - T') \right)$,
where T denotes the real text description, T′ denotes the text description generated by the model, and $L_{generate}$ denotes the text generation loss;
assuming that the prediction result of the model for damage information classification is P and the real damage information is D, the cross-entropy loss is used to measure the accuracy of the predicted damage information against the real damage information, expressed as:
$L_{classification} = -\sum \left( D \log(P) + (1 - D) \log(1 - P) \right)$,
where D denotes the one-hot vector of the real damage information, P denotes the prediction result of the model, and $L_{classification}$ denotes the damage information classification loss.
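The three loss terms can be written out directly from the formulas above. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be plain lists of floats (embedding vectors for the matching loss, probabilities for the two cross-entropy terms), and the weighted sum in `total_loss` is an assumption, since the text says the loss consists of three parts but does not state how they are combined.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def matching_loss(image_vec, text_vec):
    # L_match = 1 - Sim(I, T)
    return 1.0 - cosine_similarity(image_vec, text_vec)

def cross_entropy(target, pred, eps=1e-12):
    # -sum(t*log(p) + (1-t)*log(1-p)); eps keeps log() away from zero.
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for t, p in zip(target, pred)
    )

def total_loss(image_vec, text_vec, t_true, t_pred, d_true, p_pred,
               weights=(1.0, 1.0, 1.0)):
    # Assumption: the three parts are combined as a weighted sum.
    l_match = matching_loss(image_vec, text_vec)
    l_generate = cross_entropy(t_true, t_pred)        # L_generate
    l_classification = cross_entropy(d_true, p_pred)  # L_classification
    return (weights[0] * l_match + weights[1] * l_generate
            + weights[2] * l_classification)

# Five damage grades, real grade is the second one (one-hot vector D).
d_true = [0, 1, 0, 0, 0]
p_pred = [0.1, 0.6, 0.1, 0.1, 0.1]
print(cross_entropy(d_true, p_pred))
```

Perfectly aligned image and text embeddings give a matching loss of 0, and orthogonal embeddings give 1, which matches the intent of the cosine-similarity formula.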
The image-text contrast loss function measures the similarity or correlation between images and texts, which encourages the model to learn better image-text matching and alignment features. Once well-aligned multi-modal features have been obtained, the multi-modal features need to be encoded.
S23, constructing a multi-mode encoder by a self-attention mechanism, forward propagation, a multi-head self-attention mechanism and a multi-head cross-mode attention mechanism, and outputting a new multi-mode vector representation, wherein each vector comprises fusion of vision, text and position information.
In the multi-mode encoder, the visual feature extraction end and the text encoder communicate and share parameters, which means that the visual encoder and the text encoder can learn from each other's information and adjust their own parameters. Two sub-layers are then added in the multi-mode encoder: one is a multi-head self-attention mechanism, which further enhances the representation inside each modality; the other is a multi-head cross-modal attention mechanism, which calculates the correlation between different modalities and outputs a new multi-modal vector representation, each vector comprising a fusion of visual, textual and positional information. The multi-mode encoder maps the data of the different modalities into a shared low-dimensional representation, thereby converting the multi-modal data into a unified representation. By integrating multiple modalities, the model can make effective use of the complementary information provided by lidar, thermal imaging and visual RGB data, and can therefore provide a more accurate and comprehensive damage assessment during earthquake response and recovery operations.
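The two sub-layers can be illustrated with a single attention helper that is called once per modality (self-attention) and once across modalities (cross-modal attention). This is a sketch under assumptions: the helper `multi_head_attention`, the random untrained weights, the residual additions, the token counts and the choice of text tokens as queries over image blocks are all hypothetical and are not specified in the text.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(queries, context, heads, rng):
    """Queries attend to `context`; self-attention when both are the same."""
    dim = queries.shape[-1]
    d_head = dim // heads
    w_q, w_k, w_v = (rng.normal(0, 0.02, (dim, dim)) for _ in range(3))

    def split(t):                       # (n, dim) -> (heads, n, d_head)
        return t.reshape(t.shape[0], heads, d_head).transpose(1, 0, 2)

    q, k, v = split(queries @ w_q), split(context @ w_k), split(context @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    return (attn @ v).transpose(1, 0, 2).reshape(queries.shape[0], dim)

rng = np.random.default_rng(0)
image_tokens = rng.normal(size=(9, 64))    # e.g. 9 block vectors of a picture
text_tokens = rng.normal(size=(12, 64))    # e.g. 12 token vectors of a text

# Sub-layer 1: self-attention inside each modality (residual add assumed).
image_tokens = image_tokens + multi_head_attention(image_tokens, image_tokens, 8, rng)
text_tokens = text_tokens + multi_head_attention(text_tokens, text_tokens, 8, rng)
# Sub-layer 2: cross-modal attention, text queries attend to image blocks.
fused = text_tokens + multi_head_attention(text_tokens, image_tokens, 8, rng)
print(fused.shape)
```

Each row of `fused` mixes one text token with the image blocks it attends to, which is one way to obtain vectors that combine visual, textual and positional information.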
S3, training a multi-mode earthquake damage assessment large model:
Model training is carried out once the multi-mode earthquake damage assessment data set of step S1 has been prepared and the multi-mode earthquake damage assessment large model of step S2 has been constructed, yielding the trained weights of the model. The training process comprises two stages, pre-training and fine-tuning. In the pre-training stage, a large-scale data set of building damage pictures crawled from the Internet, together with text descriptions of the corresponding damage phenomena, is used to train the model's general understanding of picture features; in the fine-tuning stage, the finer data set selected in step S1 is used to improve the accuracy and generalization capability of the model;
in practical application, the multi-mode earthquake damage assessment large model loads the pre-trained weights and, after building damage pictures are acquired in real time at a given site, provides damage category classification, damage grade assessment and guidance on post-earthquake repair and reinforcement under various scenes.
S4, integrating the multi-mode earthquake damage evaluation large model algorithm into an unmanned aerial vehicle or inspection vehicle:
The multi-mode earthquake damage assessment large model loaded with the pre-trained weights in step S3 is integrated into an unmanned aerial vehicle or inspection vehicle system. In an actual disaster scene, the inspection vehicle or unmanned aerial vehicle carrying the multi-mode large model takes pictures of the earthquake damage site in real time, records the geographical position information of each picture, and performs earthquake damage analysis on the photographed buildings through the multi-mode earthquake damage assessment large model.
S5, evaluating the earthquake damage of each building in the disaster-affected target area:
When an earthquake damage investigation is carried out for a regional building group, the unmanned aerial vehicle or inspection vehicle integrated with the multi-mode earthquake damage evaluation large model plans its route in advance and determines the access order of the buildings in the target region to be scanned. Specifically, the disaster area is divided into a plurality of grids, and each grid is assigned a priority score according to its attributes (such as population density, building type and terrain), reflecting how urgently it needs to be evaluated. Each grid may be represented by a tuple (x, y), where x is the number of the grid and y is its priority score. The access order is represented by a list S, in which each element is the number of a grid. The higher the priority score of a grid, the earlier it appears in the list.
Assuming that there are n grids, their priority scores are stored in an n-dimensional vector P, where $P_i$ denotes the priority score of the i-th grid. Sorting the grids by priority score in descending order gives a ranking P′, where $P'_i$ denotes the number of the grid at the i-th position after sorting. The access order list S is then defined as:
$S = [P'_1, P'_2, \ldots, P'_n]$,
that is, priority scores are assigned according to the grid attributes, and the access order is calculated through the above formula.
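The access order defined above amounts to a descending sort of grid numbers by priority score. A minimal sketch follows; the function name `access_order`, the 1-based grid numbering and the example scores are assumptions made for illustration, and ties are resolved by keeping the original grid numbering, which the text does not specify.

```python
def access_order(priority_scores):
    """Return grid numbers (1-based) sorted by descending priority score."""
    # Each grid is an (x, y) tuple: x is the grid number, y its priority score.
    grids = list(enumerate(priority_scores, start=1))
    # sorted() is stable, so grids with equal scores keep their numbering order.
    ranked = sorted(grids, key=lambda grid: grid[1], reverse=True)
    return [number for number, _ in ranked]

P = [0.2, 0.9, 0.5, 0.9]   # hypothetical priority scores for grids 1..4
S = access_order(P)
print(S)  # [2, 4, 3, 1]
```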
Image, temperature and geometric dimension data of the disaster area are collected using the cameras, thermal imagers and lidar equipment carried by the unmanned aerial vehicle or inspection vehicle. Specifically, the unmanned aerial vehicle or inspection vehicle travels along the planned route and takes two images of each grid: one is an ordinary RGB image, which shows the appearance and structure of the buildings; the other is a thermal imaging image, which shows the temperature distribution and thermal anomalies of the buildings. At the same time, the unmanned aerial vehicle or inspection vehicle scans each grid with the lidar and records the distance data of each point, which show the shape and height of the buildings.
The collected multi-mode earthquake damage data are input into the multi-mode earthquake damage evaluation large model constructed in step S2, and the operations of visual feature extraction, text coding, multi-mode coding and loss function optimization are performed to obtain the damage information of each building damage picture in each grid.
S6, generating a building earthquake damage evaluation report in the disaster-affected target area:
According to the building earthquake damage evaluation results obtained in step S5 by the unmanned aerial vehicle and inspection vehicle systems integrated with the multi-mode earthquake damage evaluation large model, a graph neural network model is used to evaluate the damage condition of similar buildings in the neighborhood of a given geographical position. The target disaster area is divided into grids, a damage grade is assigned to each grid, and the earthquake economic loss of each grid is calculated. The disaster area is then evaluated as a whole according to the damage grade and loss amount indices, a standardized earthquake damage evaluation report with geographical position information is compiled, the analysis results are transmitted to a local interface in real time, and the detailed distribution of earthquake damage and loss is displayed on a map.
As shown in fig. 3, after earthquake damage evaluation has been performed on the four location scenes A, B, C and D using the multi-mode large model, the damage condition of similar buildings in the neighborhood is evaluated using the graph neural network. The specific steps are as follows:
S61, when adjacent buildings are similar in structural system, geometric dimensions and dynamic characteristics, the earthquake damage degree of similar buildings that have experienced the same earthquake can be inferred through the building similarity relation; therefore, earthquake damage evaluation is first performed on the four location scenes A, B, C and D through the multi-mode earthquake damage evaluation large model to obtain the damage images, text descriptions and damage information of the buildings in each location scene;
S62, each location scene is represented as a node in the graph of the target evaluation area according to its attributes (such as geographic position, structural system, year of construction and geometric dimensions), and edges are connected according to the similarity or distance between nodes to construct a graph structure;
S63, the graph neural network is used to pass and update information on the graph structure, so that each building node that has not been inspected and evaluated can learn information about damage conditions from the adjacent nodes that hold building earthquake damage evaluation results, and update its own damage information;
S64, according to the results output by the multi-mode earthquake damage evaluation large model and the graph neural network, the economic loss caused by the damage to each building in the target disaster area is evaluated, and a regional earthquake damage and loss evaluation report is generated.
Through the above steps, the damage condition of similar buildings in the neighborhood can be evaluated using the graph neural network, which improves the efficiency and accuracy of earthquake damage evaluation.
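The information-passing idea of steps S62 and S63 can be illustrated with a deliberately simplified stand-in: a fixed, similarity-weighted averaging over neighbours rather than a trained graph neural network. Everything in the sketch is an assumption made for illustration, including the function name `propagate_damage`, the damage scores in [0, 1], the edge weights and the uninspected nodes E and F; the text does not specify the network architecture or its update rule.

```python
def propagate_damage(nodes, edges, rounds=10):
    """Spread damage scores from inspected to uninspected buildings.

    nodes: {name: damage score in [0, 1], or None if not inspected}
    edges: {(a, b): positive similarity weight}, treated as undirected
    """
    neighbours = {name: [] for name in nodes}
    for (a, b), weight in edges.items():
        neighbours[a].append((b, weight))
        neighbours[b].append((a, weight))
    observed = {n for n, score in nodes.items() if score is not None}
    state = {n: (s if s is not None else 0.0) for n, s in nodes.items()}
    known = set(observed)              # nodes that already carry information
    for _ in range(rounds):
        updated = dict(state)
        newly_known = set()
        for name in nodes:
            if name in observed:
                continue               # inspected buildings keep their result
            msgs = [(state[m], w) for m, w in neighbours[name] if m in known]
            if msgs:
                total = sum(w for _, w in msgs)
                updated[name] = sum(s * w for s, w in msgs) / total
                newly_known.add(name)
        state = updated
        known |= newly_known
    return state                       # unreachable nodes stay at 0.0

# Scenes A-D were inspected by the large model; E and F were not.
nodes = {"A": 0.8, "B": 0.2, "C": 0.5, "D": 0.1, "E": None, "F": None}
edges = {("A", "E"): 3.0, ("B", "E"): 1.0, ("E", "F"): 1.0, ("C", "F"): 1.0}
result = propagate_damage(nodes, edges)
print(result["E"], result["F"])
```

A learned graph neural network would replace the fixed weighted average with trainable message and update functions, but the data flow from assessed nodes to unassessed neighbours is the same.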
In addition, in this embodiment, the model can automatically learn and be optimized as building earthquake damage pictures continue to be collected, which improves the accuracy and efficiency of earthquake damage assessment. In general, this embodiment mainly involves collecting earthquake damage pictures in real time, preprocessing them, feeding them into the multi-mode large model for training, outputting damage classification, damage grade assessment and damage geolocation information, and carrying out self-learning and optimization.
Unlike current methods that evaluate earthquake damage from building earthquake damage pictures alone or from text alone, the invention provides an intelligent building earthquake damage evaluation system based on a multi-mode large model, which is trained jointly on a large number of pictures and text descriptions covering the damage information types, and aims to solve problems of traditional earthquake damage evaluation methods such as the low efficiency of field work by professionals and the difficulty of data collection and quantitative analysis.
The method crawls pictures from social media platforms and inputs building earthquake damage pictures into the multi-mode large model for discrimination. The multi-mode large model is integrated into an unmanned aerial vehicle system or inspection vehicle to photograph building earthquake damage in real time and record geographical position information; by associating damage information with geographical position information, the damage information of the region to be evaluated can be recorded in real time. In addition, non-professionals can obtain a professional-level judgment of the damage degree of their own houses by taking pictures and inputting them into the large model, and professionals can use the large model to classify earthquake damage and quantitatively analyze the classification results, such as the crack size and spalling area of concrete members.
The method simplifies the building earthquake damage judgment process and improves the accuracy and efficiency of judgment; it gives non-professionals a professional-level way to judge house damage without expert participation; it provides professionals with an integrated solution for classification and quantitative analysis; and it can adapt to different scenes and requirements and be continuously updated and iterated as data increase.
Based on the earthquake damage evaluation results of individual buildings, the method uses the graph neural network to infer the damage of buildings in the disaster-affected target area, which provides more accurate and comprehensive results; by accurately capturing structural similarity and spatial relationships, the integrated graph neural network can significantly improve the prediction accuracy of the earthquake damage distribution of the building group.
The foregoing is merely illustrative of the present invention and not restrictive, and other modifications and equivalents thereof may occur to those skilled in the art without departing from the spirit and scope of the present invention.

Claims (6)

1. An intelligent assessment method for building earthquake damage based on a multi-mode large model, characterized by comprising the following steps:
S1, acquiring building earthquake damage pictures and constructing a data set:
collecting damage pictures covering different areas, different building types and different earthquake magnitudes through web crawlers, field investigation, post-disaster images and investigation reports, adjusting the collected damage pictures to the same size, and labeling all collected damage pictures with text descriptions according to a building earthquake damage grade classification standard to construct a multi-mode earthquake damage evaluation data set;
S2, constructing a multi-mode earthquake damage evaluation large model:
S21, constructing the multi-mode earthquake damage assessment large model from two image encoders and two text encoders, wherein the inputs of the four encoders are respectively a thermal imaging image, an RGB earthquake damage picture, a damage text description and lidar data; each encoder uses a self-attention mechanism and forward propagation to gradually extract and encode the features of its modality; the processing of each encoder yields a fixed-length feature representation of the corresponding input data, which captures the relevance and important features among the different modalities; and parameters are shared between adjacent encoders;
s22, optimizing the multi-modal characteristics according to a minimization target of an image-text contrast loss function, wherein the image-text contrast loss function is used for measuring similarity or relativity between images and texts and consists of three parts: the image-text matching loss is used for measuring the matching degree between the image and the corresponding text description; the text generation loss is used for measuring the similarity between the text description generated by the model and the real text description; the damage information classification loss is used for measuring the accuracy between damage information predicted by the model and real damage information;
s23, constructing a multi-mode encoder by a self-attention mechanism, forward propagation, a multi-head self-attention mechanism and a multi-head cross-mode attention mechanism, and outputting a new multi-mode vector representation, wherein each vector comprises fusion of vision, text and position information;
s3, training a multi-mode earthquake damage assessment large model:
performing model training once the multi-mode earthquake damage assessment data set of step S1 has been prepared and the multi-mode earthquake damage assessment large model of step S2 has been constructed, and obtaining the trained weights of the model, wherein the training process comprises two stages, pre-training and fine-tuning; in the pre-training stage, a large-scale data set of building damage pictures crawled from the Internet, together with text descriptions of the corresponding damage phenomena, is used to train the model's general understanding of picture features; in the fine-tuning stage, the finer data set selected in step S1 is used to improve the accuracy and generalization capability of the model;
in practical application, the multi-mode earthquake damage assessment large model loads the pre-trained weights and, after building damage pictures are acquired in real time at a given site, provides damage category classification, damage grade assessment and guidance on post-earthquake repair and reinforcement under various scenes;
S4, integrating the multi-mode earthquake damage evaluation large model algorithm into an unmanned aerial vehicle or inspection vehicle:
integrating the multi-mode earthquake damage assessment large model loaded with the pre-trained weights in step S3 into an unmanned aerial vehicle or inspection vehicle system; in an actual disaster scene, the inspection vehicle or unmanned aerial vehicle carrying the multi-mode large model takes pictures of the earthquake damage site in real time, records the geographical position information of each picture, and performs earthquake damage analysis on the photographed buildings through the multi-mode earthquake damage assessment large model;
S5, evaluating the earthquake damage of each building in the disaster-affected target area:
when an earthquake damage investigation is carried out for a regional building group, the unmanned aerial vehicle or inspection vehicle integrated with the multi-mode earthquake damage evaluation large model plans its route in advance and determines the access order of the buildings in the target region to be scanned; image, temperature and geometric dimension data of each building in the disaster area are collected using the cameras, thermal imagers and lidar equipment carried by the unmanned aerial vehicle or inspection vehicle; the collected building earthquake damage data are input into the multi-mode earthquake damage evaluation large model constructed and trained in steps S2 and S3, and the operations of visual feature extraction, text coding, multi-mode coding and loss function optimization are performed to evaluate the earthquake damage degree of each building;
S6, generating a building earthquake damage evaluation report for the disaster-affected target area:
according to the building earthquake damage evaluation results obtained in step S5 by the unmanned aerial vehicle and inspection vehicle systems integrated with the multi-mode earthquake damage evaluation large model, a graph neural network model is used to evaluate the damage condition of similar buildings in the neighborhood of a given geographical position; the target disaster area is divided into grids, a damage grade is assigned to each grid, and the earthquake economic loss of each grid is calculated; the disaster area is evaluated as a whole according to the damage grade and loss amount indices, a standardized earthquake damage evaluation report with geographical position information is compiled, the analysis results are transmitted to a local interface in real time, and the detailed distribution of earthquake damage and loss is displayed on a map.
2. The intelligent assessment method for building earthquake damage based on the multi-mode large model according to claim 1, characterized in that: in step S1, the building types comprise masonry houses, reinforced concrete frame structures and wood structure houses; the building damage grades are classified as substantially intact, slightly damaged, moderately damaged, severely damaged and destroyed; and the damage pictures are adjusted to a uniform size of 1024 × 1024.
3. The intelligent assessment method for building earthquake damage based on the multi-mode large model according to claim 1, characterized in that: in step S2, the data processing inside the image encoder is as follows: the visual feature extraction end inside the image encoder divides the damage picture into smaller non-overlapping blocks; a linear projection layer is applied to each block to generate a plane vector; a position code is added to each plane vector so that the self-attention mechanism can distinguish blocks at different positions; finally, the plane vectors serve as the query, key and value vectors of the attention mechanism, the multi-head self-attention mechanism calculates the correlation between each block and the other blocks, and a new vector representation is output, each vector containing visual information and spatial information from the damage picture.
4. The intelligent assessment method for building earthquake damage based on the multi-mode large model according to claim 1, characterized in that: in step S2, the data processing inside the text encoder is as follows: the text description of the building earthquake damage assessment and the lidar data are tokenized through text encoding and converted into token sequences; a fixed-size vector representation is generated for each token, and a position code is added to each vector so that the self-attention mechanism can capture the context and order of the input text; finally, these vector representations serve as the query, key and value vectors of the attention mechanism, the multi-head self-attention mechanism calculates the relevance between each token and the other tokens, and a new vector representation is output, each vector containing the textual information and position information from the text description and the lidar data.
5. The intelligent assessment method for building earthquake damage based on the multi-mode large model according to claim 1, characterized in that: in step S2, assuming that a damage picture I and its corresponding text description T are input, the degree of matching between the image and the text is measured by cosine similarity, and the image-text matching loss is expressed as:
$L_{match} = 1 - \mathrm{Sim}(I, T)$,
where Sim denotes the cosine similarity between the image and the text, and $L_{match}$ denotes the image-text matching loss;
assuming that the text description generated by the model is T′, the cross-entropy loss is used to measure the similarity between the generated text description and the real text description, expressed as:
$L_{generate} = -\sum \left( T \log(T') + (1 - T) \log(1 - T') \right)$,
where T denotes the real text description, T′ denotes the text description generated by the model, and $L_{generate}$ denotes the text generation loss;
assuming that the prediction result of the model for damage information classification is P and the real damage information is D, the cross-entropy loss is used to measure the accuracy of the predicted damage information against the real damage information, expressed as:
$L_{classification} = -\sum \left( D \log(P) + (1 - D) \log(1 - P) \right)$,
where D denotes the one-hot vector of the real damage information, P denotes the prediction result of the model, and $L_{classification}$ denotes the damage information classification loss.
6. The intelligent assessment method for building earthquake damage based on the multi-mode large model according to claim 1, characterized in that: in step S6, the specific steps of the evaluation using the graph neural network comprise:
S61, when adjacent buildings are similar in structural system, geometric dimensions and dynamic characteristics, the earthquake damage degree of similar buildings that have experienced the same earthquake can be inferred through the building similarity relation; therefore, earthquake damage evaluation is first performed on representative single buildings at a plurality of locations through the multi-mode earthquake damage evaluation large model to obtain the damage images, text descriptions and damage information of the buildings in each location scene;
S62, each location scene is represented as a node in the graph of the target evaluation area according to its geographic position, structural system, year of construction and geometric dimension attributes, and edges are connected according to the similarity or distance between nodes to construct a graph structure;
S63, the graph structure is used to pass and update information among the plurality of location scenes in the area, so that each building node that has not been inspected and evaluated can learn information about damage conditions from the adjacent nodes that hold building earthquake damage evaluation results, and update its own damage information;
S64, according to the results output by the multi-mode earthquake damage evaluation large model and the graph neural network, the economic loss caused by the damage to each building in the target disaster area is evaluated, and a regional earthquake damage and loss evaluation report is generated.
CN202311278623.0A 2023-10-07 2023-10-07 Building earthquake damage intelligent assessment method based on multi-mode large model Pending CN117292230A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311278623.0A CN117292230A (en) 2023-10-07 2023-10-07 Building earthquake damage intelligent assessment method based on multi-mode large model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311278623.0A CN117292230A (en) 2023-10-07 2023-10-07 Building earthquake damage intelligent assessment method based on multi-mode large model

Publications (1)

Publication Number Publication Date
CN117292230A true CN117292230A (en) 2023-12-26

Family

ID=89247725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311278623.0A Pending CN117292230A (en) 2023-10-07 2023-10-07 Building earthquake damage intelligent assessment method based on multi-mode large model

Country Status (1)

Country Link
CN (1) CN117292230A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117557881A (en) * 2024-01-12 2024-02-13 城云科技(中国)有限公司 Road crack detection method based on feature map alignment and image-text matching and application thereof
CN117909853A (en) * 2024-03-19 2024-04-19 合肥通用机械研究院有限公司 Intelligent monitoring method and system for equipment damage based on mechanism and working condition big data
CN117951783A (en) * 2024-01-10 2024-04-30 中国科学院空天信息创新研究院 Method and system for constructing dynamic model of earthquake damage response of cultural relic building
CN118038281A (en) * 2024-04-15 2024-05-14 三峡金沙江川云水电开发有限公司 Crack detection method and device, storage medium and electronic equipment
CN118170933A (en) * 2024-05-13 2024-06-11 之江实验室 Construction method and device of multi-mode corpus data oriented to scientific field

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117951783A (en) * 2024-01-10 2024-04-30 中国科学院空天信息创新研究院 Method and system for constructing dynamic model of earthquake damage response of cultural relic building
CN117557881A (en) * 2024-01-12 2024-02-13 城云科技(中国)有限公司 Road crack detection method based on feature map alignment and image-text matching and application thereof
CN117557881B (en) * 2024-01-12 2024-04-05 城云科技(中国)有限公司 Road crack detection method based on feature map alignment and image-text matching and application thereof
CN117909853A (en) * 2024-03-19 2024-04-19 合肥通用机械研究院有限公司 Intelligent monitoring method and system for equipment damage based on mechanism and working condition big data
CN117909853B (en) * 2024-03-19 2024-05-31 合肥通用机械研究院有限公司 Intelligent monitoring method and system for equipment damage based on mechanism and working condition big data
CN118038281A (en) * 2024-04-15 2024-05-14 三峡金沙江川云水电开发有限公司 Crack detection method and device, storage medium and electronic equipment
CN118170933A (en) * 2024-05-13 2024-06-11 之江实验室 Construction method and device of multi-mode corpus data oriented to scientific field

Similar Documents

Publication Publication Date Title
CN117292230A (en) Building earthquake damage intelligent assessment method based on multi-mode large model
US20230351573A1 (en) Intelligent detection method and unmanned surface vehicle for multiple type faults of near-water bridges
Roberts et al. Towards low-cost pavement condition health monitoring and analysis using deep learning
Wang et al. Machine learning-based regional scale intelligent modeling of building information for natural hazard risk management
CN112800913B (en) Pavement damage data space-time analysis method based on multi-source feature fusion
CN107977656A (en) A kind of pedestrian recognition methods and system again
CN116539004A (en) Communication line engineering investigation design method and system adopting unmanned aerial vehicle mapping
CN115240093B (en) Automatic power transmission channel inspection method based on visible light and laser radar point cloud fusion
CN118470550B (en) Natural resource asset data acquisition method and platform
CN114529721B (en) Urban remote sensing image vegetation coverage recognition method based on deep learning
CN116011816A (en) Building structure-oriented multi-disaster monitoring and early warning method and device
Zou et al. Systematic framework for post-earthquake bridge inspection through UAV and 3D BIM reconstruction
Katrojwar et al. Design of Image based Analysis and Classification using Unmanned Aerial Vehicle
CN110827264A (en) Evaluation system for apparent defects of concrete member
CN112904778B (en) Wild animal intelligent monitoring method based on multi-dimensional information fusion
CN112906511A (en) Wild animal intelligent monitoring method combining individual image and footprint image
Yang et al. Vision transformer-based visual language understanding of the construction process
CN118012977B (en) AI and GIS fusion-based two-dimensional multi-mode data processing method
Khajwal et al. A novel automated post-disaster damage assessment based on multi-view imagery
CN117611877B (en) LS-YOLO network-based remote sensing image landslide detection method
CN117152646B (en) Unmanned electric power inspection AI light-weight large model method and system
CN116739357B (en) Multi-mode fusion perception city existing building wide area monitoring and early warning method and device
CN116363530B (en) Method and device for positioning expressway pavement diseases
CN118277840B (en) Structural damage identification method and device based on migration learning and heterologous data alignment
CN118674886B (en) Intelligent geographic mapping data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination