CN114880468A - Building specification examination method and system based on BiLSTM and knowledge graph - Google Patents

Publication number
CN114880468A
CN114880468A CN202210421056.9A CN202210421056A CN114880468A CN 114880468 A CN114880468 A CN 114880468A CN 202210421056 A CN202210421056 A CN 202210421056A CN 114880468 A CN114880468 A CN 114880468A
Authority
CN
China
Prior art keywords
sen
construction drawing
single sentence
examination
standard
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210421056.9A
Other languages
Chinese (zh)
Inventor
冯万利
弭云国
孙欣
马志鹏
乔伟锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Application filed by Huaiyin Institute of Technology
Priority to CN202210421056.9A
Publication of CN114880468A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a building code examination method and system based on BiLSTM and a knowledge graph. The method comprises the following steps: 1. preprocessing the clauses of a standard construction drawing examination specification and annotating them with BIO (begin, inside, outside) sequence labels to obtain an annotated data set StandData; 2. training a BERT-embedding-based BiLSTM-CRF neural network model on StandData to obtain a construction drawing examination specification entity attribute recognition model SubModle; 3. preprocessing the specification constraint clauses for the construction drawing to be examined and inputting them into SubModle, then applying Viterbi decoding and extracting entity attributes to form an entity attribute set EnityData; 4. extracting relations from EnityData, building a triple list for the specification constraint text, and building a construction drawing examination specification knowledge graph with Neo4j; 5. matching the BIM construction drawing file to be examined against the construction drawing examination specification knowledge graph to obtain an examination result. The method enables intelligent examination and improves the efficiency of examining building construction drawings.

Description

Building specification examination method and system based on BiLSTM and knowledge graph
Technical Field
The invention belongs to the technical field of building code examination, and particularly relates to a building code examination method based on BiLSTM and a knowledge graph, and to a corresponding system.
Background
At present, building construction drawings are generally examined manually. The construction drawings are digitized and uploaded to a drawing examination platform, where drawing examination experts check them one by one against the building examination specifications. Because the examination specifications contain a large number of clauses and differ between building types, this mode of examination consumes a great deal of manpower and material resources even with the help of auxiliary examination tools.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the invention provides a building specification examination method and system based on BiLSTM and a knowledge graph, which enable intelligent examination and improve the efficiency of examining building construction drawings.
Technical scheme: the invention discloses a building code examination method based on BiLSTM and a knowledge graph, which comprises the following steps:
S1. Preprocess the text data in the clauses of the standard construction drawing examination specification and manually annotate it with BIO (begin, inside, outside) labels to obtain an annotated data set StandData;
S2. Train a BERT-embedding-based BiLSTM-CRF neural network model on StandData to obtain a construction drawing examination specification entity attribute recognition model SubModle;
S3. Split the long sentences in the text data of the specification constraint clauses for the construction drawing to be examined into single sentences to form a specification constraint single-sentence list PreText; input each single sentence in PreText into the SubModle model to obtain the maximum labeling category probability of each word element of the sentence; obtain the labeling category of each word element by Viterbi decoding; extract the entity attributes of each single sentence in PreText to form an entity attribute set EnityData;
S4. Extract relations from the entity attribute set EnityData using an examination specification constraint template, build a triple list for the specification constraint text, and build a construction drawing examination specification knowledge graph with Neo4j;
S5. Match the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to obtain the examination result automatically.
Specifically, step S1 comprises:
S1.1. Split the long sentences in the text data of the standard construction drawing examination specification clauses into single sentences;
S1.2. Manually label the word elements (characters) of each word segment in each single-sentence text according to the BIO rule for text sequence labeling. The labels are "B-X", "I-X" or "O", where "B-X" indicates that the word element belongs to type X and is at the beginning of the word segment, "I-X" indicates that the word element belongs to type X and is in the middle or at the end of the word segment, and "O" indicates that the word element does not belong to any type. The X types are Name, Attr and Value;
The annotated text constitutes the annotated data set StandData:
$$StandData = \{[line_1, \text{'|||'}, lab_1], \ldots, [line_n, \text{'|||'}, lab_n], \ldots, [line_{num}, \text{'|||'}, lab_{num}]\}$$
where $line_n$ is the n-th single sentence in the text data of the standard construction drawing examination specification clauses, $lab_n$ is the label list formed by the labels at the positions corresponding to the word elements of $line_n$, and '|||' is the boundary identifier between $line_n$ and $lab_n$; n = 1, 2, …, num, where num is the number of single sentences in the text data of the standard construction drawing examination specification clauses.
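The BIO labeling and the StandData record layout can be illustrated with a minimal sketch. The example sentence, the helper names `find_span` and `bio_labels`, and the span annotations are hypothetical and stand in for the manual annotation; an English sentence is used only for readability.

```python
def find_span(sentence, phrase, kind):
    """Return (start, end, kind) for the first occurrence of phrase (end exclusive)."""
    start = sentence.index(phrase)
    return (start, start + len(phrase), kind)


def bio_labels(sentence, spans):
    """Give every character of the sentence a label: B-X, I-X or O."""
    labels = ["O"] * len(sentence)
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"          # first character of the segment
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"          # middle or end of the segment
    return labels


# Hypothetical clause and annotations (X types: Name, Attr, Value).
line_1 = "stair width should not be less than 1.1m"
spans = [
    find_span(line_1, "stair", "Name"),
    find_span(line_1, "width", "Attr"),
    find_span(line_1, "1.1m", "Value"),
]
lab_1 = bio_labels(line_1, spans)

# One StandData record: [line_n, '|||', lab_n]
stand_data = [[line_1, "|||", lab_1]]
```

Each record keeps the sentence and its label list the same length, so the label at position k always describes the character at position k.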
Specifically, step S2 comprises:
S2.1. Divide the data in the data set StandData into a training set and a verification set;
S2.2. Build a BERT-embedding-based BiLSTM-CRF neural network model, whose input is a single sentence and whose output is the maximum labeling category probability of each word element in the input single sentence;
The network model first performs word segmentation on the input single sentence, adds the token "[CLS]" at the head and the token "[SEP]" at the tail of the segmentation result, and obtains the token list:
$$tokens = [\text{[CLS]}, X_1, X_2, \ldots, X_j, \ldots, X_{maxLength-2}, \text{[SEP]}]$$
where $X_j$ is the (j+1)-th element of tokens, j = 1, 2, …, maxLength-2, and maxLength is the maximum length of the token list;
Each word element in the token list is converted into a vocabulary encoding vector according to the Chinese vocabulary data, forming the vocabulary encoding vector table $tokebe = \{E_1, E_2, \ldots, E_j, \ldots, E_{maxLength-2}\}$, where $E_j$ is the vocabulary encoding vector of $X_j$;
Each word element in the token list is converted into a position embedding vector by one-hot encoding of its index, forming the position embedding vector table $PoEmbe = \{Po_1, Po_2, \ldots, Po_j, \ldots, Po_{maxLength-2}\}$, where $Po_j$ is the position embedding vector of $X_j$;
The vocabulary encoding vector table and the position embedding vector table are added to form the tensor fianData, which serves as the input data of the BERT model;
The tensor fianData is processed by an optimized embedding layer to obtain the optimized embedding vector bertResult of the input single sentence; the optimized embedding layer consists of an optimization layer and a fully connected layer, where the optimization layer is formed by alternately cascading N encoding layers and N pooling layers;
The optimized embedding vector bertResult is input into the BiLSTM model, processed by the forward LSTM and the backward LSTM, and then passed into a fully connected layer for dimension conversion to obtain the category vector lstmResult corresponding to the input single sentence; lstmResult is input into the CRF layer to obtain the maximum labeling category probability corresponding to each word element in the input single sentence;
S2.3. Divide the training set into several batches and train the parameters of the BiLSTM model and the CRF layer on each batch to obtain several trained models; training optimizes the parameters of the BiLSTM model and the CRF layer by reducing a loss function, where the loss function is:
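[Loss function formula: an image in the original publication (BDA0003607656200000031), not reproduced in the extracted text; it is expressed in terms of $P_j$ and $q_j$]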
where $P_j$ is the true labeling category probability of the j-th word element in a training sample single sentence, and $q_j$ is the maximum labeling category probability output by the CRF layer;
S2.4. Input the sample single sentences of the verification set into the trained models and calculate the accuracy of each; select the model with the highest accuracy as the construction drawing examination specification entity attribute recognition model SubModle.
Specifically, in step S3, the entity attributes of the single sentences in PreText are extracted and the entity attribute set is formed as follows:
S3.1. Extract the entity attributes of each single sentence in PreText: for the sen-th single sentence $Sentences_{sen}$, the labeling categories of its word elements form the vector $PreLab_{sen} = [\text{"O"}, Pl_1, Pl_2, \ldots, Pl_p, \ldots, Pl_{len}, \text{"O"}]$, where $Pl_p$ is the labeling category of the p-th word of $Sentences_{sen}$ and len is the length of $Sentences_{sen}$;
From $Sentences_{sen}$, the characters labeled "B-Name" or "I-Name" are extracted and concatenated to obtain the entity information $Name_{sen}$; the characters labeled "B-Attr" or "I-Attr" are extracted and concatenated to obtain the common attribute $Attr_{sen}$; the characters labeled "B-Value" or "I-Value" are extracted and concatenated to obtain the numerical attribute $Value_{sen}$; $(Name_{sen}, Attr_{sen}, Value_{sen})$ is the entity attribute of the single sentence $Sentences_{sen}$;
S3.2. The set formed by the entity attributes of all single sentences in PreText is the entity attribute set EnityData:
$$EnityData = \{(Name_1, Attr_1, Value_1), \ldots, (Name_{sen}, Attr_{sen}, Value_{sen}), \ldots, (Name_{senLen}, Attr_{senLen}, Value_{senLen})\}$$
where sen = 1, 2, …, senLen, and senLen is the number of single sentences in PreText.
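A minimal sketch of the attribute extraction in S3.1, assuming the label vector carries one leading and one trailing "O" for the "[CLS]" and "[SEP]" positions. The function name and the example sentence are hypothetical.

```python
def extract_entity_attributes(sentence, pre_lab):
    """Concatenate the characters labeled Name, Attr and Value into one tuple.

    pre_lab is ["O", Pl_1, ..., Pl_len, "O"]; the outer "O" entries are
    assumed to belong to [CLS] and [SEP] and are dropped before alignment.
    """
    inner = pre_lab[1:-1]
    parts = {"Name": [], "Attr": [], "Value": []}
    for char, label in zip(sentence, inner):
        if label == "O":
            continue
        _, kind = label.split("-", 1)        # "B-Name" -> "Name"
        if kind in parts:
            parts[kind].append(char)
    return ("".join(parts["Name"]), "".join(parts["Attr"]), "".join(parts["Value"]))


# Hypothetical decoded labels for a 40-character sentence.
sentence = "stair width should not be less than 1.1m"
labels = (["B-Name"] + ["I-Name"] * 4 + ["O"]
          + ["B-Attr"] + ["I-Attr"] * 4 + ["O"] * 25
          + ["B-Value"] + ["I-Value"] * 3)
pre_lab = ["O"] + labels + ["O"]

enity_data = [extract_entity_attributes(sentence, pre_lab)]
```

Repeating the call for every sentence in PreText yields one (Name, Attr, Value) tuple per sentence, which together form EnityData.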
Specifically, step S4 comprises:
S4.1. Extract the relations among the elements of the entity attribute of each single sentence. For the sen-th single sentence $Sentences_{sen}$: if there is an inclusion relation between the entity information $Name_{sen}$ and the common attribute $Attr_{sen}$, the relation $R_{NAsen}$ between $Name_{sen}$ and $Attr_{sen}$ is "continain";
If there is a negative trigger word between the entity information $Name_{sen}$ and the numerical attribute $Value_{sen}$, the relation $R_{NVsen}$ between $Name_{sen}$ and $Value_{sen}$ is "cannot"; if there is a positive trigger word between $Name_{sen}$ and $Value_{sen}$, the relation $R_{NVsen}$ is "should". The negative trigger words include "should not", "must not", "ought not" and "strictly prohibited"; the positive trigger words include "must", "ought", "should" and "can";
If there is neither a negative trigger word nor a positive trigger word between the entity information $Name_{sen}$ and the numerical attribute $Value_{sen}$, the relation $R_{AVsen}$ between the common attribute $Attr_{sen}$ and the numerical attribute $Value_{sen}$ is "is";
S4.2. Build triples from the extracted relations to obtain the triple list tripletList of the specification constraint text: $[Name_{sen}, R_{NAsen}, Attr_{sen}]$, $[Name_{sen}, R_{NVsen}, Value_{sen}]$, $[Attr_{sen}, R_{AVsen}, Value_{sen}]$;
where sen = 1, 2, …, senLen, and senLen is the number of single sentences in the specification constraint text;
S4.3. Write the triple list tripletList into a csv file, import it into a MySQL database, and build the construction drawing examination specification knowledge graph with Neo4j.
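The trigger-word rules of S4.1 and the triple construction of S4.2 can be sketched as follows. This is an illustration under stated assumptions, not the patented template: the English trigger lists are illustrative renderings, the inclusion relation is simplified to "Name and Attr are both present", the relation label "is" for the no-trigger case is assumed, and "continain" is spelled as it appears in the text.

```python
# Illustrative trigger words (assumed English renderings).
NEGATIVE_TRIGGERS = ("should not", "must not", "strictly prohibited")
POSITIVE_TRIGGERS = ("must", "ought", "should", "can")


def text_between(sentence, left, right):
    """Return the text between the first `left` and the next `right`, or ""."""
    i = sentence.find(left)
    if i < 0:
        return ""
    i += len(left)
    j = sentence.find(right, i)
    return sentence[i:j] if j >= 0 else ""


def relation_triples(sentence, name, attr, value):
    """Build the triples for one sentence from its (Name, Attr, Value)."""
    triples = []
    if name and attr:                       # simplified inclusion test (assumption)
        triples.append([name, "continain", attr])
    between = text_between(sentence, name, value)
    # Negative triggers are tested first because "should not" contains "should".
    if any(t in between for t in NEGATIVE_TRIGGERS):
        triples.append([name, "cannot", value])
    elif any(t in between for t in POSITIVE_TRIGGERS):
        triples.append([name, "should", value])
    elif attr and value:
        triples.append([attr, "is", value])  # relation label assumed
    return triples


triplet_list = (
    relation_triples("stair width should not be less than 1.1m", "stair", "width", "1.1m")
    + relation_triples("corridor height is 2.4m", "corridor", "height", "2.4m")
)
```

Checking the negative list before the positive one matters whenever a negative trigger contains a positive one as a substring.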
Specifically, step S5 comprises:
S5.1. Convert the BIM construction drawing file input by the user into a JSON file and parse it to obtain the JSON data data_json;
S5.2. Match the "item name" attribute value in data_json against the triples in tripletList by text matching, and select the triples in tripletList whose entity information matches the "item name" attribute value to form a candidate triple set, thereby matching the building type;
S5.3. Match the "size marking" attribute value in data_json against the candidate triple set by text matching, and select the triples in the candidate triple set whose numerical attribute matches the "size marking" attribute value; check whether the size meets the specification according to the relation value in each matched triple, and, if it does not, concatenate the elements of the matched triple into a string to obtain a drawing examination opinion; all the examination opinions together form the examination result.
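A rough sketch of the matching in S5.2 and S5.3. The text does not say how a size is compared with a numerical attribute, so this sketch assumes that the Value string carries a comparator such as "less than 1.1m", that "cannot" forbids the comparison and "should" requires it, and that names are matched by exact equality. The JSON layout and the function name `review` are hypothetical.

```python
import json
import re

# Assumed comparators embedded in the Value string.
COMPARATORS = {
    "less than": lambda size, limit: size < limit,
    "greater than": lambda size, limit: size > limit,
}


def review(data_json, triplet_list):
    """Return one opinion string per violated triple."""
    opinions = []
    for item in data_json:
        name = item["item name"]
        size = float(item["size marking"])
        # S5.2: candidate triples whose entity information matches the item name.
        candidates = [t for t in triplet_list if t[0] == name]
        # S5.3: check the size against each candidate that carries a limit.
        for subject, relation, value in candidates:
            match = re.match(r"(less than|greater than)\s*([0-9.]+)", value)
            if not match:
                continue
            holds = COMPARATORS[match.group(1)](size, float(match.group(2)))
            violated = holds if relation == "cannot" else (
                not holds if relation == "should" else False)
            if violated:
                opinions.append(" ".join([subject, relation, value]))
    return opinions


raw = '[{"item name": "stair", "size marking": "1.0"}, {"item name": "corridor", "size marking": "2.5"}]'
triplet_list = [
    ["stair", "continain", "width"],
    ["stair", "cannot", "less than 1.1m"],
    ["corridor", "should", "greater than 2.4m"],
]
result = review(json.loads(raw), triplet_list)
```

In this example the stair is reported because 1.0 is less than 1.1, while the corridor satisfies its constraint and produces no opinion.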
In another aspect, the invention also discloses a system for implementing the building code examination method based on BiLSTM and a knowledge graph, which comprises:
an annotated data set acquisition module 1, used to preprocess the text data in the clauses of the standard construction drawing examination specification and to annotate it manually with BIO labels;
a construction drawing examination specification entity attribute recognition model building module 2, used to train the BERT-embedding-based BiLSTM-CRF neural network model to obtain the construction drawing examination specification entity attribute recognition model SubModle;
an entity attribute set building module 3, used to split the long sentences in the text data of the specification constraint clauses for the construction drawing to be examined into single sentences to form the specification constraint single-sentence list PreText; to input each single sentence in PreText into the SubModle model to obtain the maximum labeling category probability of each word element of the sentence; to obtain the labeling category of each word element by Viterbi decoding; and to extract the entity attributes of each single sentence in PreText to form the entity attribute set EnityData;
a construction drawing examination specification knowledge graph building module 4, used to extract relations from the entity attribute set EnityData using the examination specification constraint template, to build the triple list of the specification constraint text, and to build the construction drawing examination specification knowledge graph with Neo4j;
and an examination result acquisition module 5, used to match the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to obtain the examination result automatically.
Further, the system uses a Web-page-based client to receive the BIM construction drawing file input by the user.
Further, the system also comprises a visualization interface 6 for displaying the examination result visually.
Further, the system also comprises an examination result file generation module 7 for exporting the examination result as a file.
Beneficial effects: the building specification examination method and system based on BiLSTM and a knowledge graph disclosed in the invention use a BERT-embedding BiLSTM + CRF neural network model to build a construction drawing examination specification entity attribute recognition model that labels the entity attributes of specification texts, extract examination specification relations through a specification text constraint template, construct triples from them, and build the construction drawing examination specification knowledge graph with the Neo4j library. By matching the construction drawing data against the examination specification knowledge graph as text, the construction drawings are examined automatically, which replaces the traditional manual examination of construction drawings and greatly improves examination efficiency.
Drawings
FIG. 1 is a flow chart of the building code examination method based on BiLSTM and a knowledge graph disclosed in the invention;
FIG. 2 is a schematic structural diagram of the construction drawing examination specification entity attribute recognition model;
FIG. 3 is a schematic diagram of the building code examination system based on BiLSTM and a knowledge graph disclosed in the invention.
Detailed Description
The invention is further elucidated with reference to the drawings and the detailed description.
The invention discloses a building code examination method based on BiLSTM and a knowledge graph which, as shown in FIG. 1, comprises the following steps:
S1. Preprocess the text data in the clauses of the standard construction drawing examination specification and manually annotate it with BIO (begin, inside, outside) labels to obtain an annotated data set StandData. This step specifically comprises:
S1.1. Split the long sentences in the text data of the standard construction drawing examination specification clauses into single sentences;
In the invention, the standard construction drawing examination specification text is downloaded from a construction standards website and exported as a TXT file to serve as the original data set OrginText;
The data in OrginText is divided at punctuation marks using a split function, so that each long sentence in OrginText is converted into single sentences; a list lines is defined to store the single-sentence data, and lines is written, one sentence per line, into the preprocessed text PendText using a write function.
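The splitting step can be sketched with the standard library. The punctuation set is an assumption (the text only says "punctuation marks"); the period is deliberately left out so that decimals such as "1.1m" stay intact, and an in-memory stream stands in for the PendText file.

```python
import io
import re

# Assumed sentence delimiters (Chinese and Western), plus line breaks.
SENTENCE_DELIMITERS = r"[;；。！？!?\n]+"


def split_into_single_sentences(orgin_text):
    """Split long sentences at punctuation and drop empty fragments."""
    parts = re.split(SENTENCE_DELIMITERS, orgin_text)
    return [part.strip() for part in parts if part.strip()]


def write_pend_text(lines, stream):
    """Write one single sentence per line."""
    for line in lines:
        stream.write(line + "\n")


# Hypothetical raw specification text.
orgin_text = ("stair width should not be less than 1.1m; corridor height is 2.4m;\n"
              "guardrail height must be 1.05m")
lines = split_into_single_sentences(orgin_text)

pend_text = io.StringIO()                    # stand-in for the PendText file
write_pend_text(lines, pend_text)
```

Replacing `io.StringIO()` with `open(path, "w", encoding="utf-8")` would write the same content to disk.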
S1.2. Manually label the word elements (characters) of each word segment in each single-sentence text of PendText according to the BIO rule for text sequence labeling. The labels are "B-X", "I-X" or "O", where "B-X" indicates that the word element belongs to type X and is at the beginning of the word segment, "I-X" indicates that the word element belongs to type X and is in the middle or at the end of the word segment, and "O" indicates that the word element does not belong to any type. The X types are Name, Attr and Value;
The annotated text constitutes the annotated data set StandData:
$$StandData = \{[line_1, \text{'|||'}, lab_1], \ldots, [line_n, \text{'|||'}, lab_n], \ldots, [line_{num}, \text{'|||'}, lab_{num}]\}$$
where $line_n$ is the n-th single sentence in the text data of the standard construction drawing examination specification clauses, $lab_n$ is the label list formed by the labels at the positions corresponding to the word elements of $line_n$, and '|||' is the boundary identifier between $line_n$ and $lab_n$; n = 1, 2, …, num, where num is the number of single sentences in the text data of the standard construction drawing examination specification clauses.
S2. Train a BERT-embedding-based BiLSTM-CRF neural network model on StandData to obtain a construction drawing examination specification entity attribute recognition model SubModle. This step specifically comprises:
S2.1. Divide the data in the data set StandData into a training set and a verification set;
In this embodiment, in order to test the accuracy of the model, part of the data set StandData is also used as a test set: StandData is divided into a training data set TrData, a test data set TeData and a verification data set DevData in a ratio of 7:2:1;
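A minimal sketch of the 7:2:1 division. Shuffling with a fixed seed is an assumption added for reproducibility; the text only gives the ratio. The function name and the toy records are hypothetical.

```python
import random


def split_dataset(stand_data, ratios=(7, 2, 1), seed=0):
    """Split records into (TrData, TeData, DevData) by the given ratio."""
    data = list(stand_data)
    random.Random(seed).shuffle(data)        # assumed: shuffle before splitting
    total = sum(ratios)
    n_train = len(data) * ratios[0] // total
    n_test = len(data) * ratios[1] // total
    tr_data = data[:n_train]
    te_data = data[n_train:n_train + n_test]
    dev_data = data[n_train + n_test:]       # the remainder goes to verification
    return tr_data, te_data, dev_data


# Twenty placeholder records standing in for [line_n, '|||', lab_n].
stand_data = [[f"sentence {n}", "|||", []] for n in range(20)]
tr_data, te_data, dev_data = split_dataset(stand_data)
```

With 20 records this gives 14 training, 4 test and 2 verification records.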
S2.2. Build a BERT-embedding-based BiLSTM-CRF neural network model, whose input is a single sentence and whose output is the maximum labeling category probability of each word element in the input single sentence; the structure of the network model is shown in FIG. 2.
The network model first performs word segmentation on the input single sentence using a tokenization library, adds the token "[CLS]" at the head and the token "[SEP]" at the tail of the segmentation result, and obtains the token list:
$$tokens = [\text{[CLS]}, X_1, X_2, \ldots, X_j, \ldots, X_{maxLength-2}, \text{[SEP]}]$$
where $X_j$ is the (j+1)-th element of tokens, j = 1, 2, …, maxLength-2, and maxLength is the maximum length of the token list;
Each word element in the token list is converted into a vocabulary encoding vector according to the Chinese vocabulary data, forming the vocabulary encoding vector table $tokebe = \{E_1, E_2, \ldots, E_j, \ldots, E_{maxLength-2}\}$, where $E_j$ is the vocabulary encoding vector of $X_j$. In this embodiment, the vocabulary encoding vectors of the word elements are obtained from the Chinese vocabulary data open-sourced by Google.
Each word element in the token list is converted into a position embedding vector by one-hot encoding of its index, forming the position embedding vector table $PoEmbe = \{Po_1, Po_2, \ldots, Po_j, \ldots, Po_{maxLength-2}\}$, where $Po_j$ is the position embedding vector of $X_j$; this makes it possible to distinguish word elements that have the same content but represent different semantics;
The vocabulary encoding vector table and the position embedding vector table are added to form the tensor fianData, which serves as the input data of the BERT model;
The tensor fianData is processed by an optimized embedding layer to obtain the optimized embedding vector bertResult of the input single sentence; the optimized embedding layer consists of an optimization layer and a fully connected layer, where the optimization layer is formed by alternately cascading N encoding layers and N pooling layers, and the activation function of the fully connected layer is ReLU;
The optimized embedding vector bertResult is input into the BiLSTM model, processed by the forward LSTM and the backward LSTM, and then passed into a fully connected layer for dimension conversion to obtain the category vector lstmResult corresponding to the input single sentence; lstmResult is input into the CRF layer to obtain the maximum labeling category probability corresponding to each word element in the input single sentence;
S2.3. Divide the training set into several batches and train the parameters of the BiLSTM model and the CRF layer on each batch to obtain several trained models; training optimizes the parameters of the BiLSTM model and the CRF layer by reducing a loss function, where the loss function is:
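[Loss function formula: an image in the original publication (BDA0003607656200000081), not reproduced in the extracted text; it is expressed in terms of $P_j$ and $q_j$]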
where $P_j$ is the true labeling category probability of the j-th word element in a training sample single sentence, and $q_j$ is the maximum labeling category probability output by the CRF layer;
In this embodiment, a data iterator is constructed with the DataLoader library during network model training; its input parameters include the number of samples per training batch batSize and the number of training epochs baseEpoch.
S2.4. Input the sample single sentences of the verification set into the trained models and calculate the accuracy of each; select the model with the highest accuracy as the construction drawing examination specification entity attribute recognition model SubModle.
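The model selection in S2.4 can be sketched as below. The text does not define the accuracy measure, so per-character label accuracy is assumed here; the trained models are replaced by hypothetical stand-in callables that map a sentence to a label list.

```python
def token_accuracy(predicted, gold):
    """Fraction of characters whose predicted label equals the true label."""
    correct = total = 0
    for pred_seq, gold_seq in zip(predicted, gold):
        for pred, true in zip(pred_seq, gold_seq):
            correct += pred == true
            total += 1
    return correct / total if total else 0.0


def select_best_model(models, dev_sentences, dev_labels):
    """Return the name of the model with the highest verification accuracy."""
    def score(name):
        predictions = [models[name](sentence) for sentence in dev_sentences]
        return token_accuracy(predictions, dev_labels)
    return max(models, key=score)


# Hypothetical verification data and stand-in models.
dev_sentences = ["ab", "cd"]
dev_labels = [["B-Name", "I-Name"], ["O", "O"]]
lookup = dict(zip(dev_sentences, dev_labels))
models = {
    "checkpoint_a": lambda s: ["O"] * len(s),   # predicts "O" everywhere
    "checkpoint_b": lambda s: lookup[s],        # reproduces the true labels
}
best = select_best_model(models, dev_sentences, dev_labels)
```

The selected name would then identify the checkpoint kept as SubModle.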
S3. Split the long sentences in the text data of the specification constraint clauses for the construction drawing to be examined into single sentences to form a specification constraint single-sentence list PreText; input each single sentence in PreText into the SubModle model to obtain the maximum labeling category probability of each word element of the sentence; obtain the labeling category of each word element by Viterbi decoding; extract the entity attributes of each single sentence in PreText to form an entity attribute set EnityData;
The entity attributes of the single sentences in PreText are extracted and the entity attribute set is formed as follows:
S3.1. Extract the entity attributes of each single sentence in PreText: for the sen-th single sentence $Sentences_{sen}$, the labeling categories of its word elements form the vector $PreLab_{sen} = [\text{"O"}, Pl_1, Pl_2, \ldots, Pl_p, \ldots, Pl_{len}, \text{"O"}]$, where $Pl_p$ is the labeling category of the p-th word of $Sentences_{sen}$ and len is the length of $Sentences_{sen}$;
From $Sentences_{sen}$, the characters labeled "B-Name" or "I-Name" are extracted and concatenated to obtain the entity information $Name_{sen}$; the characters labeled "B-Attr" or "I-Attr" are extracted and concatenated to obtain the common attribute $Attr_{sen}$; the characters labeled "B-Value" or "I-Value" are extracted and concatenated to obtain the numerical attribute $Value_{sen}$; $(Name_{sen}, Attr_{sen}, Value_{sen})$ is the entity attribute of the single sentence $Sentences_{sen}$;
S3.2. The set formed by the entity attributes of all single sentences in PreText is the entity attribute set EnityData:
$$EnityData = \{(Name_1, Attr_1, Value_1), \ldots, (Name_{sen}, Attr_{sen}, Value_{sen}), \ldots, (Name_{senLen}, Attr_{senLen}, Value_{senLen})\}$$
where sen = 1, 2, …, senLen, and senLen is the number of single sentences in PreText.
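Step S3 relies on Viterbi decoding to turn per-character scores into a label sequence. The following is a generic sketch of the standard Viterbi algorithm over additive emission and transition scores; the score values, the reduced tag set and the function name are hypothetical, and the CRF layer of the embodiment may organize its scores differently.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   list of {tag: score}, one dict per character position
    transitions: {(previous_tag, current_tag): score}, missing pairs count as 0
    """
    if not emissions:
        return []
    best = [{tag: emissions[0][tag] for tag in tags}]
    back_pointers = []
    for position in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev] + transitions.get((prev, cur), 0.0)
                           + emissions[position][cur])
            pointers[cur] = prev
        best.append(scores)
        back_pointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


tags = ["O", "B-Name", "I-Name"]
emissions = [
    {"O": 1.0, "B-Name": 0.8, "I-Name": 0.0},
    {"O": 0.0, "B-Name": 0.1, "I-Name": 1.0},
    {"O": 1.0, "B-Name": 0.0, "I-Name": 0.0},
]
# Penalize the invalid BIO transition O -> I-Name.
transitions = {("O", "I-Name"): -10.0}
decoded = viterbi_decode(emissions, transitions, tags)
```

Picking the best label at each position independently would give O, I-Name, O here, which violates the BIO scheme; the transition penalty makes the decoder prefer B-Name, I-Name, O.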
S4, extracting the relation by adopting an examination specification constraint template according to the entity attribute set EnityData, establishing a triple list of specification constraint texts, and establishing a construction drawing examination specification knowledge graph by using Neo4 j;
the method specifically comprises the following steps:
s4.1, extracting each of the attributes of each single sentence entityRelationships between elements, sensents for the sen-st single sentence sen : if the entity information Name sen With the generic attribute Attr sen If there is an inclusion relationship, the entity information Name sen With the generic attribute Attr sen Relation R of NAsen Is "continain";
if the entity information Name sen And Value attribute Value sen If there is a negative trigger word, the entity information Name sen And Value attribute Value sen Relation R of NVsen Is "cannot"; if the entity information Name sen And Value attribute Value sen If there is a positive trigger, the relationship R NVsen Is "should"; the negative trigger includes: "should not", "must", "should not", "strictly prohibited"; the positive trigger includes: "must", "ought", "should", "can";
if neither a negative nor a positive trigger word occurs between the entity information Name_sen and the numerical attribute Value_sen, the relation R_AVsen between the generic attribute Attr_sen and the numerical attribute Value_sen is "is";
S4.2, building triples from the extracted relations to obtain the triple list tripletList of the specification constraint text: [Name_sen, R_NAsen, Attr_sen], [Name_sen, R_NVsen, Value_sen], [Attr_sen, R_AVsen, Value_sen];
where sen = 1, 2, …, senLen, and senLen is the number of single sentences in the specification constraint text;
S4.3, writing the triple list tripletList to a CSV file using the open function, importing it into a MySQL database, and building the construction drawing examination specification knowledge graph with Neo4j.
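Steps S4.1 to S4.3 can be sketched in Python as follows. This is an illustrative sketch only. The English trigger words are stand-ins for the original Chinese trigger words, the helper names between, extract_triples and write_triples_csv are hypothetical, the presence of both Name and Attr is treated as the inclusion relationship, and the attribute-to-value relation label is taken to be "is"; all of these are assumptions. The MySQL import and the Neo4j graph construction are not shown.

```python
import csv
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed English stand-ins for the trigger words of step S4.1.
NEGATIVE_TRIGGERS = ("should not", "must not", "strictly prohibited")
POSITIVE_TRIGGERS = ("must", "should", "can")


def between(sentence: str, left: str, right: str) -> str:
    """Return the text between the first `left` and the next `right`."""
    start = sentence.find(left)
    if start < 0:
        return ""
    start += len(left)
    end = sentence.find(right, start)
    return sentence[start:end] if end >= 0 else ""


def extract_triples(sentence: str, name: str, attr: str, value: str) -> List[Triple]:
    """Apply the constraint template of S4.1 and build the triples of S4.2."""
    triples: List[Triple] = []
    if name and attr:
        # Assumption: co-occurrence of Name and Attr counts as inclusion.
        triples.append((name, "continain", attr))
    gap = between(sentence, name, value) if name and value else ""
    # Negative triggers are tested first because "should not" contains "should".
    if any(t in gap for t in NEGATIVE_TRIGGERS):
        triples.append((name, "cannot", value))
    elif any(t in gap for t in POSITIVE_TRIGGERS):
        triples.append((name, "should", value))
    elif attr and value:
        triples.append((attr, "is", value))
    return triples


def write_triples_csv(triples: List[Triple], path: str) -> None:
    """Write the triple list to a CSV file with the built-in open function (S4.3)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["head", "relation", "tail"])
        writer.writerows(triples)


# Hypothetical toy sentence and entity attribute.
sentence = "stair width should not be less than 1.1m"
triples = extract_triples(sentence, "stair", "width", "less than 1.1m")
```

Calling write_triples_csv(triples, "triples.csv") would then produce a file with one header row and one row per triple, which could serve as the input for the database import.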
S5, matching the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to intelligently obtain the examination result, which specifically comprises the following steps:
S5.1, converting the BIM construction drawing file input by the user into a JSON file and parsing it to obtain the JSON data data_json;
In this embodiment, the BIM file is first converted to the IFC format, the IFC-format BIM data are then converted into a JSON file, and the JSON file is parsed with the ijson library to obtain the JSON data data_json.
S5.2, text-matching the "item name" attribute value in data_json against the triples in tripletList, and screening out the triples whose entity information matches the "item name" attribute value to form a candidate triple set, thereby achieving building type matching;
S5.3, text-matching the "size marking" attribute value in data_json against the candidate triple set, screening out the triples whose numerical attribute matches the "size marking" attribute value, and checking whether the size meets the specification according to the relation value in each matched triple; if the size does not meet the specification, the elements of the matched triple are concatenated into an examination opinion; all examination opinions together form the examination result.
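The matching in steps S5.2 and S5.3 can be sketched as below. This is a hedged illustration rather than the method of the embodiment: the standard-library json module stands in for the ijson library, the layout of data_json (a list of objects with "item name" and "size marking" keys) is assumed, text matching is reduced to exact string equality, and a matched triple with the relation "cannot" is assumed to indicate a violation. The function name review and the sample data are hypothetical.

```python
import json
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def review(data_json: List[Dict[str, str]], triplet_list: List[Triple]) -> List[str]:
    """Return one examination opinion per violated triple."""
    opinions: List[str] = []
    for item in data_json:
        name = item.get("item name", "")
        size = item.get("size marking", "")
        # S5.2: candidate triples whose entity information matches the item name.
        candidates = [t for t in triplet_list if t[0] == name]
        # S5.3: triples whose numerical attribute matches the size marking.
        matched = [t for t in candidates if t[2] == size]
        for head, relation, tail in matched:
            # Assumption: a matched "cannot" triple means the size violates the rule.
            if relation == "cannot":
                opinions.append(" ".join((head, relation, tail)))
    return opinions


# Hypothetical JSON export of a BIM drawing and a hypothetical triple list.
raw = (
    '[{"item name": "stair", "size marking": "1.0m"},'
    ' {"item name": "door", "size marking": "0.9m"}]'
)
data_json = json.loads(raw)
triplet_list = [
    ("stair", "should", "1.1m"),
    ("stair", "cannot", "1.0m"),
    ("door", "cannot", "0.8m"),
]
opinions = review(data_json, triplet_list)
```

For large drawing exports, a streaming parser such as ijson avoids loading the whole file into memory, which is presumably why the embodiment uses it; the matching logic would be unchanged.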
In this embodiment, after data preprocessing of 4533 construction drawing examination specifications, the building examination specification knowledge graph was constructed by the above method and applied to the specification constraint examination of construction drawings. On the test data set, the entity attribute extraction model achieves 99% accuracy, 97% precision and 98% recall. The precision, recall and F value of each label are all 94% or higher. The experimental results are shown in Table 1 and Table 2.
TABLE 1 Comparison of results (%)
Model                     precision   recall   f1-score
BiLSTM+CRF                86.98       65.71    70.97
Method of the invention   97.37       98.29    97.56
TABLE 2 Experimental results per label (%)
Label   precision   recall   f1-score
Attr    97          98       98
Name    94          96       95
Value   98          98       98
In Table 1, "BiLSTM+CRF" denotes a variant of the construction drawing examination specification entity attribute recognition model without the BERT model, i.e., the token list obtained by segmenting the input single sentence is fed directly into the BiLSTM model. The comparison shows that adding the optimized BERT embedding improves the recognition accuracy of the model.
This embodiment also discloses an examination system for implementing the above examination method, as shown in Fig. 3, comprising:
the annotation data set acquisition module 1, configured to preprocess the text data in the standard construction drawing examination specification clauses and perform manual BIO annotation according to step S1;
the construction drawing examination specification entity attribute recognition model establishing module 2, configured to train the BERT-embedding-based BiLSTM-CRF neural network model according to step S2 to obtain the construction drawing examination specification entity attribute recognition model SubModle;
the entity attribute set establishing module 3, configured, according to step S3, to split the long sentences of the text data in the specification constraint clauses of the construction drawing to be examined into single sentences, forming the specification constraint single sentence list PreText; to input the single sentences in PreText into the SubModle model to obtain the maximum labeling category probability of each character element of each single sentence; to obtain the labeling category of each character element after Viterbi decoding; and to extract the entity attributes of the single sentences in PreText to form the entity attribute set EnityData;
the construction drawing examination specification knowledge graph establishing module 4, configured, according to step S4, to perform relation extraction on the entity attribute set EnityData with the examination specification constraint template, build the triple list of the specification constraint text, and build the construction drawing examination specification knowledge graph with Neo4j;
and the examination result acquisition module 5, configured, according to step S5, to match the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to intelligently obtain the examination result.
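The Viterbi decoding used by module 3 can be illustrated with a generic sketch. The code below is the textbook Viterbi algorithm over per-character label scores and label-to-label transition scores; it is not taken from the SubModle model, and the label set, the scores and the function name viterbi_decode are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for one sentence."""
    if not emissions:
        return []
    # best[t][y]: best score of any label path that ends in label y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-character sentence.
labels = ["O", "B-Name", "I-Name"]
transitions = {("O", "I-Name"): -10.0, ("B-Name", "I-Name"): 1.0}
emissions = [
    {"O": 0.1, "B-Name": 2.0, "I-Name": 1.9},
    {"O": 0.5, "B-Name": 0.2, "I-Name": 1.5},
    {"O": 1.0, "B-Name": 0.1, "I-Name": 0.3},
]
decoded = viterbi_decode(emissions, transitions, labels)
```

The large negative score on the ("O", "I-Name") transition shows how a CRF layer can discourage label sequences that violate the BIO scheme.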
In this embodiment, a Web-page-based client receives the BIM construction drawing file input by the user. In addition, to improve the user experience, the system further comprises a visual interface 6 for visually displaying the examination result; the examination result can also be exported as a file by the examination result file generation module 7.
The above description is only an example of the present invention and is not intended to limit the present invention. All equivalents which come within the spirit of the invention are therefore intended to be embraced therein. Details not described herein are within the skill of those in the art.

Claims (10)

1. A building specification examination method based on BiLSTM and knowledge graph, characterized by comprising the following steps:
S1, preprocessing the text data in the standard construction drawing examination specification clauses and performing manual BIO annotation to obtain the annotation data set StandData;
S2, training a BERT-embedding-based BiLSTM-CRF neural network model with StandData to obtain the construction drawing examination specification entity attribute recognition model SubModle;
S3, splitting the long sentences of the text data in the specification constraint clauses of the construction drawing to be examined into single sentences to form the specification constraint single sentence list PreText; inputting the single sentences in PreText into the SubModle model to obtain the maximum labeling category probability of each character element of each single sentence; obtaining the labeling category of each character element after Viterbi decoding; extracting the entity attributes of the single sentences in PreText to form the entity attribute set EnityData;
S4, performing relation extraction on the entity attribute set EnityData with an examination specification constraint template, building a triple list of the specification constraint text, and building the construction drawing examination specification knowledge graph with Neo4j;
S5, matching the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to intelligently obtain the examination result.
2. The building specification examination method based on BiLSTM and knowledge graph according to claim 1, wherein step S1 specifically comprises:
S1.1, splitting the long sentences of the text data in the standard construction drawing examination specification clauses into single sentences;
S1.2, manually labeling the character elements of each word segment in each single sentence text according to the BIO rule for text sequence labeling; the labels are "B-X", "I-X" or "O", where "B-X" indicates that the character element belongs to type X and is at the beginning of the word segment, "I-X" indicates that the character element belongs to type X and is in the middle or at the end of the word segment, and "O" indicates that the character element does not belong to any type; X is one of three types: Name, Attr and Value;
the annotated texts constitute the annotation data set StandData:
StandData = {[line_1, '|||', lab_1], …, [line_n, '|||', lab_n], …, [line_num, '|||', lab_num]}
where line_n is the n-th single sentence of the text data in the standard construction drawing examination specification, lab_n is the label list formed by the labels at the positions corresponding to the character elements of line_n, and '|||' is the boundary marker between line_n and lab_n; n = 1, 2, …, num, and num is the number of single sentences of the text data in the standard construction drawing examination specification.
3. The building specification examination method based on BiLSTM and knowledge graph according to claim 1, wherein step S2 specifically comprises:
S2.1, dividing the data in the data set StandData into a training set and a verification set;
S2.2, building the BERT-embedding-based BiLSTM-CRF neural network model, whose input is a single sentence and whose output is the maximum labeling category probability of each character element of the input single sentence;
the network model first performs word segmentation on the input single sentence and adds the character "[CLS]" at the head and the character "[SEP]" at the tail of the segmentation result to obtain the token list:
tokens = [[CLS], X_1, X_2, …, X_j, …, X_(maxLength-2), [SEP]]
where X_j is the (j+1)-th element of tokens, j = 1, 2, …, maxLength-2, and maxLength is the maximum length of the token list;
each element of the token list is converted into a vocabulary encoding vector according to the Chinese vocabulary data, forming the vocabulary encoding vector table: tokebe = {E_1, E_2, …, E_j, …, E_(maxLength-2)}, where E_j is the vocabulary encoding vector of X_j;
each element of the token list is converted into a position embedding vector by one-hot encoding according to its index, forming the position embedding vector table: PoEmbe = {Po_1, Po_2, …, Po_j, …, Po_(maxLength-2)}, where Po_j is the position embedding vector of X_j;
the vocabulary encoding vector table and the position embedding vector table are added to form the tensor fianData, which serves as the input data of the BERT model;
the tensor fianData is processed by the optimized embedding layer to obtain the optimized embedding vector bertResult of the input single sentence; the optimized embedding layer consists of an optimization layer and a fully connected layer, the optimization layer being formed by alternately cascading N encoding layers and N pooling layers;
the optimized embedding vector bertResult is input into the BiLSTM model for forward LSTM and reverse LSTM training, and the training result is passed to a fully connected layer for dimension conversion to obtain the category vector lstmResult corresponding to the input single sentence; lstmResult is input into the CRF layer to obtain the maximum labeling category probability corresponding to each character element of the input single sentence;
S2.3, dividing the training set into a plurality of batches, and training the parameters of the BiLSTM model and the CRF layer on each batch to obtain a plurality of trained models; the training optimizes the parameters of the BiLSTM model and the CRF layer by reducing a loss function, the loss function being:
Figure FDA0003607656190000021
wherein P is j The true labeled category probability, q, of the jth character element in a training sample single sentence j The maximum labeling category probability output by a CRF layer;
s2.4, inputting the sample single sentence in the verification set into the trained models, and respectively calculating the accuracy; and selecting the model with the highest accuracy as a construction drawing inspection standard entity attribute recognition model submodule.
4. The building specification examination method based on BiLSTM and knowledge graph according to claim 1, wherein extracting the entity attributes of the single sentences in PreText in step S3 comprises the following steps:
S3.1, extracting the entity attribute of each single sentence in PreText: the labeling categories of the character elements of the sen-th single sentence sentences_sen form the vector PreLab_sen = ["O", Pl_1, Pl_2, …, Pl_p, …, Pl_len, "O"], where Pl_p is the labeling category of the p-th character of sentences_sen and len is the length of sentences_sen;
the characters of sentences_sen whose labeling category is "B-Name" or "I-Name" are extracted and concatenated to obtain the entity information Name_sen; the characters labeled "B-Attr" or "I-Attr" are extracted and concatenated to obtain the generic attribute Attr_sen; the characters labeled "B-Value" or "I-Value" are extracted and concatenated to obtain the numerical attribute Value_sen; (Name_sen, Attr_sen, Value_sen) is the entity attribute of the single sentence sentences_sen;
S3.2, the entity attributes of all single sentences in PreText form the entity attribute set EnityData: EnityData = {(Name_1, Attr_1, Value_1), …, (Name_sen, Attr_sen, Value_sen), …, (Name_senLen, Attr_senLen, Value_senLen)}, where sen = 1, 2, …, senLen, and senLen is the number of single sentences in PreText.
5. The building specification examination method based on BiLSTM and knowledge graph according to claim 4, wherein step S4 specifically comprises:
S4.1, extracting the relations between the elements of the entity attribute of each single sentence. For the sen-th single sentence sentences_sen: if the entity information Name_sen and the generic attribute Attr_sen are in an inclusion relationship, the relation R_NAsen between Name_sen and Attr_sen is "continain";
if a negative trigger word occurs between the entity information Name_sen and the numerical attribute Value_sen, the relation R_NVsen between Name_sen and Value_sen is "cannot"; if a positive trigger word occurs between them, the relation R_NVsen is "should"; the negative trigger words include: "should not", "must", "should not", "strictly prohibited"; the positive trigger words include: "must", "ought", "should", "can";
if neither a negative nor a positive trigger word occurs between the entity information Name_sen and the numerical attribute Value_sen, the relation R_AVsen between the generic attribute Attr_sen and the numerical attribute Value_sen is "is";
S4.2, building triples from the extracted relations to obtain the triple list tripletList of the specification constraint text: [Name_sen, R_NAsen, Attr_sen], [Name_sen, R_NVsen, Value_sen], [Attr_sen, R_AVsen, Value_sen]; sen = 1, 2, …, senLen, where senLen is the number of single sentences in the specification constraint text;
S4.3, writing the triple list tripletList to a CSV file, importing it into a MySQL database, and building the construction drawing examination specification knowledge graph with Neo4j.
6. The building specification examination method based on BiLSTM and knowledge graph according to claim 5, wherein step S5 specifically comprises:
S5.1, converting the BIM construction drawing file input by the user into a JSON file and parsing it to obtain the JSON data data_json;
S5.2, text-matching the "item name" attribute value in data_json against the triples in tripletList, and screening out the triples whose entity information matches the "item name" attribute value to form a candidate triple set, thereby achieving building type matching;
S5.3, text-matching the "size marking" attribute value in data_json against the candidate triple set, screening out the triples whose numerical attribute matches the "size marking" attribute value, and checking whether the size meets the specification according to the relation value in each matched triple; if the size does not meet the specification, the elements of the matched triple are concatenated into an examination opinion; all examination opinions together form the examination result.
7. A building specification examination system based on BiLSTM and knowledge graph, comprising:
the annotation data set acquisition module (1), configured to preprocess the text data in the standard construction drawing examination specification clauses and perform manual BIO annotation;
the construction drawing examination specification entity attribute recognition model establishing module (2), configured to train a BERT-embedding-based BiLSTM-CRF neural network model to obtain the construction drawing examination specification entity attribute recognition model SubModle;
the entity attribute set establishing module (3), configured to split the long sentences of the text data in the specification constraint clauses of the construction drawing to be examined into single sentences, forming the specification constraint single sentence list PreText; to input the single sentences in PreText into the SubModle model to obtain the maximum labeling category probability of each character element of each single sentence; to obtain the labeling category of each character element after Viterbi decoding; and to extract the entity attributes of the single sentences in PreText to form the entity attribute set EnityData;
the construction drawing examination specification knowledge graph establishing module (4), configured to perform relation extraction on the entity attribute set EnityData with an examination specification constraint template, build a triple list of the specification constraint text, and build the construction drawing examination specification knowledge graph with Neo4j;
and the examination result acquisition module (5), configured to match the BIM construction drawing file input by the user against the construction drawing examination specification knowledge graph to intelligently obtain the examination result.
8. The building specification examination system based on BiLSTM and knowledge graph according to claim 7, wherein the system uses a Web-page-based client to receive the BIM construction drawing file input by the user.
9. The building specification examination system based on BiLSTM and knowledge graph according to claim 7, further comprising a visual interface (6) for visually displaying the examination result.
10. The building specification examination system based on BiLSTM and knowledge graph according to claim 7, further comprising an examination result file generation module (7) for exporting the examination result as a file.
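The annotation format of claim 2 (step S1.2) and the token list of claim 3 (step S2.2) can be illustrated with a short Python sketch. It is only an illustration under assumptions: the helper names bio_labels and build_tokens, the span representation and the English toy sentence are hypothetical, segmentation is done per character, and truncating the sentence to maxLength-2 characters is an assumed way of respecting the maximum length.

```python
from typing import List, Tuple


def bio_labels(sentence: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Build one BIO label per character from (start, end, type) spans, end exclusive."""
    labels = ["O"] * len(sentence)
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"          # first character of the segment
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"          # middle or end of the segment
    return labels


def build_tokens(sentence: str, max_length: int) -> List[str]:
    """Wrap the characters of a sentence in [CLS] ... [SEP], at most max_length tokens."""
    chars = list(sentence)[: max_length - 2]  # assumption: long sentences are truncated
    return ["[CLS]"] + chars + ["[SEP]"]


# Hypothetical toy sentence with three annotated segments.
sentence = "stair width 1.1m"
spans = [(0, 5, "Name"), (6, 11, "Attr"), (12, 16, "Value")]

# One StandData entry in the form [line_n, '|||', lab_n].
entry = [sentence, "|||", bio_labels(sentence, spans)]

# Token list for a hypothetical maxLength of 8.
tokens = build_tokens(sentence, 8)
```

Each entry pairs a single sentence with a label list of the same length, so the label at position p describes the p-th character of the sentence.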
CN202210421056.9A 2022-04-21 2022-04-21 Building specification examination method and system based on BilSTM and knowledge graph Pending CN114880468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421056.9A CN114880468A (en) 2022-04-21 2022-04-21 Building specification examination method and system based on BilSTM and knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210421056.9A CN114880468A (en) 2022-04-21 2022-04-21 Building specification examination method and system based on BilSTM and knowledge graph

Publications (1)

Publication Number Publication Date
CN114880468A (en) 2022-08-09

Family

ID=82671339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421056.9A Pending CN114880468A (en) 2022-04-21 2022-04-21 Building specification examination method and system based on BilSTM and knowledge graph

Country Status (1)

Country Link
CN (1) CN114880468A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination