CN111104525B - Construction method of building design specification knowledge graph based on graph database

Publication number
CN111104525B
Authority
CN (China)
Prior art keywords
row, column, entity, relation, node
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911409285.3A
Other languages
Chinese (zh)
Other versions
CN111104525A (en)
Inventor
赵钦, 贾博, 黑新宏, 李宇超, 朱磊, 杨明松, 方潇颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08 Construction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing a knowledge graph of building design specifications based on a graph database. The method first extracts all entities from the specification to be stored and constructs a specification semantic storage entity set. It then analyzes the clauses of the specification according to the semantic logic and syntactic characteristics of each clause, uses three types of special nodes to assist common nodes in marking and expressing the clauses, and stores the result in CSV files. Finally, it adjusts the storage format of the CSV files according to the batch import rules of the Neo4j database and uses the import tool to complete the graph-structured storage and expression of the specification's semantic information. The method solves the prior-art problem that graph-structured data storage schemes designed for English cannot accurately express the semantic information of Chinese specifications, and it provides a tabular template for expressing and storing specification semantics.

Description

Construction method of building design specification knowledge graph based on graph database
Technical Field
The invention belongs to the technical field of knowledge graph construction methods for buildings, and relates to a method for constructing a knowledge graph of building design specifications based on a graph database.
Background
Expressing and storing Chinese design specifications in the form of knowledge graphs has become a key link in the intelligent development of the building engineering field. The data storage schema of a graph structure consists of nodes, edges, labels, and node attributes. Entities are usually represented by nodes, relationships between two entities by the types of edges, and properties of entities by node attributes. This storage schema can accurately express English semantic information, but Chinese and English differ in syntax, voice, sentence structure, and other respects, so a graph-structured storage schema suited to English cannot effectively store and express Chinese semantic information. Meanwhile, the wording of the industry specifications covered by building design is relatively obscure, and implicit logical relations often exist between entities and sub-clauses. The huge number of entities and the complicated associations among them make it very difficult to store and express the semantic information of Chinese specifications.
Disclosure of Invention
The invention aims to provide a method for constructing a knowledge graph of building design specifications based on a graph database. The method solves the prior-art problem that graph-structured data storage schemes designed for English cannot accurately express the semantic information of Chinese specifications, and it provides a tabular template for expressing and storing specification semantics.
The technical scheme adopted by the invention is a method for constructing a knowledge graph of building design specifications based on a graph database, implemented according to the following steps:
Step 1, extract all entities from the specification to be stored, and construct the specification semantic storage entity set Entity_Set1 {E1, E2, E3, …};
Step 2, perform clustering and de-duplication on the entity set Entity_Set1 {E1, E2, E3, …} obtained in step 1 to obtain the de-duplicated entity set Entity_Set3;
Step 3, establish a common node table file with "Name" as the header cell of the first column, "unit" as the header cell of the second column, ":LABEL" as the header cell of the third column, and ":ID" as the header cell of the fourth column, and then fill each entity Ei (i = 1, 2, 3, …) of Entity_Set3 into the common node table file;
Step 4, establish a special node table file whose first-row cells in the first to third columns are "Name", ":LABEL", and ":ID" respectively;
Step 5, establish a node relation table file whose first-row cells in the first to seventh columns are ":START_ID", ":END_ID", "Start", "End", "Flag", "IDD", and ":TYPE" respectively; starting from the second row, each row of the node relation table stores one entity relation, and each time a relation is added, "An" is filled into the "IDD" cell of the row holding that relation, where n is the number of relations stored;
Step 6, for a specific specification clause, set the header cell of the column immediately after the last header column of the node relation table established in step 5 to the number of that clause; for clauses with second-level numbering, give each second-level clause its own column;
Step 7, analyze the predicate relations among the entities in the current clause, and look up in the common node table file the ":ID" cell value of each related entity Ei (i = 1, 2, 3, …) of the clause, denoted Ei_ID (i = 1, 2, 3, …);
Step 8, express and store each clause of the specification in turn according to step 6;
Step 9, save the common node table file, the special node table file, and the node relation table file in CSV format with UTF-8 encoding so that they can be imported into the Neo4j database;
Step 10, use the Neo4j-import tool to import the CSV files into the Neo4j database in the order common node table, special node table, node relation table, following the Neo4j batch data import format, thereby completing the graph-structured storage and expression of the semantic information of the specifications covered by building design.
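As a rough illustration of steps 3 to 5 and step 9, the following Python sketch lays out the three header rows and serializes a table as UTF-8 CSV. It is not the patent's implementation: the helper names `table_to_csv_bytes` and `save_table` and the sample row are hypothetical, and the `:`-prefixed header names assume the Neo4j batch-import header convention.

```python
import csv
import io

# Header rows from steps 3-5. The ":" prefixes are an assumption based on
# the Neo4j batch-import header convention.
COMMON_HEADER = ["Name", "unit", ":LABEL", ":ID"]
SPECIAL_HEADER = ["Name", ":LABEL", ":ID"]
RELATION_HEADER = [":START_ID", ":END_ID", "Start", "End", "Flag", "IDD", ":TYPE"]


def table_to_csv_bytes(header, rows):
    """Serialize one table (header plus data rows) to UTF-8 encoded CSV bytes."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def save_table(path, header, rows):
    """Write one table to disk as a UTF-8 CSV file (step 9)."""
    with open(path, "wb") as f:
        f.write(table_to_csv_bytes(header, rows))


# Hypothetical sample row: entity name, no unit, clause number, ID.
sample = table_to_csv_bytes(COMMON_HEADER, [["站台", "", "5.1.1", 1]])
```

Writing bytes explicitly keeps the encoding independent of the platform default, which matters for Chinese entity names.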
The invention is further characterized in that
step 2 specifically comprises the following steps:
Step 2.1, cluster entities with similar semantic expressions and abstract them uniformly into one corresponding entity, obtaining the entity set Entity_Set2;
Step 2.2, remove the repeated entities from Entity_Set2 to obtain the entity set Entity_Set3.
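Steps 2.1 and 2.2 can be sketched as follows. The text does not say how semantically similar entities are recognized, so this sketch assumes a hand-built synonym dictionary; the function name `cluster_and_dedupe` and the sample entities are hypothetical.

```python
def cluster_and_dedupe(entity_set1, synonyms):
    """Map similar expressions to one canonical entity, then drop repeats.

    entity_set1: list of extracted entity strings (Entity_Set1).
    synonyms: dict mapping a variant expression to its canonical entity
              (an assumed stand-in for the clustering of step 2.1).
    """
    # Step 2.1: replace each entity by its canonical form (Entity_Set2).
    entity_set2 = [synonyms.get(entity, entity) for entity in entity_set1]
    # Step 2.2: remove duplicates while keeping first-seen order (Entity_Set3).
    return list(dict.fromkeys(entity_set2))


entity_set3 = cluster_and_dedupe(
    ["platform", "station platform", "train", "platform"],
    {"station platform": "platform"},
)
```

Preserving first-seen order keeps the later ID assignment in step 3.4 reproducible across runs.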
In step 3, filling each entity Ei (i = 1, 2, 3, …) of Entity_Set3 into the common node table file specifically comprises:
Step 3.1, use each entity Ei (i = 1, 2, 3, …) of Entity_Set3 in turn as the cell value of row i+1 in the first column of the common node table;
Step 3.2, if entity Ei is a numeric entity with a unit, use only the number as the cell value of row i+1 in the first column, and store the unit of measurement in the cell of the same row in the second column;
Step 3.3, use the number of the clause of the processed specification that involves entity Ei as the cell value of row i+1 in the third column; if entity Ei is involved in several clauses, separate the clause numbers with ";";
Step 3.4, set the cell value of row i+1 in the fourth column to i (i = 1, 2, 3, …), until i equals the total number of entities in Entity_Set3.
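The filling rules of steps 3.1 to 3.4 can be sketched in Python as below. The regular expression used to split a number from its unit is an assumption made for illustration (the text does not say how numeric entities are detected), and `build_common_node_rows` and the sample data are hypothetical names.

```python
import re


def build_common_node_rows(entities, clause_numbers):
    """Build the data rows [Name, unit, :LABEL, :ID] of the common node table.

    entities: de-duplicated entity strings (Entity_Set3), in order.
    clause_numbers: dict mapping an entity to the list of clause numbers
                    that involve it.
    """
    rows = []
    for i, entity in enumerate(entities, start=1):
        # Step 3.2: split "<number><unit>" into number and unit (assumed pattern).
        match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(\D+)", entity)
        name, unit = (match.group(1), match.group(2)) if match else (entity, "")
        # Step 3.3: clause numbers involving the entity, joined with ";".
        label = ";".join(clause_numbers.get(entity, []))
        # Step 3.4: the ID of the i-th entity is i.
        rows.append([name, unit, label, i])
    return rows


rows = build_common_node_rows(
    ["platform", "20mm"],
    {"platform": ["5.1.1", "5.1.2"], "20mm": ["5.1.2"]},
)
```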
Step 7 specifically comprises the following steps:
Step 7.1, if an entity Ei has a unary predicate relation with itself, fill Ei_ID, the ID corresponding to Ei, into both the ":START_ID" and ":END_ID" cells of the next blank row of the node relation table, record that row number as y, and use the unary predicate relation as the ":TYPE" cell value of row y;
Step 7.2, if the unary predicate relation also carries a constraint-degree relation, fill the constraint-degree word into the cell of row y in the column of the clause number currently being processed; otherwise fill that cell with "placeholder";
Step 7.3, if a binary predicate relation exists between an entity Ej in the specification and a unary predicate taken as a whole, proceed as follows:
Step 7.3.1, if the special node table file already has a row whose ":LABEL" cell value is "simple node" together with the current clause number, simply record the ":ID" cell value of that row as Ori_ID;
Step 7.3.2, if the special node table file has no row whose ":LABEL" cell value is "simple node" together with the current clause number, carry out steps 7.3.3 to 7.3.5;
Step 7.3.3, fill "placeholder" into the next blank cell of the "Name" column of the special node table;
Step 7.3.4, fill "simple node" and the current clause number, separated by ";", into the ":LABEL" cell of the same row;
Step 7.3.5, set the ":ID" cell value of that row to the sum of the current number of common nodes and the current number of special nodes, and record it as Ori_ID;
Step 7.3.6, fill Ej_ID, the ID corresponding to entity Ej, into the next blank cell of the ":START_ID" column of the node relation table, and record the row number as y;
Step 7.3.7, fill Ei_ID, the ID of the entity Ei contained in the unary relation, into the ":END_ID" cell of row y;
Step 7.3.8, use the binary predicate relation as the ":TYPE" cell value of row y;
Step 7.3.9, if the binary predicate relation also carries a constraint-degree relation, use the constraint-degree word as the cell value of row y in the column of the current clause number; if there is no constraint-degree relation, fill that cell with "placeholder";
Step 7.3.10, fill Ori_ID into the ":START_ID" cell of row y+1;
Step 7.3.11, fill Ei_ID into the ":END_ID" cell of row y+1;
Step 7.3.12, use "simple edge" as the ":TYPE" cell value of row y+1;
Step 7.3.13, use the "IDD" cell value of the row holding the unary predicate relation as the "Start" cell value of row y+1;
Step 7.3.14, use the "IDD" cell value of row y as the "End" cell value of row y+1;
Step 7.4, if a binary predicate relation exists between an entity Ei in the specification and another entity Ej, fill Ei_ID, the ID corresponding to Ei, into the next blank cell of the ":START_ID" column, record the row number as y, fill Ej_ID, the ID corresponding to Ej, into the ":END_ID" cell of row y, and use the binary predicate relation as the ":TYPE" cell value of the same row;
Step 7.5, if the binary predicate relation also carries a constraint-degree relation, fill the constraint-degree word into the cell of row y in the column of the clause number currently being processed; otherwise fill that cell with "placeholder";
Step 7.6, if three entities Ei, Ej, and Ek in the specification are connected by a ternary predicate relation Tri, proceed as follows:
Step 7.6.1, if the "Name" column of the special node table file already contains a cell whose value is this ternary predicate relation, simply append the current clause number to the ":LABEL" cell of the same row, separated from the existing cell value by ";", and record the ":ID" cell value of that row as Tri_ID;
Step 7.6.2, if no cell of the "Name" column of the special node table file holds this ternary predicate relation, carry out steps 7.6.3 to 7.6.5;
Step 7.6.3, fill the ternary predicate relation into the next blank cell of the "Name" column of the special node table;
Step 7.6.4, fill "complex node" and the current clause number, separated by ";", into the ":LABEL" cell of the same row;
Step 7.6.5, set the ":ID" cell value of that row to the sum of the current number of common nodes and the current number of special nodes, and record it as Tri_ID;
Step 7.6.6, fill Ei_ID, Ej_ID, and Ek_ID, the IDs corresponding to entities Ei, Ej, and Ek, into the next three blank cells of the ":START_ID" column of the node relation table file, one per row, and fill Tri_ID into the ":END_ID" cell of each of these three rows;
Step 7.6.7, use the ternary predicate relation as the ":TYPE" cell value of each of the three rows;
Step 7.6.8, according to the semantic logical order of entities Ei, Ej, and Ek in the ternary predicate relation, set the cell value of each of the three rows in the column of the current clause number to Seq (Seq = 1, 2, 3);
Step 7.7, if a binary predicate relation exists between an entity Ei in the specification and a ternary predicate Trj taken as a whole, proceed as follows:
Step 7.7.1, according to the direction of the binary predicate relation between entity Ei and the ternary predicate Trj, set the next blank cell of the ":START_ID" column of the node relation table to Ei_ID or Trj_ID and set the ":END_ID" cell of the same row to the other value, thereby marking which node the relation points to;
Step 7.7.2, use the binary predicate relation as the ":TYPE" cell value of the same row;
Step 7.7.3, if the binary predicate relation also carries a constraint-degree relation, use the constraint-degree word as the cell value of the same row in the column of the current clause number; if there is no constraint-degree relation, fill that cell with "placeholder";
Step 7.8, if an entity Ei of the entity set Entity_Set3 of the current specification has binary predicate relations with more than one entity in the specification, and an "and" or "or" semantic logical relation exists among those entities or among the simple sentences in which they are located, first store the binary predicate relations and then express the semantic logical relation according to the following steps:
Step 7.8.1, if the special node table file already has a row whose ":LABEL" cell value is "selection node" together with the current clause number, simply record the ":ID" cell value of that row as Dri_ID;
Step 7.8.2, if the special node table file has no row whose ":LABEL" cell value is "selection node" together with the current clause number, carry out steps 7.8.3 to 7.8.5;
Step 7.8.3, fill "placeholder" into the next blank cell of the "Name" column of the special node table;
Step 7.8.4, fill "selection node" and the current clause number, separated by ";", into the ":LABEL" cell of the same row;
Step 7.8.5, set the ":ID" cell value of that row to the sum of the current number of common nodes and the current number of special nodes, and record it as Dri_ID;
Step 7.8.6, use Dri_ID as the value of the next blank cell of the ":START_ID" column, and record the row number as y;
Step 7.8.7, fill Ei_ID into the ":END_ID" cell of row y;
Step 7.8.8, search the node relation table, within the current clause, for the rows that have Ei_ID as their ":START_ID" cell value and whose binary predicates share the same semantic logical relation, and use the "IDD" cell values of those rows, separated by ";", as the "Flag" cell value of row y;
Step 7.8.9, according to the semantic logical relation among the binary predicates, use "and" or "or" as the ":TYPE" cell value of row y;
Step 7.8.10, set the cell value of row y in the column of the current clause number to "placeholder";
Step 7.8.11, if, in the current clause, more than one kind of semantic logical relation exists among the binary predicate relations stored in the rows of the node relation table that have Ei_ID as their ":START_ID" cell value, repeat steps 7.8.6 to 7.8.10;
Step 7.8.12, if the current clause contains one and only one entity Ei that has binary predicate relations with more than one entity in the specification, and those binary predicate relations all share the same type of semantic logical relation, the graph-structured expression and storage of the clause is complete; otherwise carry out steps 7.8.13 to 7.8.16;
Step 7.8.13, extract the simple sentences Sj of the current clause to form the sentence set Sentence_Set {S1, S2, S3, …};
Step 7.8.14, fill Ej_ID, the ID of the first entity Ej of each sentence Sj in Sentence_Set, in turn into the ":END_ID" column of the node relation table, and record the row number as yj;
Step 7.8.14, use Dri_ID as the ":START_ID" cell value of row yj of the node relation table;
Step 7.8.15, use "clause" as the ":TYPE" cell value of row yj of the node relation table;
Step 7.8.16, search the node relation table file for the entity nodes of the clause whose out-degree is greater than one, take the "IDD" cell values of the rows holding the predicate relations that lead out of those nodes and express the current sentence Sj, and use them, separated by ";", as the "Flag" cell value of row yj of the node relation table.
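The simplest cases of step 7, a unary predicate (steps 7.1 and 7.2) and a binary predicate between two entities (steps 7.4 and 7.5), can be sketched as a small table builder. This is an illustrative sketch under assumptions: the class `RelationTable` and its methods are hypothetical, "An" is read as the letter A followed by the running relation count, the unused "Start", "End", and "Flag" cells are left empty, and the sample IDs and predicates are invented.

```python
PLACEHOLDER = "placeholder"


class RelationTable:
    """Node relation table for one clause.

    Each row is [:START_ID, :END_ID, Start, End, Flag, IDD, :TYPE, clause cell];
    the last column is the clause-number column added in step 6.
    """

    def __init__(self, clause_number):
        self.header = [":START_ID", ":END_ID", "Start", "End",
                       "Flag", "IDD", ":TYPE", clause_number]
        self.rows = []

    def _add(self, start_id, end_id, rel_type, constraint=None):
        # Step 5: the IDD cell is "An", n being the number of stored relations.
        idd = f"A{len(self.rows) + 1}"
        # Steps 7.2 / 7.5: constraint-degree word, or a placeholder if none.
        cell = constraint if constraint else PLACEHOLDER
        self.rows.append([start_id, end_id, "", "", "", idd, rel_type, cell])
        return idd

    def add_unary(self, entity_id, predicate, constraint=None):
        # Step 7.1: a unary predicate is stored as a self-loop on the entity.
        return self._add(entity_id, entity_id, predicate, constraint)

    def add_binary(self, start_entity_id, end_entity_id, predicate, constraint=None):
        # Step 7.4: a binary predicate points from Ei to Ej.
        return self._add(start_entity_id, end_entity_id, predicate, constraint)


table = RelationTable("5.1.1")
first = table.add_unary(3, "edge")
second = table.add_binary(1, 3, "distance", "should")
```

Returning the IDD value from each call makes it available for the "Start", "End", and "Flag" cells that later steps fill in.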
In the expressions "more than one entity", "those entities", and "the simple sentences in which they are located" of step 7.8, and in the expression "more than one entity" of step 7.8.12, an entity may also be a unary predicate taken as a whole or a ternary predicate taken as a whole.
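The handling of a ternary predicate in step 7.6 can be sketched as follows. The function `add_ternary`, the reduced four-column relation rows, and the sample values are hypothetical. Two points are assumptions: the new node's ID is taken to be the count of common and special nodes including the node being added, and a clause number already present in the label is not appended a second time.

```python
def add_ternary(special_rows, relation_rows, n_common, predicate, clause, entity_ids):
    """Store one ternary predicate through a "complex node".

    special_rows: list of [Name, :LABEL, :ID] rows (special node table).
    relation_rows: list of [:START_ID, :END_ID, :TYPE, clause cell] rows
                   (columns reduced for brevity).
    n_common: number of rows in the common node table.
    entity_ids: IDs of the three entities in their semantic order.
    """
    for row in special_rows:
        if row[0] == predicate:
            # Step 7.6.1: reuse the node and append the clause number.
            if clause not in row[1].split(";"):  # assumed guard against repeats
                row[1] += ";" + clause
            tr_id = row[2]
            break
    else:
        # Steps 7.6.3-7.6.5: create a new complex node.
        tr_id = n_common + len(special_rows) + 1  # assumed to count the new node
        special_rows.append([predicate, "complex node;" + clause, tr_id])
    # Steps 7.6.6-7.6.8: one row per entity, each pointing at the complex node,
    # with Seq = 1, 2, 3 in the clause column.
    for seq, entity_id in enumerate(entity_ids, start=1):
        relation_rows.append([entity_id, tr_id, predicate, seq])
    return tr_id


special, relations = [], []
tr_first = add_ternary(special, relations, 10, "connect", "5.1.1", [1, 2, 3])
tr_again = add_ternary(special, relations, 10, "connect", "5.1.2", [4, 5, 6])
```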
The invention has the advantages that
The invention is applicable to the various specifications involved in the building design process, and lays a foundation for Chinese natural language processing by computer and for the conversion of unstructured documents into structured documents. The tabular template for expressing and storing specification semantics provided by the invention is simple, intuitive, and convenient to operate, and it meets the format requirements of Neo4j batch data import, so it can greatly save manpower and improve the efficiency of storing specification semantic information as a graph structure.
Drawings
FIG. 1 is the overall flow chart of the method of the invention for constructing a knowledge graph of building design specifications based on a graph database;
FIG. 2 is a flow chart of the part of the method in which a simple node marks the semantic logical order of a unary predicate;
FIG. 3 is a flow chart of the part of the method in which a selection node marks the logical relations of a specification clause;
FIG. 4 is a flow chart of the ternary predicate semantic expression part of the method.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a method for constructing a knowledge graph of building design specifications based on a graph database. Its flow is shown in FIG. 1, and it is implemented according to the following steps:
Step 1, extract all entities from the specification to be stored, and construct the specification semantic storage entity set Entity_Set1 {E1, E2, E3, …};
Step 2, perform clustering and de-duplication on the entity set Entity_Set1 {E1, E2, E3, …} obtained in step 1 to obtain the de-duplicated entity set Entity_Set3; this specifically comprises the following steps:
Step 2.1, cluster entities with similar semantic expressions and abstract them uniformly into one corresponding entity, obtaining the entity set Entity_Set2;
Step 2.2, remove the repeated entities from Entity_Set2 to obtain the entity set Entity_Set3;
Step 3, establish a common node table file with "Name" as the header cell of the first column, "unit" as the header cell of the second column, ":LABEL" as the header cell of the third column, and ":ID" as the header cell of the fourth column, and then fill each entity Ei (i = 1, 2, 3, …) of Entity_Set3 into the common node table file; filling each entity Ei (i = 1, 2, 3, …) of Entity_Set3 into the common node table file specifically comprises:
Step 3.1, use each entity Ei (i = 1, 2, 3, …) of Entity_Set3 in turn as the cell value of row i+1 in the first column of the common node table;
Step 3.2, if entity Ei is a numeric entity with a unit, use only the number as the cell value of row i+1 in the first column, and store the unit of measurement in the cell of the same row in the second column;
Step 3.3, use the number of the clause of the processed specification that involves entity Ei as the cell value of row i+1 in the third column; if entity Ei is involved in several clauses, separate the clause numbers with ";";
Step 3.4, set the cell value of row i+1 in the fourth column to i (i = 1, 2, 3, …), until i equals the total number of entities in Entity_Set3;
Step 4, establish a special node table file whose first-row cells in the first to third columns are "Name", ":LABEL", and ":ID" respectively;
Step 5, establish a node relation table file whose first-row cells in the first to seventh columns are ":START_ID", ":END_ID", "Start", "End", "Flag", "IDD", and ":TYPE" respectively; starting from the second row, each row of the node relation table stores one entity relation, and each time a relation is added, "An" is filled into the "IDD" cell of the row holding that relation, where n is the number of relations stored;
Step 6, for a specific specification clause, set the header cell of the column immediately after the last header column of the node relation table established in step 5 to the number of that clause; for clauses with second-level numbering, give each second-level clause its own column;
Step 7, analyze the predicate relations among the entities in the current clause, and look up in the common node table file the ":ID" cell value of each related entity Ei (i = 1, 2, 3, …) of the clause, denoted Ei_ID (i = 1, 2, 3, …); this specifically comprises the following steps:
Step 7.1, if an entity Ei has a unary predicate relation with itself, such as "another train" or "platform edge", fill Ei_ID, the ID corresponding to Ei, into both the ":START_ID" and ":END_ID" cells of the next blank row of the node relation table, record that row number as y, and use the unary predicate relation as the ":TYPE" cell value of row y;
Step 7.2, if the unary predicate relation also carries a constraint-degree relation, fill the constraint-degree word into the cell of row y in the column of the clause number currently being processed; otherwise fill that cell with "placeholder";
Step 7.3, as shown in FIG. 2, if a binary predicate relation exists between an entity Ej in the specification and a unary predicate taken as a whole, proceed as follows:
Step 7.3.1, if the special node table file already has a row whose ":LABEL" cell value is "simple node" together with the current clause number, simply record the ":ID" cell value of that row as Ori_ID;
Step 7.3.2, if the special node table file has no row whose ":LABEL" cell value is "simple node" together with the current clause number, carry out steps 7.3.3 to 7.3.5;
Step 7.3.3, fill "placeholder" into the next blank cell of the "Name" column of the special node table;
Step 7.3.4, fill "simple node" and the current clause number, separated by ";", into the ":LABEL" cell of the same row;
Step 7.3.5, set the ":ID" cell value of that row to the sum of the current number of common nodes and the current number of special nodes, and record it as Ori_ID;
Step 7.3.6, fill Ej_ID, the ID corresponding to entity Ej, into the next blank cell of the ":START_ID" column of the node relation table, and record the row number as y;
Step 7.3.7, fill Ei_ID, the ID of the entity Ei contained in the unary relation, into the ":END_ID" cell of row y;
Step 7.3.8, use the binary predicate relation as the ":TYPE" cell value of row y;
Step 7.3.9, if the binary predicate relation also carries a constraint-degree relation, use the constraint-degree word as the cell value of row y in the column of the current clause number; if there is no constraint-degree relation, fill that cell with "placeholder";
Step 7.3.10, fill Ori_ID into the ":START_ID" cell of row y+1;
Step 7.3.11, fill Ei_ID into the ":END_ID" cell of row y+1;
Step 7.3.12, use "simple edge" as the ":TYPE" cell value of row y+1;
Step 7.3.13, use the "IDD" cell value of the row holding the unary predicate relation as the "Start" cell value of row y+1;
Step 7.3.14, use the "IDD" cell value of row y as the "End" cell value of row y+1;
Step 7.4, if a binary predicate relation exists between an entity Ei in the specification and another entity Ej, fill Ei_ID, the ID corresponding to Ei, into the next blank cell of the ":START_ID" column, record the row number as y, fill Ej_ID, the ID corresponding to Ej, into the ":END_ID" cell of row y, and use the binary predicate relation as the ":TYPE" cell value of the same row;
Step 7.5, if the binary predicate relation also carries a constraint-degree relation, fill the constraint-degree word into the cell of row y in the column of the clause number currently being processed; otherwise fill that cell with "placeholder";
Step 7.6, if three entities Ei, Ej, and Ek in the specification are connected by a ternary predicate relation Tri, for example "… and …" or "connect … and …", proceed as follows:
Step 7.6.1, if the "Name" column of the special node table file already contains a cell whose value is this ternary predicate relation, simply append the current clause number to the ":LABEL" cell of the same row, separated from the existing cell value by ";", and record the ":ID" cell value of that row as Tri_ID;
Step 7.6.2, if no cell of the "Name" column of the special node table file holds this ternary predicate relation, carry out steps 7.6.3 to 7.6.5;
Step 7.6.3, fill the ternary predicate relation into the next blank cell of the "Name" column of the special node table;
Step 7.6.4, fill "complex node" and the current clause number, separated by ";", into the ":LABEL" cell of the same row;
Step 7.6.5, set the ":ID" cell value of that row to the sum of the current number of common nodes and the current number of special nodes, and record it as Tri_ID;
Step 7.6.6, fill Ei_ID, Ej_ID, and Ek_ID, the IDs corresponding to entities Ei, Ej, and Ek, into the next three blank cells of the ":START_ID" column of the node relation table file, one per row, and fill Tri_ID into the ":END_ID" cell of each of these three rows;
Step 7.6.7, use the ternary predicate relation as the ":TYPE" cell value of each of the three rows;
Step 7.6.8, according to the semantic logical order of entities Ei, Ej, and Ek in the ternary predicate relation, set the cell value of each of the three rows in the column of the current clause number to Seq (Seq = 1, 2, 3);
step 7.7, if a binary predicate relation exists between an entity Ei in the specification and a ternary predicate whole Trj, perform the following operations:
step 7.7.1, according to the direction of the binary predicate relation between entity Ei and the ternary predicate Trj, set the first blank cell of the START_ID column of the node relation table to Ei_ID or Trj_ID, and set the END_ID column cell of the same row to the other value, which marks the node the relation points to;
step 7.7.2, use the binary predicate relation as the TYPE column cell value of the same row;
step 7.7.3, if the binary predicate relation also carries a constraint-degree relation, use the constraint-degree word as the cell value of row y in the column headed by the current specification clause number; if there is no constraint-degree relation, fill that cell with a placeholder;
step 7.8, as shown in FIG. 3, if an entity Ei in the entity set Entity_Set3 has binary predicate relations with more than one entity (or unary predicate whole, or ternary predicate whole) in the current specification clause, and a semantic logical relation of "and" or "or" exists among those entities (or unary or ternary predicate wholes), or among the simple sentences in which they appear, first store the binary predicate relations between the entities, and then express the semantic logical relation between them according to the following steps:
step 7.8.1, if a cell in the 'LABEL' column of the special node table file already holds "selection node" together with the current specification clause number, only record the ID column cell value of that row as Dri_ID;
step 7.8.2, if no cell in the 'LABEL' column of the special node table file holds "selection node" together with the current specification clause number, proceed according to steps 7.8.3 to 7.8.5;
step 7.8.3, fill a placeholder into the first blank cell of the 'Name' column of the special node table;
step 7.8.4, fill "selection node" and the current specification clause number into the ":LABEL" column cell of the same row, separated by ";";
step 7.8.5, set the ID column cell value to the sum of the current number of common nodes and the number of special nodes, and record it as Dri_ID;
step 7.8.6, use Dri_ID as the value of the first blank cell of the START_ID column, and record the number of that row as y;
step 7.8.7, fill Ei_ID into the END_ID column cell of row y;
step 7.8.8, search the node relation table, in order, for the rows of the current specification clause whose ":START_ID" column cell value is Ei_ID and whose binary predicates share the same semantic logical relation; use the 'IDD' column cell values of those rows as the 'Flag' column cell value of row y, with the values separated by ";";
step 7.8.9, according to the semantic logical relation between the binary predicates, use "and" or "or" as the TYPE column cell value of row y;
step 7.8.10, set the cell of row y in the column headed by the current specification clause number to a placeholder;
step 7.8.11, if, in the current specification clause, more than one semantic logical relation exists among the binary predicate relations stored in the rows of the node relation table whose START_ID column cell value is Ei_ID, repeat steps 7.8.6 to 7.8.10;
step 7.8.12, if the current specification clause contains one and only one entity Ei that has binary predicate relations with more than one entity (or unary predicate whole, or ternary predicate whole), and those binary predicate relations all share the same type of semantic logical relation, the graph-structure expression and storage of the clause is complete; otherwise, proceed according to steps 7.8.13 to 7.8.16;
step 7.8.13, extract the simple sentences Sj in the current specification clause to form the clause set Sentence_Set {S1, S2, S3, ...};
step 7.8.14, for each clause Sj in the clause set Sentence_Set, in turn, fill Ej_ID, corresponding to the first entity Ej of the clause, into the first blank cell of the END_ID column of the node relation table, and record the number of that row as yj;
step 7.8.14, use Dri_ID as the START_ID column cell value of row yj of the node relation table;
step 7.8.15, use "clause" as the TYPE column cell value of row yj of the node relation table;
step 7.8.16, search the node relation table file, extract the entity nodes in the specification clause whose degree is greater than one, and for each such entity node in turn find the predicate relation led out from it that expresses the current clause Sj; use the 'IDD' column cell values of the rows holding these predicate relations as the 'Flag' column cell value of row yj of the node relation table, with the values separated by ";";
step 8, express and store each clause of the specification in turn according to step 6;
step 9, save the common node table file, the special node table file and the node relation table file in CSV format with UTF-8 encoding, so that they can be imported into a Neo4j database;
step 10, using the Neo4j-import tool, import the CSV files into the Neo4j database in the order common node table, special node table, node relation table, following the Neo4j batch data import format, thereby completing the graph-structure storage and expression of the semantic information of the building design specification.
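Steps 7.6.6 to 7.6.8 can be illustrated with a minimal Python sketch. The function name `store_ternary`, the dictionary-based row layout, the relation name "connects", the clause number "1.2.3" and the IDs 11, 12, 13 and 20 are hypothetical values chosen only for illustration; none of them comes from the specification. The 'IDD' values follow the An numbering rule of step 5.

```python
def store_ternary(table, entity_ids, tri_id, relation, clause_no):
    """Append three rows, one per entity, all pointing at the ternary node.

    table      : list of row dicts (the node relation table without its header)
    entity_ids : (Ei_ID, Ej_ID, Ek_ID) in their semantic logical order
    tri_id     : Tri_ID of the "complex node" in the special node table
    clause_no  : header of the column reserved for the current clause
    """
    for seq, entity_id in enumerate(entity_ids, start=1):
        table.append({
            "START_ID": entity_id,
            "END_ID": tri_id,
            "IDD": f"A{len(table) + 1}",  # n-th stored relation
            "TYPE": relation,
            clause_no: seq,               # Seq = 1, 2, 3
        })
    return table


# Hypothetical IDs and relation name, for illustration only.
ternary_rows = store_ternary([], (11, 12, 13), 20, "connects", "1.2.3")
for row in ternary_rows:
    print(row)
```

Storing the order in the clause column keeps the three edges distinguishable even though they share the same TYPE and the same END_ID.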
Examples
Take clause 15.1.23 in Section 15.1 (power supply) of GB 50157-2013 "Subway Design Specification" as an example.
15.1.23 The main materials of the power supply system used underground should be halogen-free, low-smoke, flame-retardant or fire-resistant products.
Step 1, extract all entities in the subway design specification to form the entity set Entity_Set1 {E1, E2, E3, ...} of the specification. For this clause, the following entities should be extracted: E1 = "power supply system", E2 = "material", E3 = "halogen-free", E4 = "low smoke", E5 = "flame-retardant product", E6 = "refractory material".
Step 2, perform clustering and de-duplication on the entity set Entity_Set1 to obtain the entity set Entity_Set3. Since this example uses only one clause, and the clause contains no similar or duplicate entities, Entity_Set1 and Entity_Set3 have the same content.
Step 3, establish the common node table file, with "Name" as the header cell value of the first column, "unit" as the header cell value of the second column, "LABEL" as the header cell value of the third column, and "ID" as the header cell value of the fourth column.
Step 3.1, take the entities "power supply system", "material", "halogen-free", "low smoke", "flame-retardant product" and "refractory material" in the entity set Entity_Set3, in order, as the cell values of the second to seventh rows of the first column.
Step 3.2, set the cell values of the second to seventh rows of the third column to the number of this clause, "15.1.23".
Step 3.3, set the cell values of the second to seventh rows of the fourth column to 1, 2, 3, 4, 5 and 6 respectively.
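The common node table produced by Steps 3 to 3.3 can be reproduced with a short Python sketch. The function name `build_common_node_table` is hypothetical, and leaving the "unit" column empty is an assumption: the example does not state a value for it, and none of the six entities is a numeric entity with a unit.

```python
# Entities of clause 15.1.23, in the order given in Step 1 of the example.
ENTITIES = [
    "power supply system", "material", "halogen-free",
    "low smoke", "flame-retardant product", "refractory material",
]
CLAUSE = "15.1.23"


def build_common_node_table(names, clause):
    """Return the header row followed by one row per entity."""
    rows = [["Name", "unit", "LABEL", "ID"]]
    for i, name in enumerate(names, start=1):
        # Assumption: the unit cell stays empty for non-numeric entities.
        rows.append([name, "", clause, i])
    return rows


common_nodes = build_common_node_table(ENTITIES, CLAUSE)
for row in common_nodes:
    print(row)
```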
Step 4, establish the special node table file, which takes "Name", "LABEL" and "ID" as the cell values of the first to third columns of the first row.
Step 5, establish the node relation table file, which takes "START_ID", "END_ID", "Start", "END", "Flag", "IDD" and "TYPE" as the cell values of the first to seventh columns of the first row.
Step 5.1, fill the current specification clause number "15.1.23" into the first row, eighth column cell of the node relation table.
Step 5.2, by looking up the common node table file, the "ID" column cell value of the row holding entity E1 "power supply system" is 1, i.e. E1_ID = 1; the ID values of the rows holding the remaining entities are looked up in turn in the same way.
Step 6, according to the semantics of the clause, entity E1 "power supply system" has a unary predicate relation with itself, "used underground".
Step 6.1, fill E1_ID (i.e., "1"), corresponding to entity E1 "power supply system", into the first column, second row cell and the second column, second row cell of the node relation table.
Step 6.2, fill the unary predicate "used underground" into the seventh column, second row cell.
Step 6.3, since the unary predicate carries no constraint-degree semantic relation, take the placeholder "" as the cell value of the eighth column, second row of the node relation table.
Step 6.4, take "A1" as the cell value of the sixth column, second row.
Step 7, according to the semantics of the clause, entity E2 "material" has a unary predicate relation with itself, "main".
Step 7.1, fill E2_ID (i.e., "2"), corresponding to entity E2 "material", into the first column, third row cell and the second column, third row cell of the node relation table.
Step 7.2, fill the unary predicate "main" into the seventh column, third row cell.
Step 7.3, since the unary predicate carries no constraint-degree semantic relation, take the placeholder "" as the cell value of the eighth column, third row of the node relation table.
Step 7.4, take "A2" as the cell value of the sixth column, third row.
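Steps 6 and 7 both store a unary predicate as a row whose start and end IDs are the same entity. The sketch below shows one way to append such rows; the helper name `append_relation` and the dictionary row layout are assumptions made for illustration, and the empty string stands in for the placeholder.

```python
CLAUSE = "15.1.23"  # header of the eighth column (Step 5.1)


def append_relation(table, start_id, end_id, rel_type, constraint=""):
    """Append one relation row and number it A1, A2, ... in storage order."""
    row = {
        "START_ID": start_id,
        "END_ID": end_id,
        "Start": "",
        "END": "",
        "Flag": "",
        "IDD": f"A{len(table) + 1}",
        "TYPE": rel_type,
        CLAUSE: constraint,  # empty string stands in for the placeholder
    }
    table.append(row)
    return row


relations = []
append_relation(relations, 1, 1, "used underground")  # Step 6: E1 with itself
append_relation(relations, 2, 2, "main")              # Step 7: E2 with itself
for row in relations:
    print(row)
```

A binary predicate with a constraint-degree word would use the same helper with two different IDs and a non-empty `constraint` argument.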
Step 8, according to the semantics of the clause, a binary predicate relation "attribute" exists between entity E1 "power supply system" and the unary predicate whole "main material".
Step 8.1, fill E1_ID (i.e., "1"), corresponding to entity E1 "power supply system", into the first column, fourth row cell of the node relation table.
Step 8.2, fill E2_ID (i.e., "2"), corresponding to entity E2 "material", into the second column, fourth row cell of the node relation table.
Step 8.3, fill the binary predicate "attribute" into the seventh column, fourth row cell of the node relation table.
Step 8.4, since the binary predicate carries no constraint-degree semantic relation, take the placeholder "" as the cell value of the eighth column, fourth row of the node relation table.
Step 8.5, take "A3" as the cell value of the sixth column, fourth row.
Step 8.6, take the placeholder "" as the cell value of the first column, second row of the special node table file.
Step 8.7, take "simple node;15.1.23" as the cell value of the second column, second row of the special node table file.
Step 8.8, take the total node count "7" as the cell value of the third column, second row of the special node table file, i.e. Or1_ID = 7.
Step 8.9, take Or1_ID (i.e., "7") as the cell value of the first column, fifth row of the node relation table file.
Step 8.10, take E2_ID (i.e., "2"), corresponding to entity E2 "material", as the cell value of the second column, fifth row of the node relation table file.
Step 8.11, take "simple edge" as the cell value of the seventh column, fifth row of the node relation table file.
Step 8.12, take the "IDD" column cell value (i.e., "A2") of the row holding the unary predicate relation "main material" as the cell value of the fifth column in the node relation table file.
Step 8.13, take the "IDD" column cell value (i.e., "A3") of the row holding the binary predicate relation "attribute" as the cell value of the fourth column, fifth row of the node relation table file.
Step 8.14, take "A4" as the cell value of the sixth column, fifth row of the node relation table file.
Step 9, according to the semantics of the clause, a binary predicate relation "selected" exists from entity E2 "material" to entity E5 "flame-retardant product".
Step 9.1, fill E2_ID (i.e., "2"), corresponding to entity E2 "material", into the first column, sixth row cell of the node relation table.
Step 9.2, fill E5_ID (i.e., "5"), corresponding to entity E5 "flame-retardant product", into the second column, sixth row cell of the node relation table.
Step 9.3, fill the binary predicate "selected" into the seventh column, sixth row cell of the node relation table.
Step 9.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, sixth row of the node relation table.
Step 9.5, take "A5" as the cell value of the sixth column, sixth row.
Step 10, according to the semantics of the clause, a binary predicate relation "selected" exists from entity E2 "material" to entity E6 "refractory material".
Step 10.1, fill E2_ID (i.e., "2"), corresponding to entity E2 "material", into the first column, seventh row cell of the node relation table.
Step 10.2, fill E6_ID (i.e., "6"), corresponding to entity E6 "refractory material", into the second column, seventh row cell of the node relation table.
Step 10.3, fill the binary predicate "selected" into the seventh column, seventh row cell of the node relation table.
Step 10.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, seventh row of the node relation table.
Step 10.5, take "A6" as the cell value of the sixth column, seventh row.
Step 11, according to the semantics of the clause, a binary predicate relation "characteristic" exists from entity E6 "refractory material" to entity E3 "halogen-free".
Step 11.1, fill E6_ID (i.e., "6"), corresponding to entity E6 "refractory material", into the first column, eighth row cell of the node relation table.
Step 11.2, fill E3_ID (i.e., "3"), corresponding to entity E3 "halogen-free", into the second column, eighth row cell of the node relation table.
Step 11.3, fill the binary predicate "characteristic" into the seventh column, eighth row cell of the node relation table.
Step 11.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, eighth row of the node relation table.
Step 11.5, take "A7" as the cell value of the sixth column, eighth row.
Step 12, according to the semantics of the clause, a binary predicate relation "characteristic" exists from entity E6 "refractory material" to entity E4 "low smoke".
Step 12.1, fill E6_ID (i.e., "6"), corresponding to entity E6 "refractory material", into the first column, ninth row cell of the node relation table.
Step 12.2, fill E4_ID (i.e., "4"), corresponding to entity E4 "low smoke", into the second column, ninth row cell of the node relation table.
Step 12.3, fill the binary predicate "characteristic" into the seventh column, ninth row cell of the node relation table.
Step 12.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, ninth row of the node relation table.
Step 12.5, take "A8" as the cell value of the sixth column, ninth row.
Step 13, according to the semantics of the clause, a binary predicate relation "characteristic" exists from entity E5 "flame-retardant product" to entity E3 "halogen-free".
Step 13.1, fill E5_ID (i.e., "5"), corresponding to entity E5 "flame-retardant product", into the first column, tenth row cell of the node relation table.
Step 13.2, fill E3_ID (i.e., "3"), corresponding to entity E3 "halogen-free", into the second column, tenth row cell of the node relation table.
Step 13.3, fill the binary predicate "characteristic" into the seventh column, tenth row cell of the node relation table.
Step 13.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, tenth row of the node relation table.
Step 13.5, take "A9" as the cell value of the sixth column, tenth row.
Step 14, according to the semantics of the clause, a binary predicate relation "characteristic" exists from entity E5 "flame-retardant product" to entity E4 "low smoke".
Step 14.1, fill E5_ID (i.e., "5"), corresponding to entity E5 "flame-retardant product", into the first column, eleventh row cell of the node relation table.
Step 14.2, fill E4_ID (i.e., "4"), corresponding to entity E4 "low smoke", into the second column, eleventh row cell of the node relation table.
Step 14.3, fill the binary predicate "characteristic" into the seventh column, eleventh row cell of the node relation table.
Step 14.4, because the binary predicate also carries the constraint-degree semantic relation "should", take "should" as the cell value of the eighth column, eleventh row of the node relation table.
Step 14.5, take "A10" as the cell value of the sixth column, eleventh row.
Step 15, in this clause entity E2 "material" has binary predicate relations with both entity E5 "flame-retardant product" and entity E6 "refractory material", so the semantic logical relation between these relations is expressed as follows.
Step 15.1, take the placeholder "" as the cell value of the first column, third row of the special node table file.
Step 15.2, take "selection node;15.1.23" as the cell value of the second column, third row of the special node table file.
Step 15.3, take the total node count "8" as the cell value of the third column, third row of the special node table file, i.e. Dr1_ID = 8.
Step 15.4, take Dr1_ID (i.e., "8") as the cell value of the first column, twelfth row of the node relation table file.
Step 15.5, take E2_ID, corresponding to entity E2 "material", as the cell value of the second column, twelfth row of the node relation table file.
Step 15.6, according to the semantics of the clause, a semantic logical relation of "or" exists between "the main material should be selected as a flame-retardant product" and "the main material should be selected as a refractory material"; therefore take "or" as the cell value of the seventh column, twelfth row of the node relation table file.
Step 15.7, take the "IDD" column cell values of the rows holding these two binary predicates, "A5;A6", as the cell value of the fifth column, twelfth row of the node relation table file.
Step 15.8, take "A11" as the cell value of the sixth column, twelfth row of the node relation table file.
Step 16, in this clause entity E5 "flame-retardant product" has binary predicate relations with both entity E3 "halogen-free" and entity E4 "low smoke".
Step 16.1, search the special node table file: the cell in the second column, third row already holds "selection node;15.1.23". It is therefore only necessary to record the "ID" column cell value of that row, Dr1_ID = 8.
Step 16.2, take Dr1_ID (i.e., "8") as the cell value of the first column, thirteenth row of the node relation table file.
Step 16.3, take E5_ID, corresponding to entity E5 "flame-retardant product", as the cell value of the second column, thirteenth row of the node relation table file.
Step 16.4, according to the semantics of the clause, a semantic logical relation of "and" exists between "the flame-retardant product should be halogen-free" and "the flame-retardant product should be low-smoke"; therefore take "and" as the cell value of the seventh column, thirteenth row of the node relation table file.
Step 16.5, take the "IDD" column cell values of the rows holding these two binary predicates, "A9;A10", as the cell value of the fifth column, thirteenth row of the node relation table file.
Step 16.6, take "A12" as the cell value of the sixth column, thirteenth row of the node relation table file.
Step 17, in this clause entity E6 "refractory material" has binary predicate relations with both entity E3 "halogen-free" and entity E4 "low smoke".
Step 17.1, search the special node table file: the cell in the second column, third row already holds "selection node;15.1.23". It is therefore only necessary to record the "ID" column cell value of that row, Dr1_ID = 8.
Step 17.2, take Dr1_ID (i.e., "8") as the cell value of the first column, fourteenth row of the node relation table file.
Step 17.3, take E6_ID, corresponding to entity E6 "refractory material", as the cell value of the second column, fourteenth row of the node relation table file.
Step 17.4, according to the semantics of the clause, a semantic logical relation of "and" exists between "the refractory material should be halogen-free" and "the refractory material should be low-smoke"; therefore take "and" as the cell value of the seventh column, fourteenth row of the node relation table file.
Step 17.5, take the "IDD" column cell values of the rows holding these two binary predicates, "A7;A8", as the cell value of the fifth column, fourteenth row of the node relation table file.
Step 17.6, take "A13" as the cell value of the sixth column, fourteenth row of the node relation table file.
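The "Flag" values of Steps 16.5 and 17.5 can be derived mechanically from the rows stored in Steps 11 to 14: they are the "IDD" values of all rows that start at the entity in question. The following Python sketch shows this under stated assumptions; the function name `selection_row` and the dictionary row layout are hypothetical, and ";" is used as the separator without a following space.

```python
DR1_ID = 8  # ID of the selection node, from Step 15.3

# Binary predicate rows stored in Steps 11 to 14 (only the columns needed here).
binary_rows = [
    {"START_ID": 6, "END_ID": 3, "IDD": "A7", "TYPE": "characteristic"},
    {"START_ID": 6, "END_ID": 4, "IDD": "A8", "TYPE": "characteristic"},
    {"START_ID": 5, "END_ID": 3, "IDD": "A9", "TYPE": "characteristic"},
    {"START_ID": 5, "END_ID": 4, "IDD": "A10", "TYPE": "characteristic"},
]


def selection_row(rows, entity_id, logic, idd):
    """Build the row linking the selection node to one entity."""
    flag = ";".join(r["IDD"] for r in rows if r["START_ID"] == entity_id)
    return {
        "START_ID": DR1_ID,
        "END_ID": entity_id,
        "Flag": flag,
        "IDD": idd,
        "TYPE": logic,
    }


row_a12 = selection_row(binary_rows, 5, "and", "A12")  # Step 16
row_a13 = selection_row(binary_rows, 6, "and", "A13")  # Step 17
print(row_a12)
print(row_a13)
```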
Step 18, because three entities in the clause each have binary predicate relations with more than one entity, the simple sentences of the clause are sorted out to form the clause set Sentence_Set {S1, S2, S3, S4}. S1 is "the main material of the underground power supply system should be a halogen-free flame-retardant product"; S2 is "the main material of the underground power supply system should be a low-smoke flame-retardant product"; S3 is "the main material of the underground power supply system should be a halogen-free refractory material"; S4 is "the main material of the underground power supply system should be a low-smoke refractory material".
Step 18.1, for clause S1, take E1_ID, corresponding to the first entity of the clause, E1 "power supply system", as the cell value of the second column, fifteenth row of the node relation table.
Step 18.2, take Dr1_ID (i.e., "8") as the cell value of the first column, fifteenth row of the node relation table file.
Step 18.3, take "clause" as the cell value of the seventh column, fifteenth row of the node relation table file.
Step 18.4, a search of the node relation table file shows that the nodes "power supply system", "material", "flame-retardant product" and "refractory material" in the clause have degree greater than one; the predicate relations led out from them that express clause S1 are "power supply system used underground", "material of the power supply system", "material should be selected as flame-retardant product" and "flame-retardant product should have the halogen-free characteristic". The "IDD" column cell values of the rows holding these predicate relations (i.e., "A1, A3, A5, A9") are taken as the cell value of the fifth column, fifteenth row of the node relation table file.
Step 18.5, for clause S2, take E1_ID, corresponding to the first entity of the clause, E1 "power supply system", as the cell value of the second column, sixteenth row of the node relation table.
Step 18.6, take Dr1_ID (i.e., "8") as the cell value of the first column, sixteenth row of the node relation table file.
Step 18.7, take "clause" as the cell value of the seventh column, sixteenth row of the node relation table file.
Step 18.8, a search of the node relation table file shows that the nodes "power supply system", "material", "flame-retardant product" and "refractory material" in the clause have degree greater than one; the predicate relations led out from them that express clause S2 are "power supply system used underground", "material of the power supply system", "material should be selected as flame-retardant product" and "flame-retardant product should have the low-smoke characteristic". The "IDD" column cell values of the rows holding these predicate relations (i.e., "A1, A3, A5, A10") are taken as the cell value of the fifth column, sixteenth row of the node relation table file.
Step 18.9, for clause S3, take E1_ID, corresponding to the first entity of the clause, E1 "power supply system", as the cell value of the second column, seventeenth row of the node relation table.
Step 18.10, take Dr1_ID (i.e., "8") as the cell value of the first column, seventeenth row of the node relation table file.
Step 18.11, take "clause" as the cell value of the seventh column, seventeenth row of the node relation table file.
Step 18.12, a search of the node relation table file shows that the nodes "power supply system", "material", "flame-retardant product" and "refractory material" in the clause have degree greater than one; the predicate relations led out from them that express clause S3 are "power supply system used underground", "material of the power supply system", "material should be selected as refractory material" and "refractory material should have the halogen-free characteristic". The "IDD" column cell values of the rows holding these predicate relations (i.e., "A1, A3, A6, A7") are taken as the cell value of the fifth column, seventeenth row of the node relation table file.
Step 18.13, for clause S4, take E1_ID, corresponding to the first entity of the clause, E1 "power supply system", as the cell value of the second column, eighteenth row of the node relation table.
Step 18.14, take Dr1_ID (i.e., "8") as the cell value of the first column, eighteenth row of the node relation table file.
Step 18.15, take "clause" as the cell value of the seventh column, eighteenth row of the node relation table file.
Step 18.16, a search of the node relation table file shows that the nodes "power supply system", "material", "flame-retardant product" and "refractory material" in the clause have degree greater than one; the predicate relations led out from them that express clause S4 are "power supply system used underground", "material of the power supply system", "material should be selected as refractory material" and "refractory material should have the low-smoke characteristic". The "IDD" column cell values of the rows holding these predicate relations (i.e., "A1, A3, A6, A8") are taken as the cell value of the fifth column, eighteenth row of the node relation table file.
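The four "Flag" lists of Step 18 follow a regular pattern: every clause shares "A1" and "A3", then adds the "selected" relation for its product type and the "characteristic" relation for its property. A small Python sketch makes this explicit. The names `FIXED`, `SELECTED`, `CHARACTERISTIC` and `clause_flags` are hypothetical, and the values are returned as lists so that no particular separator is assumed.

```python
from itertools import product

FIXED = ["A1", "A3"]           # "used underground" and "attribute"
SELECTED = {5: "A5", 6: "A6"}  # material -> E5 / E6 (Steps 9 and 10)
CHARACTERISTIC = {             # product type -> property (Steps 11 to 14)
    (6, 3): "A7", (6, 4): "A8",
    (5, 3): "A9", (5, 4): "A10",
}


def clause_flags():
    """Return the IDD lists for S1..S4 in the order used in Step 18."""
    flags = []
    # (E5, E3), (E5, E4), (E6, E3), (E6, E4) correspond to S1, S2, S3, S4.
    for product_id, feature_id in product((5, 6), (3, 4)):
        flags.append(
            FIXED
            + [SELECTED[product_id], CHARACTERISTIC[(product_id, feature_id)]]
        )
    return flags


for name, flag in zip(("S1", "S2", "S3", "S4"), clause_flags()):
    print(name, flag)
```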
Step 19, save the common node table file, the special node table file and the node relation table file in CSV format with UTF-8 encoding.
Step 20, using the Neo4j-import tool, import the CSV files into the Neo4j database in the order common node table, special node table, node relation table, thereby completing the graph-structure storage and expression of the semantic information of this clause.
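Step 19 can be sketched with Python's standard `csv` module. The helper names `write_table` and `read_table` and the file name `special_nodes.csv` are hypothetical, and only the first special node row of the example is shown. The import command of Step 20 is not sketched here, because its options depend on the Neo4j version in use.

```python
import csv
import os
import tempfile


def write_table(path, rows):
    """Write a table as a UTF-8 encoded CSV file."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_table(path):
    """Read a UTF-8 encoded CSV file back into a list of rows."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))


# Special node table after Step 8.8 (header row plus the simple node).
special_nodes = [
    ["Name", "LABEL", "ID"],
    ["", "simple node;15.1.23", 7],
]

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "special_nodes.csv")  # hypothetical file name
    write_table(path, special_nodes)
    read_back = read_table(path)

print(read_back)
```

Because ";" is used inside the LABEL cell, the comma must remain the field delimiter of the file so that the clause number stays in the same cell as the node type.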

Claims (5)

1. A method for constructing a building design specification knowledge graph based on a graph database, characterized by comprising the following steps:
step 1, extract all entities in the specification to be processed and stored, and construct the specification semantic storage entity set Entity_Set1 {E1, E2, E3, ...};
step 2, perform clustering and de-duplication on the entity set Entity_Set1 {E1, E2, E3, ...} obtained in step 1 to obtain the de-duplicated entity set Entity_Set3;
step 3, establish a common node table file, with "Name" as the header cell value of the first column, "unit" as the header cell value of the second column, "LABEL" as the header cell value of the third column, and "ID" as the header cell value of the fourth column, and then fill each entity Ei in the entity set Entity_Set3 into the common node table file, i = 1, 2, 3, ...;
step 4, establish a special node table file, which takes "Name", "LABEL" and "ID" as the cell values of the first to third columns of the first row;
step 5, establish a node relation table file, which takes "START_ID", "END_ID", "Start", "END", "Flag", "IDD" and "TYPE" as the cell values of the first to seventh columns of the first row; starting from the second row, each row of the node relation table stores one entity relation in turn, and each time a relation is added, "An" is filled into the "IDD" column cell of the row holding that relation, where n is the number of relations stored;
step 6, for a specific specification clause, set the header cell of the first blank column after the header row of the node relation table established in step 5 to the number of that clause; for a clause with second-level numbering, give each second-level item its own column;
step 7, analyze and sort out the predicate relations among the entities in the current specification clause, and look up in the common node table file the ID column cell value corresponding to each related entity Ei in the clause, recorded as Ei_ID, i = 1, 2, 3, ...;
step 8, express and store each clause of the specification in turn according to step 6;
step 9, save the common node table file, the special node table file and the node relation table file in CSV format with UTF-8 encoding, so that they can be imported into a Neo4j database;
step 10, using the Neo4j-import tool, import the CSV files into the Neo4j database in the order common node table, special node table, node relation table, following the Neo4j batch data import format, thereby completing the graph-structure storage and expression of the semantic information of the building design specification.
2. The method for constructing a knowledge-graph of building design specifications based on a graph database according to claim 1, wherein the step 2 specifically comprises:
step 2.1, cluster entities with similar semantic expressions and abstract them uniformly into the corresponding entity, obtaining the entity set Entity_Set2;
step 2.2, remove the repeated entities in the entity set Entity_Set2 to obtain the entity set Entity_Set3.
3. The method as claimed in claim 1, wherein filling each entity E_i of the entity set Entity_Set3 into the common node table file in step 3 specifically comprises:
step 3.1, taking all entities E_i in the entity set Entity_Set3, in order, as the cell values of row i+1 in the first column of the common node table, i = 1, 2, 3, ...;
step 3.2, if entity E_i is a numeric entity with a unit, taking only the number as the cell value of row i+1 in the first column, and storing the unit of measurement in the cell of the same row in the second column;
step 3.3, taking the number of the specification that involves entity E_i as the cell value of row i+1; if entity E_i is involved in multiple specifications, the specification numbers are separated by ";";
step 3.4, setting the cell value of row i+1 in the fourth column to i, i = 1, 2, 3, ..., until i equals the total number of entities in the entity set Entity_Set3.
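By way of illustration only, steps 3.1 to 3.4 can be sketched as follows. The sketch assumes that the specification numbers of step 3.3 go into the third column (the claim text names only the first, second and fourth columns), and the regular expression used to split a number from its unit is a simplification chosen for this example.

```python
import re

def split_unit(entity):
    """Split '1.05m' into ('1.05', 'm'); leave other entities unchanged (step 3.2)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([^\d\s.].*)", entity)
    return (match.group(1), match.group(2)) if match else (entity, "")

def build_common_node_rows(entities, spec_numbers):
    """Build the data rows of the common node table (row i+1 holds entity E_i).

    spec_numbers: assumed mapping from an entity to the list of specification
    numbers that involve it.
    """
    rows = []
    for i, entity in enumerate(entities, start=1):
        value, unit = split_unit(entity)              # steps 3.1 and 3.2
        specs = ";".join(spec_numbers.get(entity, []))  # step 3.3
        rows.append([value, unit, specs, i])          # step 3.4: ID equals i
    return rows
```

The header row is not produced here because it is defined in an earlier step of claim 1.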
4. The method for constructing a knowledge-graph of building design specifications based on a graph database according to claim 1, wherein the step 7 is specifically as follows:
step 7.1, if an entity E_i has a unary predicate relation on itself, filling the E_i_ID corresponding to entity E_i into the next blank cell of both the ":START_ID" column and the ":END_ID" column, recording the row number as y, and taking the unary predicate relation as the ":TYPE" column cell value of row y;
step 7.2, if the unary predicate relation also carries a constraint degree relation, filling the constraint degree word into the cell of row y in the column of the specification clause number currently being processed; otherwise filling that cell with "placeholder";
step 7.3, if there is an entity E_j in the specification that has a binary predicate relation with the unary predicate as a whole, proceeding according to the following steps:
step 7.3.1, if the special node table file already contains a row whose ":LABEL" column cell value is "simple node" together with the current specification clause number, only the row of that cell needs to be recorded, its "ID" column cell value being denoted Or_i_ID;
step 7.3.2, if the special node table file contains no row whose ":LABEL" column cell value is "simple node" together with the specification clause number, proceeding according to steps 7.3.3 to 7.3.5;
step 7.3.3, filling "placeholder" into the next blank cell of the "Name" column of the special node table;
step 7.3.4, filling "simple node" and the current specification clause number into the ":LABEL" column cell of the same row, separated by ";";
step 7.3.5, setting the "ID" column cell value to the sum of the current number of common nodes and the number of special nodes, denoted Or_i_ID;
step 7.3.6, filling the E_j_ID corresponding to entity E_j into the next blank cell of the ":START_ID" column of the node relation table, and recording the row number as y;
step 7.3.7, filling the E_i_ID corresponding to the entity E_i of the unary predicate relation into the ":END_ID" column cell of row y;
step 7.3.8, taking the binary predicate relation as the ":TYPE" column cell value of row y;
step 7.3.9, if the binary predicate relation also carries a constraint degree relation, taking the constraint degree word as the cell value of row y in the column of the current specification clause number; if there is no constraint degree relation, filling that cell with "placeholder";
step 7.3.10, filling Or_i_ID into the ":START_ID" column cell of row y+1;
step 7.3.11, filling E_i_ID into the ":END_ID" column cell of row y+1;
step 7.3.12, taking "simple edge" as the ":TYPE" column cell value of row y+1;
step 7.3.13, taking the "IDD" column cell value of the row holding the unary predicate relation as the "Start" column cell value of row y+1;
step 7.3.14, taking the "IDD" column cell value of row y as the "End" column cell value of row y+1;
step 7.4, if there is a binary predicate relation from some entity E_i to another entity E_j in the specification, filling the E_i_ID corresponding to entity E_i into the next blank cell of the ":START_ID" column, recording the row number as y, filling the E_j_ID corresponding to entity E_j into the ":END_ID" column cell of row y, and taking the binary predicate relation as the ":TYPE" column cell value of the same row;
step 7.5, if the binary predicate relation also carries a constraint degree relation, filling the constraint degree word into the cell of row y in the column of the specification clause number currently being processed; otherwise filling that cell with "placeholder";
step 7.6, if some three entities E_i, E_j and E_k in the specification have a ternary predicate relation Tr_i, performing the following operations:
step 7.6.1, if the "Name" column of the special node table file already contains a cell whose value is the ternary predicate relation, only the current specification clause number needs to be appended to the ":LABEL" column cell of the same row, separated from the original cell value by ";"; the "ID" column cell value of that row is denoted Tr_i_ID;
step 7.6.2, if the "Name" column of the special node table file contains no cell whose value is the ternary predicate relation, proceeding according to steps 7.6.3 to 7.6.5;
step 7.6.3, filling the ternary predicate relation into the next blank cell of the "Name" column of the special node table;
step 7.6.4, filling "complex node" and the current specification clause number into the ":LABEL" column cell of the same row, separated by ";";
step 7.6.5, setting the "ID" column cell value to the sum of the current number of common nodes and the number of special nodes, denoted Tr_i_ID;
step 7.6.6, filling the E_i_ID, E_j_ID and E_k_ID corresponding to entities E_i, E_j and E_k into the next three blank cells of the ":START_ID" column of the node relation table file, one per row, and filling Tr_i_ID into the ":END_ID" column cell of each of these rows;
step 7.6.7, taking the ternary predicate relation as the ":TYPE" column cell value of each of the three rows;
step 7.6.8, according to the semantic logical order of entities E_i, E_j and E_k in the ternary predicate relation, setting the cell value of each of the three rows in the column of the current specification clause number to Seq, Seq = 1, 2, 3;
step 7.7, if there is a binary predicate relation between an entity E_i in the specification and some ternary predicate Tr_j as a whole, proceeding as follows:
step 7.7.1, according to the direction of the binary predicate relation between entity E_i and the ternary predicate Tr_j, setting the next blank cell of the ":START_ID" column of the node relation table to E_i_ID or Tr_j_ID and the ":END_ID" column cell of the same row to the other value, thereby marking the node to which the relation points;
step 7.7.2, taking the binary predicate relation as the ":TYPE" column cell value of the same row;
step 7.7.3, if the binary predicate relation also carries a constraint degree relation, taking the constraint degree word as the cell value of row y in the column of the current specification clause number; if there is no constraint degree relation, filling that cell with "placeholder";
step 7.8, if an entity E_i in the entity set Entity_Set3 of the current specification has binary predicate relations with more than one entity in the specification, and there is an "and" or "or" semantic logical relation between those entities or between the simple sentences in which they occur, then, after the binary predicate relations have been stored, expressing the semantic logical relation according to the following steps:
step 7.8.1, if the special node table file already contains a row whose ":LABEL" column cell value is "select node" together with the current specification clause number, only the row of that cell needs to be recorded, its "ID" column cell value being denoted Dr_i_ID;
step 7.8.2, if the special node table file contains no row whose ":LABEL" column cell value is "select node" together with the specification clause number, proceeding according to steps 7.8.3 to 7.8.5;
step 7.8.3, filling "placeholder" into the next blank cell of the "Name" column of the special node table;
step 7.8.4, filling "select node" and the current specification clause number into the ":LABEL" column cell of the same row, separated by ";";
step 7.8.5, setting the "ID" column cell value to the sum of the current number of common nodes and the number of special nodes, denoted Dr_i_ID;
step 7.8.6, taking Dr_i_ID as the value of the next blank cell of the ":START_ID" column, and recording the row number as y;
step 7.8.7, filling E_i_ID into the ":END_ID" column cell of row y;
step 7.8.8, searching the entity relationship table in turn for the rows of the current specification clause that have E_i_ID as the ":START_ID" column cell value and whose binary predicates share the same semantic logical relation, and taking the "IDD" column cell values of those rows as the "Flag" column cell value of row y of the entity relationship table, the values being separated by ";";
step 7.8.9, according to the semantic logical relation between the binary predicates, taking "and" or "or" as the ":TYPE" column cell value of row y;
step 7.8.10, setting the cell value of row y in the column of the current specification clause number to "placeholder";
step 7.8.11, if, in the current specification clause, there is more than one semantic logical relation among the binary predicate relations stored in the rows of the node relation table that have E_i_ID as the ":START_ID" column cell value, repeating steps 7.8.6 to 7.8.10;
step 7.8.12, if the current specification contains one and only one entity E_i that has binary predicate relations with more than one entity in the specification, and those binary predicate relations are all of the same semantic logical relation type, the graph-structured expression and storage of the specification is complete; otherwise proceeding with steps 7.8.13 to 7.8.16;
step 7.8.13, extracting the simple sentences S_j of the current specification clause to form the clause set Sentence_Set {S_1, S_2, S_3, ...};
step 7.8.14, filling the E_j_ID corresponding to the first entity E_j of each clause S_j in the clause set Sentence_Set, in turn, into the ":END_ID" column of the node relation table, and recording the row number as y_j;
step 7.8.14, taking Dr_i_ID as the ":START_ID" column cell value of row y_j of the node relation table;
step 7.8.15, taking "clause" as the ":TYPE" column cell value of row y_j of the node relation table;
step 7.8.16, searching the node relation table file, extracting the entity nodes of the specification whose degree is greater than one, and taking, in turn, the "IDD" column cell values of the rows holding the predicate relations that lead out from those entity nodes and represent the current clause S_j as the "Flag" column cell value of row y_j of the node relation table, the values being separated by ";".
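By way of illustration only, the row encoding of steps 7.4 to 7.6 can be sketched as follows. This is a sketch under stated assumptions rather than the claimed implementation: tables are held as lists of dictionaries keyed by column name, the literal `"placeholder"` stands in for the placeholder value of the claim, the ID of a new special node is taken as the number of common nodes plus the number of special nodes counted after the new row is added, and all function names, predicate names and clause numbers are hypothetical.

```python
PLACEHOLDER = "placeholder"  # assumed literal for the claim's placeholder value

def add_binary(relations, clause_no, start_id, end_id, predicate, degree=None):
    """Steps 7.4 and 7.5: store one binary predicate relation E_i -> E_j."""
    row = {":START_ID": start_id, ":END_ID": end_id, ":TYPE": predicate,
           clause_no: degree if degree else PLACEHOLDER}
    relations.append(row)
    return row

def get_complex_node(special, n_common, clause_no, predicate):
    """Steps 7.6.1 to 7.6.5: find or create the special node of a ternary predicate."""
    for row in special:
        if row["Name"] == predicate:
            if clause_no not in row[":LABEL"].split(";"):
                row[":LABEL"] += ";" + clause_no  # step 7.6.1
            return row["ID"]
    node_id = n_common + len(special) + 1  # step 7.6.5 (assumed reading)
    special.append({"Name": predicate,
                    ":LABEL": "complex node;" + clause_no, "ID": node_id})
    return node_id

def add_ternary(relations, special, n_common, clause_no, entity_ids, predicate):
    """Steps 7.6.6 to 7.6.8: three rows E -> Tr, numbered Seq = 1, 2, 3."""
    if len(entity_ids) != 3:
        raise ValueError("a ternary predicate relation needs exactly three entities")
    tr_id = get_complex_node(special, n_common, clause_no, predicate)
    for seq, entity_id in enumerate(entity_ids, start=1):
        relations.append({":START_ID": entity_id, ":END_ID": tr_id,
                          ":TYPE": predicate, clause_no: seq})
    return tr_id
```

Keying each row by the clause number keeps the per-clause column of step 6 explicit; when the rows are written to CSV, cells of other clause columns would simply stay empty.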
5. The method of claim 4, wherein, in step 7.8, the "more than one entity" and the entities that have an "and" or "or" semantic logical relation between them or between the simple sentences in which they occur, and, in step 7.8.12, the "more than one entity", are each a unary predicate as a whole or a ternary predicate as a whole.
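By way of illustration only, the "and"/"or" encoding of steps 7.8.1 to 7.8.10 can be sketched as follows. The sketch makes several assumptions that are not stated in the claims: tables are lists of dictionaries keyed by column name, "IDD" is treated as a running row identifier, the grouped binary predicates are passed in by name, the label literal is taken to be "select node", and the new node ID is the number of common nodes plus the number of special nodes counted after the new row is added. All function names are hypothetical.

```python
PLACEHOLDER = "placeholder"  # assumed literal for the claim's placeholder value

def get_select_node(special, n_common, clause_no):
    """Steps 7.8.1 to 7.8.5: find or create the select node of a clause."""
    label = "select node;" + clause_no
    for row in special:
        if row[":LABEL"] == label:
            return row["ID"]
    node_id = n_common + len(special) + 1  # step 7.8.5 (assumed reading)
    special.append({"Name": PLACEHOLDER, ":LABEL": label, "ID": node_id})
    return node_id

def add_logic_row(relations, special, n_common, clause_no,
                  entity_id, operator, predicates):
    """Steps 7.8.6 to 7.8.10: link the select node to E_i with "and" or "or"."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    dr_id = get_select_node(special, n_common, clause_no)
    # Step 7.8.8: IDD values of this clause's rows that start at E_i and
    # belong to the group of predicates joined by the operator.
    flags = [str(r["IDD"]) for r in relations
             if r[":START_ID"] == entity_id and clause_no in r
             and r[":TYPE"] in predicates]
    row = {"IDD": len(relations) + 1, ":START_ID": dr_id, ":END_ID": entity_id,
           ":TYPE": operator, "Flag": ";".join(flags), clause_no: PLACEHOLDER}
    relations.append(row)
    return row
```

Calling `add_logic_row` a second time for the same clause reuses the existing select node, which corresponds to the repetition described in step 7.8.11.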
CN201911409285.3A 2019-12-31 2019-12-31 Construction method of building design specification knowledge graph based on graph database Active CN111104525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911409285.3A CN111104525B (en) 2019-12-31 2019-12-31 Construction method of building design specification knowledge graph based on graph database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911409285.3A CN111104525B (en) 2019-12-31 2019-12-31 Construction method of building design specification knowledge graph based on graph database

Publications (2)

Publication Number Publication Date
CN111104525A CN111104525A (en) 2020-05-05
CN111104525B true CN111104525B (en) 2022-03-25

Family

ID=70424587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911409285.3A Active CN111104525B (en) 2019-12-31 2019-12-31 Construction method of building design specification knowledge graph based on graph database

Country Status (1)

Country Link
CN (1) CN111104525B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111915010B (en) * 2020-06-19 2024-02-02 西安理工大学 Canonical knowledge storage method based on combined structure
CN111930958B (en) * 2020-07-13 2023-12-01 车智互联(北京)科技有限公司 Graph database construction method, computing device and readable storage medium
CN115905577B (en) * 2023-02-08 2023-06-02 支付宝(杭州)信息技术有限公司 Knowledge graph construction method and device and rule retrieval method and device
CN117252449B (en) * 2023-11-20 2024-01-30 水润天府新材料有限公司 Full-penetration drainage low-noise pavement construction process and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527517B1 (en) * 2012-03-02 2013-09-03 Xerox Corporation Efficient knowledge base system
CN109446341A (en) * 2018-10-23 2019-03-08 国家电网公司 The construction method and device of knowledge mapping
CN110222199A (en) * 2019-06-20 2019-09-10 青岛大学 A kind of character relation map construction method based on ontology and a variety of Artificial neural network ensembles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
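Research on relation reasoning and prediction methods based on heterogeneous networks; Guo Kunming (郭坤铭); China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 2017-12-15; full text *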

Also Published As

Publication number Publication date
CN111104525A (en) 2020-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant