CN116069951A - Construction worker safety knowledge extraction and knowledge graph construction method - Google Patents
Construction worker safety knowledge extraction and knowledge graph construction method
- Publication number
- CN116069951A (CN202310175037.7A)
- Authority
- CN
- China
- Prior art keywords
- construction
- knowledge
- safety
- safety knowledge
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention belongs to the technical field of construction safety management and discloses a method for extracting construction worker safety knowledge and constructing a knowledge graph. The method comprises the following steps: S1, acquiring construction safety knowledge text data, constructing a safety knowledge framework, and defining the levels contained in the framework, the sub-levels corresponding to each level, and the attribute relations among the levels; S2, dividing part of the collected data into a training set and a test set, and classifying all terms in each piece of text data according to the levels and sub-levels defined in the safety knowledge framework; S3, constructing a prediction model and using it to predict the text data in the remaining collected data, so as to obtain the level and sub-level corresponding to each term in each piece of text data and the attribute relations among the levels, thereby extracting the safety knowledge. Finally, the extracted safety knowledge is stored and retrieved using a graph database. The invention solves the problem of automatically acquiring the relations between entities during safety knowledge extraction.
Description
Technical Field
The invention belongs to the technical field related to construction safety management, and particularly relates to a construction worker safety knowledge extraction and knowledge graph construction method.
Background
Engineering documents such as construction safety standards and specifications and accident investigation reports contain abundant construction safety knowledge, but this accumulated knowledge is not effectively managed or organized, which hinders efficient, personalized safety knowledge training. To organize construction safety knowledge resources in an orderly way and to support the integrated application of safety knowledge, a safety knowledge extraction and knowledge graph construction method oriented to the safety training of construction workers is needed. At present, in the field of construction safety, the semantic expression of safety risk knowledge is mainly advanced by building construction safety domain ontology models that support applications such as risk identification. However, existing construction safety ontology models lack definitions of the core concepts, and of the relationships between concepts, that are relevant to construction workers' safety knowledge learning. To instantiate an ontology model, researchers initially relied mainly on manual population, which makes model construction time-consuming, labor-intensive and difficult to update. Text knowledge extraction based on manually defined rules places strict requirements on the application scenario, such as sentence patterns, so the applicability and robustness of the rules are very limited. In addition, existing methods that extract entities using natural language processing and deep learning cannot simultaneously and automatically extract the relationships between those entities.
Therefore, the following problems still need to be solved in construction worker safety knowledge extraction and knowledge graph construction: (1) constructing a safety knowledge element system oriented to construction worker safety knowledge learning; (2) constructing a method that automatically and simultaneously extracts the safety knowledge concept entities in a text and the relationships between them.
Disclosure of Invention
Aiming at the defects or improvement demands of the prior art, the invention provides a construction worker safety knowledge extraction and knowledge graph construction method, which solves the problem of automatic acquisition of the relationship between entities in the safety knowledge extraction.
In order to achieve the above object, according to the present invention, there is provided a construction worker safety knowledge extraction and knowledge graph construction method, comprising the steps of:
S1, acquiring construction safety knowledge text data, the data forming an original data set; constructing a safety knowledge framework, and defining the levels contained in the framework, the sub-levels corresponding to each level, and the attribute relations among the levels;
S2, dividing part of the data in the original data set into a training set and a test set, classifying all terms in each piece of text data in the training set and the test set according to the levels and sub-levels defined in the safety knowledge framework, and determining the attribute relations among the levels, so as to obtain, for each piece of text data, the corresponding levels, sub-levels and attribute relations among the levels;
S3, constructing a prediction model, taking the text data in the training set of step S2 as input and the corresponding levels, sub-levels and attribute relations as output, training the prediction model, verifying it with the test set, and using the trained prediction model to predict the text data in the remaining original data set, so as to obtain the level and sub-level corresponding to each term in each piece of remaining text data and the attribute relations among the levels, thereby extracting the safety knowledge.
Further preferably, after step S3, the levels, sub-levels and attribute relations among the levels obtained for all text data by the safety knowledge extraction of step S3 are imported into a graph database, so as to obtain the construction safety knowledge graph.
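As a hedged illustration of the import step, the sketch below turns extracted triples into Cypher `MERGE` statements of the kind accepted by graph databases such as Neo4j. The text does not name a specific graph database or query language at this point, so the use of Cypher, the `Triple` layout, the `to_cypher` helper and the label names are all assumptions made for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str        # head entity text
    head_type: str   # level of the head entity (assumed label name)
    relation: str    # attribute relation (assumed relationship type)
    tail: str        # tail entity text
    tail_type: str   # level of the tail entity (assumed label name)


def quote(text: str) -> str:
    """Escape a string for use as a single-quoted Cypher literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_cypher(t: Triple) -> str:
    """Build one idempotent MERGE statement for a triple.

    Labels and relationship types are inserted verbatim, so they are
    assumed to come from a fixed, trusted schema.
    """
    return (
        f"MERGE (h:{t.head_type} {{name: {quote(t.head)}}}) "
        f"MERGE (t:{t.tail_type} {{name: {quote(t.tail)}}}) "
        f"MERGE (h)-[:{t.relation}]->(t)"
    )


triples = [
    Triple("work at height", "ConstructionActivity", "HAS",
           "risk of falling", "RiskFactor"),
]
statements = [to_cypher(t) for t in triples]
```

Using `MERGE` rather than `CREATE` means that re-importing the same triple does not duplicate nodes or edges, which suits a knowledge graph that is updated repeatedly.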
Further preferably, in step S1, the levels include construction resources, construction activities, construction locations, risk factors, potential risks and preventive measures.
Further preferably, in step S1, the attribute relations between construction resources and construction activities, preventive measures and risk factors are "implements", "has" and "implements", respectively; the attribute relations between construction activities and construction locations, risk factors and preventive measures are "located at", "has" and "requires", respectively; the attribute relations between risk factors and potential risks and construction locations are "causes" and "located at", respectively; the attribute relation between potential risks and preventive measures is "requires"; and the attribute relation between preventive measures and construction locations is "located at".
Further preferably, in step S1, the collected construction safety knowledge text data is obtained from the Construction Law of the People's Republic of China, the Regulations on Construction Engineering Safety Production Management, the Unified Standard for Construction Quality Acceptance of Building Engineering, the Technical Code for Safety of Work at Height in Building Construction, the Technical Specification for Safe Use of Construction Machinery, the Standard for Construction Safety Inspection, the Construction Safety Operation Rules or the Standard for Construction Safety Evaluation of Subway Engineering, and from production safety accident reports published by the Ministry of Housing and Urban-Rural Development and by the provincial housing and urban-rural development departments.
Further preferably, in step S3, the prediction model is an improved CASREL model obtained by combining the CASREL model with the idea of contrastive learning, i.e. the CL-CASREL model.
Further preferably, the contrastive learning part includes a data enhancement module, a BERT encoder, a mapping head and a contrast loss function.
Further preferably, the CL-CASREL model comprises a data enhancement module and a BERT encoder. The data enhancement module converts data into positive samples and negative samples. The output of the BERT encoder has two branches, namely the BERT embedding and the aggregate sequence representation of the safety knowledge text: the BERT embedding is used for extracting safety knowledge, while the aggregate sequence representation is used for contrastive learning to obtain a contrast loss, which is used to correct the parameters of the BERT encoder.
Further preferably, the CL-CASREL model extracts safety knowledge as follows:
S31, generating positive samples and negative samples from the text data in the training set to form a positive sample set and a negative sample set;
S32, dividing the data in the positive and negative sample sets into two parts and inputting each part into the BERT encoder, which outputs the aggregate sequence representation of the safety knowledge text;
S33, computing the contrast loss by contrastive learning from the aggregate sequence representations obtained for the two parts of data;
S34, propagating the obtained contrast loss back to the BERT encoder and correcting the encoder parameters, so that the encoder is updated continuously until the loss is stable.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention uses a safety knowledge framework, natural language processing and deep learning to extract construction worker safety knowledge automatically, and constructs a safety knowledge graph that supports the efficient organization, use, storage and updating of construction safety knowledge, providing a basis for adaptive learning of construction safety knowledge;
2. The invention improves the CASREL model into the CL-CASREL model, which combines the CASREL model with the idea of contrastive learning and improves prediction accuracy through data enhancement, thereby addressing insufficient training data and insufficient model generalization;
3. The technical scheme of the invention improves the ability to extract safety knowledge from various construction safety knowledge texts, establishes a safety knowledge graph, assists construction safety training, and raises the safety knowledge level of construction workers.
Drawings
FIG. 1 is a flow chart of knowledge extraction and knowledge graph construction according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of the construction of the safety knowledge framework according to a preferred embodiment of the present invention;
FIG. 3 is a framework diagram of the CL-CASREL model according to a preferred embodiment of the present invention;
FIG. 4 shows the training process of the CL-CASREL model according to a preferred embodiment of the present invention;
FIG. 5 is a framework diagram of the construction safety knowledge triplet extraction algorithm according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of the construction safety knowledge text input to BERT according to a preferred embodiment of the present invention;
FIG. 7 is a schematic flow chart of the MLP mapping head according to a preferred embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in FIGS. 1 and 2, the construction worker safety knowledge extraction method comprises the following two steps:
S1, constructing a framework of construction safety knowledge, which comprises the following steps:
S1.1, data acquisition, more specifically:
Publicly available construction safety standards and specifications and production safety accident investigation reports are collected as sources of construction safety knowledge text data. These include the Construction Law of the People's Republic of China, the Regulations on Construction Engineering Safety Production Management, the Unified Standard for Construction Quality Acceptance of Building Engineering, the Technical Code for Safety of Work at Height in Building Construction, the Technical Specification for Safe Use of Construction Machinery, the Standard for Construction Safety Inspection, the Construction Safety Operation Rules and the Standard for Construction Safety Evaluation of Subway Engineering, as well as production safety accident reports published by the Ministry of Housing and Urban-Rural Development and by the provincial housing and urban-rural development departments.
S1.2, constructing the safety knowledge framework, namely building a worker safety knowledge ontology with Protégé and the seven-step method, which comprises the following steps:
S1.2.1 defines the domain and scope of the construction safety knowledge framework, i.e. the purpose of constructing the framework, its professional field and its application scenarios.
More specifically, the safety knowledge framework is constructed to define the data schema of the construction worker safety knowledge graph, and it covers the concept terms related to construction safety knowledge and the relations among those concepts. The safety knowledge covered is the knowledge that construction workers need in order to understand the potential safety risks of construction activities, to grasp the safety measures that should be taken and implemented, and to understand the consequences of violating safety regulations.
S1.2.2 reviews existing construction safety knowledge frameworks and investigates the possibility of reusing them;
S1.2.3 defines the important terms in the construction safety knowledge framework;
More specifically, two approaches are used, top-down and bottom-up. The top-down approach condenses the most general concepts by abstracting the knowledge points, terms and concepts of the construction safety field. The bottom-up approach starts from a large amount of existing construction safety knowledge text data and clusters specific concepts whose attributes or relationships overlap strongly, thereby abstracting common concept expressions.
Further, the important terms in the construction safety knowledge framework are defined, and the top-level concept classes of the framework are determined.
Still further, the top-level concept classes are construction resources, construction activities, construction locations, risk factors, potential risks and preventive measures.
Further, a construction resource is any of the elements that make up productive capacity in construction activities; a construction activity is a building construction activity; a construction location is the spatial position where a construction activity is carried out; a risk factor is a potential factor that can cause an accident; a potential risk is a risk event that a risk factor may cause; a preventive measure is a measure that must be taken to ensure construction safety and control safety accident hazards.
S1.2.4 defines the classes of the construction safety knowledge framework and the class hierarchy, and establishes the relations among concepts;
More specifically, a class of the safety knowledge framework represents a set of concepts having the same characteristics, and the class hierarchy is defined by the relations between classes.
Further, in the Protégé modeling tool, classes are represented by "Class", the class hierarchy is expressed in "Class hierarchy", generic (is-a) relationships are defined with "SubClass Of", and "owl:Thing" is set as the parent of all classes.
Still further, under "owl:Thing", the 6 top-level concept classes determined in step S1.2.3 (construction resource, construction activity, construction location, risk factor, potential risk and preventive measure) are set as the parent classes of all other classes.
Further, each of the 6 top-level concept classes has corresponding subclasses. The "construction resource" class comprises "construction workers", "materials" and "mechanical equipment"; "construction activity" consists of "divisional and sub-divisional work construction" and "measure project construction", whose specific contents can be listed by referring to specifications such as the Unified Standard for Construction Quality Acceptance of Building Engineering; "construction location" comprises two subclasses, "engineering components", which belong to the main body of the works, and "safety facilities", which belong to temporary facilities and supporting structures; "risk factor" covers three categories, "unsafe behavior of workers", "unsafe state of objects" and "unsafe factors of the environment"; "potential risk" mainly refers to the five types of casualty accident that occur most frequently during building construction, namely "fall from height", "struck by object", "collapse", "mechanical injury" and "electric shock"; "preventive measure" includes "personal protection", "restricted areas", "safety training", "inspection tours" and "other safeguards", and can be extended further.
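The class hierarchy described above can be sketched as a small lookup structure. This is an illustrative data layout only, not the Protégé ontology itself: the English class identifiers are translations chosen for this example, and the `SUBCLASSES`, `PARENT` and `ancestors` names are hypothetical.

```python
from typing import Dict, List

# Top-level classes and their subclasses as listed in the text.
# Identifiers are English glosses chosen for illustration.
SUBCLASSES: Dict[str, List[str]] = {
    "ConstructionResource": ["ConstructionWorker", "Material", "MechanicalEquipment"],
    "ConstructionActivity": ["DivisionalWorkConstruction", "MeasureProjectConstruction"],
    "ConstructionLocation": ["EngineeringComponent", "SafetyFacility"],
    "RiskFactor": ["UnsafeWorkerBehavior", "UnsafeObjectState", "UnsafeEnvironmentFactor"],
    "PotentialRisk": ["FallFromHeight", "StruckByObject", "Collapse",
                      "MechanicalInjury", "ElectricShock"],
    "PreventiveMeasure": ["PersonalProtection", "RestrictedArea", "SafetyTraining",
                          "InspectionTour", "OtherSafeguard"],
}

# Invert into an is_a map, with owl:Thing as the common root.
PARENT: Dict[str, str] = {top: "owl:Thing" for top in SUBCLASSES}
for top, subs in SUBCLASSES.items():
    for sub in subs:
        PARENT[sub] = top


def ancestors(cls: str) -> List[str]:
    """Walk the is_a chain from a class up to owl:Thing."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain
```

For example, `ancestors("Collapse")` walks from the subclass to its top-level class and then to the root.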
S1.2.5 defines the attributes of the construction safety knowledge classes and the relationships between the attributes;
More specifically, attributes are defined to supplement the description of the classes in the safety knowledge framework or to add constraints to them, and they comprise object attributes and data attributes.
Further, an object attribute is a relationship between two classes or instances, while a data attribute is a characteristic of one aspect of a class or instance and may be expressed as numeric or string data.
Further, the object attributes include the generic relationship defined by "SubClass Of", which uses "is_a" to relate child and parent classes and thus forms the class hierarchy of the safety knowledge framework.
Still further, the object attributes also include the following relations defined for the top-level concept classes of step S1.2.3: "construction resource" has an "implements" relation with "construction activity", "risk factor" and "preventive measure"; "construction activity", "risk factor" and "preventive measure" each have a "located at" relation with "construction location"; "construction activity" "has" a "risk factor"; "construction activity" and "potential risk" each "require" a corresponding "preventive measure"; and a "risk factor" may "lead to" a "potential risk".
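One way to use these object attributes is as a schema against which extracted triples are checked. The sketch below encodes the relations listed in this paragraph as (domain, relation, range) entries; the relation names and the `SCHEMA` and `is_valid` identifiers are illustrative assumptions, not part of the described method.

```python
# (domain class, relation, range class) entries following the paragraph above.
# Relation names are English glosses chosen for illustration.
SCHEMA = {
    ("ConstructionResource", "implements", "ConstructionActivity"),
    ("ConstructionResource", "implements", "RiskFactor"),
    ("ConstructionResource", "implements", "PreventiveMeasure"),
    ("ConstructionActivity", "located_at", "ConstructionLocation"),
    ("RiskFactor", "located_at", "ConstructionLocation"),
    ("PreventiveMeasure", "located_at", "ConstructionLocation"),
    ("ConstructionActivity", "has", "RiskFactor"),
    ("ConstructionActivity", "requires", "PreventiveMeasure"),
    ("PotentialRisk", "requires", "PreventiveMeasure"),
    ("RiskFactor", "leads_to", "PotentialRisk"),
}


def is_valid(head_type: str, relation: str, tail_type: str) -> bool:
    """Return True if the typed relation is allowed by the schema."""
    return (head_type, relation, tail_type) in SCHEMA
```

A check of this kind could filter out extracted triples whose direction or entity types contradict the framework, for example a "potential risk" that "leads to" a "risk factor".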
S1.2.6 defines attribute restrictions;
More specifically, the attribute restrictions of S1.2.6 support reasoning over the safety knowledge framework by adding different constraints through data attributes and object attributes, such as quantitative descriptions and containment descriptions.
S1.2.7 constructs instances of the safety knowledge framework.
S2, extracting safety knowledge based on natural language processing and deep learning, as described below with reference to FIGS. 3, 4, 5 and 6.
The invention provides a model framework that integrates unsupervised and supervised learning, namely a framework that combines contrastive learning with the existing CASREL model, named CL-CASREL and shown in FIG. 3. By combining the existing CASREL model with the idea of contrastive learning, the model compensates for insufficient training data and insufficient generalization ability, and extracts safety knowledge.
Further, the extracted safety knowledge takes the form of knowledge triplets, each consisting of two entities and the relationship between them. Taking a "construction activity" that has a "risk factor" as an example, in "work at height" has "risk of falling", the two entities are "work at height" and "risk of falling", and the relationship is "has".
Further, the core idea of contrastive learning is to generate positive and negative sample pairs through data enhancement and to design a contrast loss for unsupervised learning, so that positive sample pairs lie close together in the projection space and negative sample pairs lie far apart.
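To make the "close for positives, distant for negatives" objective concrete, the following minimal sketch computes an InfoNCE-style contrast loss on cosine similarities. This section does not state the exact loss formula, so the InfoNCE form, the temperature value and the function names are assumptions chosen as a common instance of contrastive learning.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor: Sequence[float],
                     positive: Sequence[float],
                     negatives: List[Sequence[float]],
                     temperature: float = 0.1) -> float:
    """InfoNCE-style loss (assumed form): small when the anchor is close to
    its positive and far from its negatives in the projection space."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# A well-separated arrangement gives a lower loss than a confused one.
good = contrastive_loss([1.0, 0.0], [1.0, 0.1], [[0.0, 1.0]])
bad = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.1]])
```

Minimizing such a loss pulls the projected representations of a sentence and its augmented version together while pushing the entity-replaced negatives away.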
Further, the core components of contrastive learning are a data enhancement module, a BERT encoder f(·), a mapping head g(·) and a contrast loss function. The CASREL model consists of a BERT encoder f(·), an entity marker and relation-specific entity markers. The overall structure of CL-CASREL, the CASREL model that incorporates the idea of contrastive learning, is shown in FIG. 3.
Further, as shown in FIG. 4, the model is trained in stages, with the following specific steps:
Step 1, performing unsupervised learning with the training set and the positive and negative sample pairs generated from it, and computing the contrast loss function, so that the BERT encoder parameters are adjusted through data enhancement: negative sample pairs are pushed apart and positive sample pairs are pulled together.
Step 2, after step 1 is completed, performing supervised learning on the training set, comparing the knowledge triplets extracted by the model with the labeled triplets and computing a loss function, so that the parameters of CL-CASREL are adjusted further. The model is then verified with the test set.
Still further, step 1 includes:
Step 1.1, dividing part of the collected original data into a training set and a test set, and generating positive and negative sample pairs from the training set;
Step 1.2, using the BERT encoder to extract the aggregate sequence representation h[cls] of the text from the generated positive and negative sample pairs and from the training set (which is treated as equivalent to positive samples);
Step 1.3, using the mapping head g(·) to map h[cls] nonlinearly to the space in which the contrast loss is applied, obtaining z[cls];
Step 1.4, computing the loss with the contrast loss function. Once the loss is stable, step 2 continues to adjust the parameters of the BERT encoder in the CL-CASREL model.
Step 2 includes:
Step 2.1, passing the data in the training set through the BERT encoder of the CL-CASREL model to obtain the BERT embedding of the text;
Step 2.2, extracting knowledge triplets with an entity relation extraction module consisting of the entity marker and the relation-specific entity markers;
Step 2.3, computing the loss between the extracted knowledge triplets and the labeled knowledge triplets with the triplet extraction loss function, and adjusting the parameters of all modules of the CL-CASREL model other than the mapping head g(·) until the loss is stable.
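The staged schedule above, in which each stage runs "until the loss is stable", can be sketched as follows. The text does not define the stability criterion, so the windowed tolerance test, the `is_stable` and `run_stage` helpers and the dummy step functions are all hypothetical stand-ins for the real contrastive and triplet-extraction training steps.

```python
from typing import Callable, List


def is_stable(losses: List[float], window: int = 3, tol: float = 1e-3) -> bool:
    """Assumed criterion: the last `window` losses vary by less than `tol`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tol


def run_stage(step_fn: Callable[[], float], max_steps: int = 1000) -> List[float]:
    """Call step_fn (one training step returning its loss) until stable."""
    history: List[float] = []
    for _ in range(max_steps):
        history.append(step_fn())
        if is_stable(history):
            break
    return history


def make_dummy_step(start: float) -> Callable[[], float]:
    """Stand-in for a real training step: the loss halves on every call."""
    state = {"loss": start}

    def step() -> float:
        state["loss"] *= 0.5
        return state["loss"]

    return step


# Stage 1: contrast loss adjusts the encoder; stage 2: triplet extraction loss.
contrast_history = run_stage(make_dummy_step(4.0))
triplet_history = run_stage(make_dummy_step(2.0))
```

In a real implementation each `step_fn` would run a forward and backward pass; only the order of the two stages and the stop-when-stable rule are taken from the text.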
The generation of positive and negative sample pairs, the BERT encoder, the mapping head g(·), the entity marker, the relation-specific entity markers and the loss functions mentioned in step 1 and step 2 are described below.
Further, in the training phase, the output of the BERT encoder of the CL-CASREL model serves two branch tasks: the aggregate sequence representation h[cls] of the safety knowledge text is used to measure the contrast loss, while the BERT embedding is input into the entity marker module of the CASREL model for knowledge triplet extraction. Contrast loss training therefore affects the quality of knowledge triplet extraction by changing the parameters of the BERT encoder.
Further, the data enhancement module constructs the positive and negative sample pairs used to measure the contrast loss in unsupervised learning.
Further: positive sample pairs are generated in three ways: (1) Machine translation, translating construction safety knowledge text into English and then into Chinese; (2) Exchanging, namely randomly exchanging the positions of two words in the construction safety knowledge text; and (3) cutting off, and randomly changing a certain proportion of words into [ MASK ].
The negative sample pairs randomly replace entities marked in the safety knowledge text with entities of different categories in other sentences with a certain probability to generate new sentences.
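The exchange, cutoff and entity-replacement strategies can be illustrated with a minimal sketch. It assumes whitespace-tokenized input, non-overlapping entity spans, and hypothetical function names (`swap_words`, `cutoff`, `replace_entities`); machine translation is left out because it depends on an external translation service that the text does not name.

```python
import random

MASK = "[MASK]"

def swap_words(tokens, rng):
    """Positive sample: exchange the positions of two randomly chosen tokens."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def cutoff(tokens, ratio, rng):
    """Positive sample: replace a proportion `ratio` of the tokens with [MASK]."""
    out = list(tokens)
    n_mask = min(len(out), max(1, int(len(out) * ratio))) if out else 0
    for i in rng.sample(range(len(out)), n_mask):
        out[i] = MASK
    return out

def replace_entities(tokens, spans, pool, prob, rng):
    """Negative sample: swap annotated entities for entities of another category.

    spans: list of (start, end_exclusive, category) over `tokens` (assumed format)
    pool:  dict mapping category -> list of entity token lists from other sentences
    """
    out = list(tokens)
    # Work right to left so that earlier span indices remain valid.
    for start, end, category in sorted(spans, reverse=True):
        others = [c for c in pool if c != category and pool[c]]
        if others and rng.random() < prob:
            out[start:end] = rng.choice(pool[rng.choice(others)])
    return out
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which is convenient when the same sample pairs need to be regenerated across training runs.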
Further, the BERT encoder encodes the context information of the text with the pre-trained BERT model proposed by Devlin et al.; its core structure is a stack of N identical Transformer encoder layers. Feature information $x_j$ is extracted from sentence $X_j$ and input into the subsequent entity marker and relation-specific entity markers. As shown in fig. 5, BERT builds the input representation of a given text from three parts: (1) token embedding; (2) segment embedding; (3) position embedding.
Further, as shown in fig. 6, a [CLS] token is inserted at the beginning of each sentence as a start token, and BERT processes [CLS] to obtain the aggregate sequence representation $h_{[CLS]}$ of the safety knowledge text. As shown in fig. 5, in the first step of model training the $h_{[CLS]}$ produced by the BERT encoder is output to the contrastive learning branch, and in the second step the BERT embedding is passed through the entity marker and the relation-specific entity markers to extract knowledge triples.
Further, the mapping head g(·) maps the aggregate sequence representation $h_{[CLS]}$ of the data-enhanced safety knowledge text into the space where the contrastive loss is applied. The projection representation $z_{[CLS]}$ is obtained from $h_{[CLS]}$ by a nonlinear mapping; as shown in FIG. 7, the mapping is implemented as a two-layer multi-layer perceptron, so that $z_{[CLS]}$ is obtained from $h_{[CLS]}$ through two fully connected layers, batch normalization and a ReLU activation function.
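A minimal pure-Python sketch of such a mapping head follows. The layer order (fully connected, batch normalization, ReLU, fully connected) is an assumption, since the text lists the components without fixing their order, and the batch normalization here omits the learnable scale and shift. All function names are illustrative.

```python
import math

def linear(batch, weight, bias):
    """Fully connected layer: y = W x + b for every row x in the batch."""
    return [[sum(w * x for w, x in zip(row_w, xs)) + b
             for row_w, b in zip(weight, bias)]
            for xs in batch]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance across the batch."""
    n = len(batch)
    cols = list(zip(*batch))
    means = [sum(c) / n for c in cols]
    variances = [sum((v - m) ** 2 for v in c) / n for c, m in zip(cols, means)]
    return [[(v - m) / math.sqrt(s + eps)
             for v, m, s in zip(row, means, variances)]
            for row in batch]

def relu(batch):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in batch]

def projection_head(h_cls, w1, b1, w2, b2):
    """Map a batch of h_[CLS] vectors to projections z_[CLS] (assumed layer order)."""
    hidden = relu(batch_norm(linear(h_cls, w1, b1)))
    return linear(hidden, w2, b2)
```

In this arrangement only the output of `projection_head` enters the contrastive loss, while the unprojected encoder output remains available to the extraction branch.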
Further, the entity marker identifies all possible entities in the input safety knowledge text by decoding the encoded vectors produced by the BERT encoder.
Further, as shown in the entity marker section of fig. 5, the entity marker uses two identical binary classifiers that assign a binary tag (0/1) to each token, indicating whether the token is the start position or the end position of an entity; the part of the sentence delimited by a start tag of 1 and an end tag of 1 is an entity. For each token, the entity marker operates as follows:
$$p_i^{start\_s} = \sigma(W_{start}\, x_i + b_{start})$$
$$p_i^{end\_s} = \sigma(W_{end}\, x_i + b_{end})$$
where $p_i^{start\_s}$ and $p_i^{end\_s}$ are the probabilities that the i-th token in the input sequence is the start position and the end position of an entity, respectively. If a probability exceeds a certain threshold, the corresponding token is assigned tag 1, otherwise tag 0. $x_i$ is the encoded representation of the i-th token in the input sequence, i.e. $x_i = h_N[i]$. $W_{start}$ and $W_{end}$ are trainable weights, $b_{start}$ and $b_{end}$ are biases, and $\sigma$ is the sigmoid function.
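The start/end tagging and thresholding can be sketched as follows. Pairing each start tag with the nearest end tag at or after it is an assumption borrowed from the usual CASREL decoding rule; the text only says that the sentence part delimited by the two 1 tags is an entity. The threshold of 0.5 and the function names are likewise illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def tag_probs(xs, w, b):
    """p_i = sigmoid(w . x_i + b) for every token encoding x_i."""
    return [sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) for x in xs]

def decode_spans(p_start, p_end, threshold=0.5):
    """Return (start, end) index pairs, both inclusive.

    Each token whose start probability exceeds the threshold is paired
    with the nearest token at or after it whose end probability does.
    """
    starts = [i for i, p in enumerate(p_start) if p > threshold]
    ends = [i for i, p in enumerate(p_end) if p > threshold]
    spans = []
    for s in starts:
        later = [e for e in ends if e >= s]
        if later:
            spans.append((s, later[0]))
    return spans
```

Running `tag_probs` once with the start weights and once with the end weights gives the two probability sequences that `decode_spans` consumes.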
Still further, the entity marker optimizes the following likelihood function to identify the span of entity s for a given input text representation x:
$$p_\theta(s \mid x) = \prod_{t \in \{start\_s,\, end\_s\}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{\mathbb{I}\{y_i^{t} = 1\}} \left(1 - p_i^{t}\right)^{\mathbb{I}\{y_i^{t} = 0\}}$$
where L is the length of the text; $\mathbb{I}\{z\} = 1$ if z is true and 0 otherwise; $y_i^{start\_s}$ is the binary tag of the entity start position for the i-th token in x, and $y_i^{end\_s}$ is the binary tag of the entity end position. The parameters are
$\theta = \{W_{start}, W_{end}, b_{start}, b_{end}\}$
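Taking the logarithm of such a likelihood turns the product over tokens into a sum of per-token terms, which is the quantity usually computed in practice (it equals the negative binary cross-entropy summed over start and end tags). A minimal sketch, with an illustrative function name and assuming every probability lies strictly between 0 and 1:

```python
import math

def span_log_likelihood(p_start, p_end, y_start, y_end):
    """Sum of log-probabilities of the gold start/end tags over all tokens."""
    total = 0.0
    for probs, tags in ((p_start, y_start), (p_end, y_end)):
        for p, y in zip(probs, tags):
            # Tag 1 contributes log p, tag 0 contributes log(1 - p).
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

Predictions that agree more closely with the gold tags give a larger (less negative) value, so maximizing this sum pushes the start and end probabilities toward the annotated positions.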
Further, the relation-specific entity marker simultaneously identifies the relations associated with each entity found by the entity marker and the entities corresponding to those relations. For example, as shown in fig. 5, under the relation "implementation" the entity "unbelted" has a start tag of 1 and an end tag of 1, and under the relation "cause" the entity "fall" has a start tag of 1 and an end tag of 1.
Further, for each token it operates as follows:
$$p_i^{start\_o} = \sigma\left(W_{start}^{r}\,(x_i + v_s^{k}) + b_{start}^{r}\right)$$
$$p_i^{end\_o} = \sigma\left(W_{end}^{r}\,(x_i + v_s^{k}) + b_{end}^{r}\right)$$
where $p_i^{start\_o}$ and $p_i^{end\_o}$ are the probabilities that the i-th token in the input sequence is the start position and the end position of an entity, respectively; $v_s^{k}$ is the encoded representation vector of the k-th entity detected by the entity marker; $W_{start}^{r}$ and $W_{end}^{r}$ are trainable weights, $b_{start}^{r}$ and $b_{end}^{r}$ are biases, and $\sigma$ is the sigmoid function.
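A minimal sketch of relation-specific tagging is given below. It assumes that the representation of a detected entity is the average of the token encodings inside its span, and that this vector is added to every token encoding before the relation's own classifier is applied; both choices follow the usual CASREL formulation and are not stated in the text. The function names are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def entity_vector(xs, span):
    """Average the token encodings inside an entity span (inclusive ends)."""
    start, end = span
    rows = xs[start:end + 1]
    return [sum(col) / len(rows) for col in zip(*rows)]

def relation_tag_probs(xs, v_entity, w_r, b_r):
    """p_i = sigmoid(w_r . (x_i + v_entity) + b_r) for one relation r."""
    return [sigmoid(sum(w * (x + v) for w, x, v in zip(w_r, x_i, v_entity)) + b_r)
            for x_i in xs]
```

One such classifier pair (start and end) would exist per relation, so a sentence is scanned once for every relation and every entity found by the entity marker.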
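Still further, the entity marker for a specific relation r optimizes the following likelihood function to identify the span of the entity o that has relation r with entity s, given the sentence representation x:
$$p_{\theta_r}(o \mid s, x) = \prod_{t \in \{start\_o,\, end\_o\}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{\mathbb{I}\{y_i^{t} = 1\}} \left(1 - p_i^{t}\right)^{\mathbb{I}\{y_i^{t} = 0\}}$$
where $y_i^{start\_o}$ is the binary tag of the entity start position for the i-th token in x, and $y_i^{end\_o}$ is the binary tag of the entity end position. For an empty entity $o_\varnothing$, $y_i^{start\_o_\varnothing} = y_i^{end\_o_\varnothing} = 0$ for all i. The parameters are $\theta_r = \{W_{start}^{r}, W_{end}^{r}, b_{start}^{r}, b_{end}^{r}\}$.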
Further, the loss functions comprise a contrastive loss function and a triple extraction loss function. Training proceeds in two stages: first, unsupervised learning is carried out with the generated positive and negative samples together with 80% of the original data, and then supervised learning is carried out with 80% of the original data.
Further, the contrastive loss function trains the CL-CASREL model so that positive sample pairs lie closer together in the representation space and negative sample pairs lie farther apart, thereby adjusting the parameters of the BERT encoder and, through them, the extraction of safety knowledge triples.
where $\mathbb{1}_{[k \neq i]}$ is an indicator function that equals 1 if k ≠ i and 0 otherwise, $z_i$ and $z_j$ are projection representations, and $\tau$ is the temperature coefficient.
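A widely used contrastive loss that matches the symbols described here (an indicator over k ≠ i, projections $z_i$ and $z_j$, a temperature $\tau$) is the normalized temperature-scaled cross-entropy (NT-Xent) loss, $\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$. The sketch below assumes that form, with cosine similarity as $\mathrm{sim}$; neither the exact formula nor the similarity measure is confirmed by the text, and the function names are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, i, j, tau=0.5):
    """Loss for the positive pair (i, j) among all projections in z.

    The denominator runs over every k != i, which is the role of the
    indicator function; tau is the temperature coefficient.
    """
    numerator = math.exp(cosine(z[i], z[j]) / tau)
    denominator = sum(math.exp(cosine(z[i], z[k]) / tau)
                      for k in range(len(z)) if k != i)
    return -math.log(numerator / denominator)
```

The loss shrinks as the positive pair becomes more similar relative to every other projection in the batch, which is the "closer positives, farther negatives" behaviour described for the contrastive branch.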
Still further, the triple extraction loss function is used for relational triple extraction, identifying all possible construction safety knowledge triples (entity s, relation r, entity o). The model is trained with the Adam stochastic gradient descent optimizer over shuffled mini-batches to maximize the objective function J(Θ).
Further, given an annotated text $x_j$ from the training set D and its set of possibly overlapping triples $T_j = \{(s, r, o)\}$, the goal is to maximize the data likelihood of the training set D, and J(Θ) is expressed as
$$J(\Theta) = \sum_{j=1}^{|D|} \sum_{s \in T_j} \left[ \log p_\theta(s \mid x_j) + \sum_{(r, o) \in T_j|s} \log p_{\theta_r}(o \mid s, x_j) + \sum_{r \in R \setminus T_j|s} \log p_{\theta_r}(o_\varnothing \mid s, x_j) \right]$$
where $s \in T_j$ denotes an entity s appearing in a triple of $T_j$; $T_j|s$ is the set of triples in $T_j$ led by entity s; $(r, o) \in T_j|s$ is an (r, o) pair in a triple of $T_j$ led by entity s; R is the set of all possible relations; $R \setminus T_j|s$ denotes all relations except those led by s; and $o_\varnothing$ is an "empty" entity. The parameters are $\Theta = \{\theta, \{\theta_r\}_{r \in R}\}$; $p_\theta(s \mid x)$ and $p_{\theta_r}(o \mid s, x)$ are defined by the preceding formulas.
The construction safety knowledge graph is constructed as follows:
the safety knowledge entities extracted from the text data and the relations between them are imported into a Neo4j graph database and stored, yielding the construction safety knowledge graph.
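One way to prepare extracted triples for import is to turn each (entity s, relation r, entity o) into a Cypher `MERGE` statement with query parameters. The sketch below only builds the statement text and its parameters; the node label `Entity`, the property `name`, and the function names are hypothetical choices, not taken from the text.

```python
import re

def to_rel_type(relation):
    """Turn a relation name into a safe Cypher relationship type.

    Relationship types cannot be passed as query parameters in Cypher,
    so the name is sanitized before being placed in the query text.
    """
    cleaned = re.sub(r"\W+", "_", relation.strip()).strip("_").upper()
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"unusable relation name: {relation!r}")
    return cleaned

def triple_to_cypher(subject, relation, obj):
    """Return (query, parameters) that upserts one (s, r, o) triple."""
    query = (
        "MERGE (s:Entity {name: $s_name}) "
        "MERGE (o:Entity {name: $o_name}) "
        f"MERGE (s)-[:{to_rel_type(relation)}]->(o)"
    )
    return query, {"s_name": subject, "o_name": obj}
```

Because `MERGE` creates a node or relationship only when it does not already exist, importing the same triple twice leaves a single copy in the graph. Each returned query and its parameters can then be handed to whichever Neo4j client is in use.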
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (9)
1. The construction worker safety knowledge extraction and knowledge graph construction method is characterized by comprising the following steps of:
S1, acquiring construction safety knowledge text data to form an original data set, constructing a safety knowledge framework, and defining the levels included in the framework, the sub-levels corresponding to each level and the attribute relations among the levels;
S2, dividing part of the data in the original data set into a training set and a test set, classifying all terms in each piece of text data in the training set and the test set according to the levels and sub-levels defined in the safety knowledge framework, and determining the attribute relations among the levels, so as to obtain the levels, sub-levels and inter-level attribute relations corresponding to each piece of text data;
S3, constructing a prediction model, training the prediction model with the text data of the training set from step S2 as input and the corresponding levels, sub-levels and attribute relations as output, verifying the prediction model with the test set, and using the trained prediction model to predict the text data in the remainder of the original data set, so as to obtain the level, sub-level and inter-level attribute relations of each term in each piece of text data in the remainder of the original data set, thereby realizing the extraction of safety knowledge.
2. The construction worker safety knowledge extraction and knowledge graph construction method as claimed in claim 1, wherein in step S3 the prediction model is an improved CASREL model obtained by combining the CASREL model with a contrastive learning framework, i.e. the CL-CASREL model.
3. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 2, wherein the contrastive learning framework comprises a data enhancement module, a BERT encoder, a mapping head and a contrastive loss function.
4. The construction worker safety knowledge extraction and knowledge graph construction method as claimed in claim 3, wherein the CL-CASREL model comprises a data enhancement module and a BERT encoder; the data enhancement module converts data into positive and negative samples; the output features of the BERT encoder comprise two branches, namely the BERT embedding and the aggregate sequence representation of the safety knowledge text; the BERT embedding is used for the extraction of safety knowledge; the aggregate sequence representation of the safety knowledge text is used for contrastive learning to obtain the contrastive loss; and the contrastive loss is used to correct the parameters of the BERT encoder.
5. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 4, wherein safety knowledge extraction with the CL-CASREL model comprises the following steps:
S31, generating positive samples and negative samples from the text data in the training set to form positive and negative sample sets;
S32, dividing the data in the positive and negative sample sets into two parts and inputting the two parts into the BERT encoder respectively, the encoder outputting the aggregate sequence representation of the safety knowledge text for each part;
S33, calculating the contrastive loss by contrastive learning from the aggregate sequence representations obtained for the two parts of data;
S34, passing the obtained contrastive loss back to the BERT encoder and correcting the parameters of the encoder, so that the encoder is updated continuously until the loss is stable.
6. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 1, wherein after step S3, the levels, sub-levels and inter-level attribute relations obtained for all the text data by the safety knowledge extraction of step S3 are imported into a graph database to obtain the construction worker safety knowledge graph.
7. The construction worker safety knowledge extraction and knowledge graph construction method as claimed in claim 6, wherein in step S1 the levels include construction resources, construction activities, construction locations, risk factors, potential risks and preventive measures.
8. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 7, wherein in step S1 the attribute relations are as follows: the construction resource is related to the construction activity, the preventive measure and the risk factor by the relations "implemented", "provided with" and "implemented with", respectively; the construction activity is related to the construction location, the risk factor and the preventive measure by the relations "located", "provided with" and "required", respectively; the risk factor is related to the potential risk and the construction location by the relations "caused" and "located", respectively; the potential risk is related to the preventive measure by the relation "required"; and the preventive measure is related to the construction location by the relation "located".
9. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 7, wherein in step S1 the construction safety knowledge text data is collected from the "Building Law of the People's Republic of China", the "regulations for construction engineering safety production management", the "unified standards for construction quality inspection of construction engineering", the "technical specifications for construction work at high altitudes", the "regulations for use safety of construction machines", the "standards for construction safety inspection", the "regulations for construction safety operation" or the "standards for subway engineering safety evaluation", and from the production safety accident reports published by the housing and urban-rural development departments and bureaus of each province.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310175037.7A CN116069951A (en) | 2023-02-28 | 2023-02-28 | Construction worker safety knowledge extraction and knowledge graph construction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310175037.7A CN116069951A (en) | 2023-02-28 | 2023-02-28 | Construction worker safety knowledge extraction and knowledge graph construction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116069951A (en) | 2023-05-05 |
Family
ID=86176796
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310175037.7A Pending CN116069951A (en) | 2023-02-28 | 2023-02-28 | Construction worker safety knowledge extraction and knowledge graph construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116069951A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117473431A (en) * | 2023-12-22 | 2024-01-30 | 青岛民航凯亚系统集成有限公司 | Airport data classification and classification method and system based on knowledge graph |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111026842B (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
CN113392986B (en) | Highway bridge information extraction method based on big data and management maintenance system | |
CN109871955B (en) | Aviation safety accident causal relation extraction method | |
CN106815293A (en) | System and method for constructing knowledge graph for information analysis | |
CN108447534A (en) | A kind of electronic health record data quality management method based on NLP | |
CN113468888A (en) | Entity relation joint extraction method and device based on neural network | |
CN112434535B (en) | Element extraction method, device, equipment and storage medium based on multiple models | |
CN112559734B (en) | Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium | |
CN114168745A (en) | Knowledge graph construction method for production process of ethylene oxide derivative | |
CN113791757B (en) | Software requirement and code mapping method and system | |
CN113191148A (en) | Rail transit entity identification method based on semi-supervised learning and clustering | |
CN116205482A (en) | Important personnel risk level assessment method and related equipment | |
KR102291193B1 (en) | Automatic device and method for recommending risk and safety information based on artificial intelligence | |
CN111666373A (en) | Chinese news classification method based on Transformer | |
CN116069951A (en) | Construction worker safety knowledge extraction and knowledge graph construction method | |
CN115809833A (en) | Intelligent monitoring method and device for capital construction project based on portrait technology | |
CN112257425A (en) | Power data analysis method and system based on data classification model | |
CN116502646A (en) | Semantic drift detection method and device, electronic equipment and storage medium | |
CN115017879A (en) | Text comparison method, computer device and computer storage medium | |
CN114911893A (en) | Method and system for automatically constructing knowledge base based on knowledge graph | |
CN117151222B (en) | Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium | |
CN117077631A (en) | Knowledge graph-based engineering emergency plan generation method | |
CN117390198A (en) | Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field | |
CN116523042A (en) | Combined extraction method and system for power grid dispatching entity relationship | |
CN116226371A (en) | Digital economic patent classification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||