CN116069951A - Construction worker safety knowledge extraction and knowledge graph construction method - Google Patents


Info

Publication number
CN116069951A
Authority
CN
China
Prior art keywords
construction
knowledge
safety
safety knowledge
data
Legal status
Pending
Application number
CN202310175037.7A
Other languages
Chinese (zh)
Inventor
刘佳静
詹健江
骆汉宾
方伟立
陈珂
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202310175037.7A
Publication of CN116069951A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of construction safety management and discloses a construction worker safety knowledge extraction and knowledge graph construction method. The method comprises the following steps: S1, collecting construction safety knowledge text data, constructing a safety knowledge framework, and defining the levels contained in the framework, the sub-levels corresponding to each level, and the attribute relationships among the levels; S2, dividing part of the collected data into a training set and a test set, and classifying all entries in each piece of text data according to the levels and sub-levels defined in the safety knowledge framework; S3, constructing a prediction model and using it to predict, for the text data in the remaining collected data, the level and sub-level of each entry and the attribute relationships among the levels, thereby extracting the safety knowledge. Finally, the extracted safety knowledge is stored and retrieved using a graph database. The invention solves the problem of automatically acquiring the relationships between entities during safety knowledge extraction.

Description

Construction worker safety knowledge extraction and knowledge graph construction method
Technical Field
The invention belongs to the technical field related to construction safety management, and particularly relates to a construction worker safety knowledge extraction and knowledge graph construction method.
Background
Various engineering documents, such as construction safety standards and specifications and accident investigation reports, contain abundant construction safety knowledge. However, this accumulated knowledge is not effectively managed or organized, which hinders efficient, personalized safety knowledge training. To organize construction safety knowledge resources in an orderly way and support their integrated application, a safety knowledge extraction and knowledge graph construction method oriented to the safety knowledge training of construction workers is needed. At present, in the field of construction safety, the semantic expression of safety risk knowledge is mainly advanced by building construction safety domain ontology models that support applications such as risk identification. However, existing construction safety ontology models lack definitions of the safety knowledge, core concepts, and inter-concept relationships needed for construction workers' safety knowledge learning. To instantiate an ontology model, researchers initially relied mainly on manual filling, which makes model construction time-consuming, labor-intensive, and difficult to update. Extracting text knowledge with manually defined rules places strict requirements on the application scenario, such as sentence patterns, so the applicability and robustness of such rules are very limited. In addition, existing methods that extract entity objects based on natural language processing and deep learning cannot simultaneously and automatically extract the relationships among those entities.
Therefore, the following problems remain to be solved in construction worker safety knowledge extraction and knowledge graph construction: (1) building a safety knowledge element system oriented to construction workers' safety knowledge learning; (2) developing a method that automatically and simultaneously extracts safety knowledge concept entities and the relationships between them from text.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a construction worker safety knowledge extraction and knowledge graph construction method, which solves the problem of automatically acquiring the relationships between entities during safety knowledge extraction.
In order to achieve the above object, according to the present invention, there is provided a construction worker safety knowledge extraction and knowledge graph construction method, comprising the steps of:
S1, collecting construction safety knowledge text data to form an original data set, constructing a safety knowledge framework, and defining the levels contained in the framework, the sub-levels corresponding to each level, and the attribute relationships among the levels;
S2, dividing part of the data in the original data set into a training set and a test set, classifying all entries in each piece of text data in the training and test sets according to the levels and sub-levels defined in the safety knowledge framework, and determining the attribute relationships among the levels, thereby obtaining the level, sub-level, and inter-level attribute relationships corresponding to each piece of text data;
S3, constructing a prediction model; training it with the text data of the training set from step S2 as input and the corresponding levels, sub-levels, and attribute relationships as output; checking the trained model with the test set; and using the trained model to predict the remaining text data in the original data set, so as to obtain the level and sub-level of each entry in each piece of text data and the attribute relationships among the levels, thereby realizing safety knowledge extraction.
Further preferably, after step S3, the levels, sub-levels, and inter-level attribute relationships obtained for all the text data through the safety knowledge extraction of step S3 are imported into a graph database to obtain the construction safety knowledge graph.
Further preferably, in step S1, the levels include construction resources, construction activities, construction locations, risk factors, potential risks, and precautions.
Further preferably, in step S1, the attribute relationships are defined as follows: between construction resources and construction activities, precautions, and risk factors, they are "implements", "has", and "implements", respectively; between construction activities and construction locations, risk factors, and precautions, they are "located at", "has", and "requires", respectively; between risk factors and construction locations and potential risks, they are "located at" and "causes", respectively; between potential risks and precautions, the relationship is "requires"; and between precautions and construction locations, the relationship is "located at".
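The relation definitions of step S1 can be written down as a set of allowed (head level, relation, tail level) combinations, which is handy, for example, for filtering extracted triples that violate the framework. This is a sketch: identifiers such as `ConstructionResource` and `located_at` are hypothetical English glosses, and the relations of construction resources follow the order in which the text lists them.

```python
from typing import Set, Tuple

# (head level, relation, tail level) combinations permitted by the framework.
# English identifiers are glosses chosen for this sketch.
SCHEMA: Set[Tuple[str, str, str]] = {
    ("ConstructionResource", "implements", "ConstructionActivity"),
    ("ConstructionResource", "has", "Precaution"),
    ("ConstructionResource", "implements", "RiskFactor"),
    ("ConstructionActivity", "located_at", "ConstructionLocation"),
    ("ConstructionActivity", "has", "RiskFactor"),
    ("ConstructionActivity", "requires", "Precaution"),
    ("RiskFactor", "located_at", "ConstructionLocation"),
    ("RiskFactor", "causes", "PotentialRisk"),
    ("PotentialRisk", "requires", "Precaution"),
    ("Precaution", "located_at", "ConstructionLocation"),
}


def is_allowed(head_level: str, relation: str, tail_level: str) -> bool:
    """True if the framework permits this relation between the two levels."""
    return (head_level, relation, tail_level) in SCHEMA


def relations_between(head_level: str, tail_level: str) -> Set[str]:
    """All relation names the framework defines from head_level to tail_level."""
    return {r for h, r, t in SCHEMA if h == head_level and t == tail_level}
```

Relations are directed: "a risk factor causes a potential risk" is allowed, while the reverse is rejected.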
Further preferably, in step S1, the construction safety knowledge text data are collected from the "Construction Law of the People's Republic of China", the "Regulations on the Administration of Work Safety for Construction Projects", the "Unified Standard for Construction Quality Inspection and Acceptance of Building Engineering", the "Technical Code for Safety of Work at Height in Building Construction", the "Technical Regulations for Safety Inspection of Construction Machinery in Use", the "Safety Operation Regulations for Construction", and the "Standard for Safety Evaluation of Subway Engineering Construction", as well as from the production safety accident reports published by the Ministry of Housing and Urban-Rural Development and the provincial departments of housing and urban-rural development.
Further preferably, in step S3, the prediction model is an improved CASREL model obtained by combining the CASREL model with the idea of contrastive learning, i.e., the CL-CASREL model.
Further preferably, the contrastive learning component comprises a data augmentation module, a BERT encoder, a mapping head, and a contrastive loss function.
Further preferably, the CL-CASREL model comprises a data augmentation module and a BERT encoder. The data augmentation module converts the data into positive and negative samples. The output features of the BERT encoder feed two branches: the BERT embedding and the aggregate sequence representation of the safety knowledge text. The BERT embedding is used to extract safety knowledge, while the aggregate sequence representation is used for contrastive learning to obtain a contrastive loss, which in turn is used to correct the parameters of the BERT encoder.
Further preferably, the CL-CASREL model extracts safety knowledge through the following steps:
S31, generating positive and negative samples from the text data in the training set to form a positive sample set and a negative sample set;
S32, dividing the data in the positive and negative sample sets into two parts and feeding each part into the BERT encoder, which outputs the aggregate sequence representation of the safety knowledge text;
S33, computing the contrastive loss, according to the contrastive learning idea, from the aggregate sequence representations obtained for the two parts;
S34, back-propagating the contrastive loss to the BERT encoder to correct its parameters, and repeating this update until the loss stabilizes.
In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
1. The invention uses a safety knowledge framework together with natural language processing and deep learning to automatically extract construction worker safety knowledge, and builds a safety knowledge graph that supports the efficient organization, use, storage, and updating of construction safety knowledge, providing a basis for adaptive learning of construction safety knowledge;
2. The invention improves the CASREL model into the CL-CASREL model, which combines CASREL with contrastive learning and improves prediction accuracy through data augmentation, thereby addressing insufficient training data and limited model generalization during training;
3. The technical solution of the invention improves the ability to extract safety knowledge from diverse construction safety texts, establishes a safety knowledge graph, assists construction safety training, and raises the safety knowledge level of construction workers.
Drawings
FIG. 1 is a flow chart of knowledge extraction and knowledge graph construction according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart of safety knowledge framework construction according to a preferred embodiment of the present invention;
FIG. 3 is a framework diagram of the CL-CASREL model according to a preferred embodiment of the present invention;
FIG. 4 shows the training process of the CL-CASREL model according to a preferred embodiment of the present invention;
FIG. 5 is a framework diagram of the construction safety knowledge triplet extraction algorithm according to a preferred embodiment of the present invention;
FIG. 6 is a schematic diagram of the BERT input for construction safety knowledge text according to a preferred embodiment of the present invention;
FIG. 7 is a schematic flow chart of the MLP mapping head according to a preferred embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments, to make its objects, technical solutions, and advantages clearer. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments described below may be combined with each other as long as they do not conflict.
As shown in FIGS. 1 and 2, the construction worker safety knowledge extraction method comprises the following two steps:
S1, constructing a construction safety knowledge framework, which comprises the following steps:
S1.1 Data acquisition, more specifically:
Publicly available construction safety standards and specifications and production safety accident investigation reports are collected as the sources of construction safety knowledge text data. They include the "Construction Law of the People's Republic of China", the "Regulations on the Administration of Work Safety for Construction Projects", the "Unified Standard for Construction Quality Inspection and Acceptance of Building Engineering", the "Technical Code for Safety of Work at Height in Building Construction", the "Technical Regulations for Safety Inspection of Construction Machinery in Use", the "Standard for Construction Safety Inspection", the "Safety Operation Regulations for Construction", and the "Standard for Safety Evaluation of Subway Engineering Construction", as well as the production safety accident reports published by the Ministry of Housing and Urban-Rural Development and the provincial departments of housing and urban-rural development.
S1.2 Constructing the safety knowledge framework: the worker safety knowledge ontology is built in Protégé using the seven-step method, as follows:
S1.2.1 Determine the domain and scope of the construction safety knowledge framework, i.e., clarify the purpose of building the framework and define its professional domain and application scenarios.
More specifically, the framework is built to define the data schema of the construction worker safety knowledge graph, and it covers the concept terms related to construction safety knowledge and the relationships among those concepts. The safety knowledge specifically covers what construction workers need in order to understand the potential safety risks of construction activities, to master the safety measures that should be taken and implemented, and to understand the consequences of violating safety regulations.
S1.2.2 Review existing construction safety knowledge frameworks and investigate the possibility of reusing them;
S1.2.3 Define the important terms in the construction safety knowledge framework;
More specifically, two approaches are used: top-down and bottom-up. The top-down approach condenses the most general concepts by abstracting the knowledge points, terms, and concepts of the construction safety domain. The bottom-up approach clusters specific concepts with highly overlapping attributes or relationships, based on a large body of existing construction safety knowledge text, and abstracts them into common concept expressions.
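The bottom-up approach can be illustrated by grouping concepts according to the overlap of their attribute or relationship sets. The patent does not specify the clustering algorithm; this sketch swaps in a simple greedy grouping with Jaccard similarity, and the concept names, feature labels, the 0.5 threshold and the function names are all hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_concepts(features: Dict[str, Set[str]], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass grouping: each concept (in sorted order) joins the first
    group whose seed concept it overlaps with at least `threshold`; otherwise it
    starts a new group. Each group is a candidate for a shared, more general concept."""
    groups: List[List[str]] = []
    for name in sorted(features):
        for group in groups:
            if jaccard(features[name], features[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


if __name__ == "__main__":
    concepts = {
        "tower crane": {"operated_by_worker", "has_risk_factor", "located_at"},
        "hoist": {"operated_by_worker", "has_risk_factor", "located_at"},
        "safety helmet": {"worn_by_worker", "is_precaution"},
        "safety belt": {"worn_by_worker", "is_precaution"},
    }
    print(group_concepts(concepts))
```

Here the two lifting machines fall into one group (a candidate "mechanical equipment" concept) and the two protective items into another (a candidate "personal protection" concept).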
Further, by defining the important terms in the construction safety knowledge framework, the top-level concept classes of the framework are determined.
Still further, the top-level concept classes include construction resources, construction activities, construction locations, risk factors, potential risks, and precautions.
Further, construction resources are the various elements that constitute productive capacity in construction activities; construction activities are the construction activities of construction projects; construction locations are the spatial positions where construction activities take place; risk factors are the latent factors that can cause accidents; potential risks are the risk events that risk factors may cause; and precautions are the measures that must be taken to ensure construction safety and control the hazard of safety accidents.
S1.2.4 Define the classes of the construction safety knowledge framework and their hierarchy, and establish the relationships among concepts;
More specifically, a class of the safety knowledge framework represents a set of concepts sharing the same characteristics, and the class hierarchy is defined by the relationships between classes.
Further, in the Protégé ontology modeling tool, classes are represented as "Class", the class hierarchy is expressed in the "Class hierarchy" view, generic (subclass) relationships are defined with "SubClass Of", and "owl:Thing" is set as the parent of all classes.
Still further, under "owl:Thing", the six top-level concept classes determined in step S1.2.3, namely construction resources, construction activities, construction locations, risk factors, potential risks, and precautions, are created as the parent classes of all other classes.
Further, each of the six top-level concept classes has corresponding subclasses. The "construction resource" class comprises construction workers, materials, and mechanical equipment. The "construction activity" class consists of "sub-divisional and sub-item project construction" and "measure item construction", whose specific contents can be enumerated with reference to specifications such as the "Unified Standard for Construction Quality Inspection and Acceptance of Building Engineering". The "construction location" class comprises two subclasses: "engineering components", which belong to the main works, and "safety facilities", which are built on temporary facilities and supporting structures. The "risk factor" class covers three categories: "unsafe behavior of workers", "unsafe state of objects", and "unsafe environmental factors". The "potential risk" class mainly refers to the five types of casualty accidents that occur most frequently during construction, namely "falls from height", "struck by objects", "collapse", "mechanical injury", and "electric shock". The "precaution" class includes "personal protection", "restricted areas", "safety training", "inspection tours", and "other safeguards", and can be further extended.
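The class hierarchy above, together with the "is_a" links defined through "SubClass Of", can be mirrored outside Protégé as a child-to-parent map. This sketch is only an illustration: the English class identifiers are hypothetical glosses of the classes listed above, and the functions `ancestors` and `is_a` are not part of any ontology library.

```python
from typing import Dict, List

# Child class -> parent class ("is_a" / "SubClass Of"); owl:Thing is the root.
# English class names are hypothetical glosses of the classes in the framework.
PARENT: Dict[str, str] = {
    **{top: "owl:Thing" for top in (
        "ConstructionResource", "ConstructionActivity", "ConstructionLocation",
        "RiskFactor", "PotentialRisk", "Precaution")},
    "ConstructionWorker": "ConstructionResource",
    "Material": "ConstructionResource",
    "MechanicalEquipment": "ConstructionResource",
    "SubdivisionalWorkConstruction": "ConstructionActivity",
    "MeasureItemConstruction": "ConstructionActivity",
    "EngineeringComponent": "ConstructionLocation",
    "SafetyFacility": "ConstructionLocation",
    "WorkerUnsafeBehavior": "RiskFactor",
    "UnsafeStateOfObject": "RiskFactor",
    "UnsafeEnvironmentalFactor": "RiskFactor",
    "FallFromHeight": "PotentialRisk",
    "StruckByObject": "PotentialRisk",
    "Collapse": "PotentialRisk",
    "MechanicalInjury": "PotentialRisk",
    "ElectricShock": "PotentialRisk",
    "PersonalProtection": "Precaution",
    "RestrictedArea": "Precaution",
    "SafetyTraining": "Precaution",
    "InspectionTour": "Precaution",
    "OtherSafeguard": "Precaution",
}


def ancestors(cls: str) -> List[str]:
    """Follow is_a links upward, nearest parent first."""
    chain: List[str] = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_a(cls: str, other: str) -> bool:
    """True if cls equals other or is a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls)
```

For example, `ancestors("ElectricShock")` walks up to `PotentialRisk` and then to `owl:Thing`.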
S1.2.5 Define the attributes of the construction safety knowledge classes and the relationships among them;
More specifically, attributes supplement the description of the classes defined in the safety knowledge framework or add constraints to them; they include object properties and data properties.
Further, an object property expresses a relationship between two classes or instances, while a data property expresses a characteristic of a class or instance and can take numeric or string values.
Further, the object properties include the generic relationship defined by "SubClass Of", which uses "is_a" to link each subclass to its parent class, forming the class hierarchy of the safety knowledge framework.
Still further, the object properties also include the following relationships among the top-level concept classes defined in step S1.2.3: "construction resources" implement "construction activities", "risk factors", and "precautions"; "construction activities", "risk factors", and "precautions" are located at "construction locations"; "construction activities" have "risk factors"; carrying out "construction activities" and addressing "potential risks" require corresponding "precautions"; and "risk factors" may cause "potential risks".
S1.2.6 Define attribute restrictions;
More specifically, attribute restrictions support reasoning over the safety knowledge framework by adding constraints through data properties and object properties, such as quantity constraints and containment constraints.
S1.2.7 Create instances of the safety knowledge framework.
S2 Safety knowledge extraction based on natural language processing and deep learning, described below with reference to FIGS. 3, 4, 5, and 6.
The invention provides a model framework that integrates unsupervised and supervised learning, namely a framework combining contrastive learning with the existing CASREL model, named CL-CASREL, as shown in FIG. 3. By combining CASREL with the idea of contrastive learning, the model compensates for insufficient training data and limited generalization during training, and extracts safety knowledge.
Further, the extracted safety knowledge consists of knowledge triplets, each comprising two entities and the relationship between them. Taking a "construction activity" that has a "risk factor" as an example: in "work at height has a risk of falling", the two entities are "work at height" and "risk of falling", and the relationship is "has".
Further, the core idea of contrastive learning is to generate positive and negative sample pairs through data augmentation and to design a contrastive loss for unsupervised learning, so that positive pairs are pulled close together in the projection space while negative pairs are pushed apart.
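The contrastive objective can be sketched with an InfoNCE-style loss over cosine similarities: the loss is small when the anchor is much closer to its positive than to its negatives. The patent does not give the formula of its contrastive loss; the softmax form, the temperature `tau = 0.1`, and the function names below are assumptions, written on plain Python lists so the example runs without a deep learning framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """-log of the softmax score of the positive among {positive} + negatives,
    using cosine similarity divided by the temperature tau."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating (log-sum-exp trick)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def batch_loss(anchors: List[Vector], positives: List[Vector],
               negatives: List[List[Vector]], tau: float = 0.1) -> float:
    """Mean contrastive loss over a batch of (anchor, positive, negatives) triples."""
    losses = [info_nce(a, p, n, tau) for a, p, n in zip(anchors, positives, negatives)]
    return sum(losses) / len(losses)
```

Minimizing this loss pulls each projected positive pair together and pushes the anchor away from its negatives, which is the behavior described above.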
Further, the core components of contrastive learning are the data augmentation module, the BERT encoder f(·), the mapping head g(·), and the contrastive loss function. The CASREL model consists of the BERT encoder f(·), an entity tagger, and relation-specific entity taggers. The overall structure of CL-CASREL, i.e., the CASREL model combined with contrastive learning, is shown in FIG. 3.
Further, as shown in FIG. 4, the model is trained in two successive stages, as follows:
Step 1: unsupervised learning is performed with the training set and the positive and negative sample pairs generated from it, and the contrastive loss is computed. Through these augmented positive and negative samples, the BERT encoder parameters are adjusted so that positive pairs move closer together and negative pairs move farther apart.
Step 2: after Step 1 is completed, supervised learning is performed on the training set: the knowledge triplets extracted by the model are compared with the annotated triplets and a loss is computed to further adjust the parameters of CL-CASREL. The model is then validated on the test set.
Still further, step 1 includes:
Step 1.1 Dividing part of the collected raw data into a training set and a test set, and generating positive and negative sample pairs from the training set;
Step 1.2 Extracting, with the BERT encoder, the aggregate sequence representation h[CLS] of each text from the generated positive and negative sample pairs and from the training set (whose texts are treated as part of the positive sample pairs);
Step 1.3 Mapping h[CLS] nonlinearly, with the mapping head g(·), into the space where the contrastive loss is applied, to obtain z[CLS];
Step 1.4 Computing the loss with the contrastive loss function and using it to adjust the parameters of the BERT encoder in the CL-CASREL model until the loss stabilizes.
Step 2 includes:
Step 2.1 Passing the training set data through the BERT encoder of the CL-CASREL model to obtain the BERT embeddings of the text;
Step 2.2 Extracting knowledge triplets with the entity relation extraction module, which consists of the entity tagger and the relation-specific entity taggers;
Step 2.3 Computing, with the triplet extraction loss function, the loss between the extracted and the annotated knowledge triplets, and adjusting the parameters of all modules of CL-CASREL except the mapping head g(·) until the loss stabilizes.
The positive and negative sample pair generation methods, the BERT encoder, the mapping head g(·), the entity tagger, the relation-specific entity taggers, and the loss functions mentioned in Steps 1 and 2 are described below.
Further, in the training phase, the output features of the BERT encoder of the CL-CASREL model serve two branch tasks: the aggregate sequence representation h[CLS] of the safety knowledge text is used to measure the contrastive loss, while the BERT embedding is fed into the entity tagger module of the CASREL model for knowledge triplet extraction. Contrastive training therefore influences triplet extraction by changing the parameters of the shared BERT encoder.
Further, the data augmentation module constructs the positive and negative sample pairs used to measure the contrastive loss in unsupervised learning.
Further, positive sample pairs are generated in three ways: (1) back-translation, i.e., translating the construction safety knowledge text into English and then back into Chinese; (2) swapping, i.e., randomly exchanging the positions of two words in the text; and (3) truncation, i.e., randomly replacing a certain proportion of the words with [MASK].
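Methods (2) and (3) can be sketched on a tokenized sentence as follows; back-translation (1) needs a machine translation system and is omitted. The function names, the word-level tokenization, and the use of a seeded `random.Random` for reproducibility are assumptions for illustration.

```python
import random
from typing import List


def swap_two(tokens: List[str], rng: random.Random) -> List[str]:
    """Positive sample (2): exchange the positions of two randomly chosen tokens."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def mask_ratio(tokens: List[str], ratio: float, rng: random.Random,
               mask: str = "[MASK]") -> List[str]:
    """Positive sample (3): replace round(ratio * n) randomly chosen tokens with [MASK]."""
    out = list(tokens)
    k = round(ratio * len(out))
    for i in rng.sample(range(len(out)), k):
        out[i] = mask
    return out


if __name__ == "__main__":
    sentence = ["workers", "must", "wear", "safety", "belts"]
    print(swap_two(sentence, random.Random(0)))
    print(mask_ratio(sentence, 0.4, random.Random(1)))
```

Both functions copy the input, so the original sentence and its augmented view can be fed to the encoder as a positive pair.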
Negative sample pairs are generated by randomly replacing, with a certain probability, the entities annotated in the safety knowledge text with entities of different categories taken from other sentences, which produces new sentences.
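The negative-sample rule can be sketched as below. The span format, the `pool` of entities collected from other sentences, and the function name `make_negative` are hypothetical; the patent only states that annotated entities are replaced, with some probability, by entities of a different category.

```python
import random
from typing import Dict, List, Tuple

# (start, end_exclusive, level) of one annotated entity in a token list
Span = Tuple[int, int, str]


def make_negative(
    tokens: List[str],
    spans: List[Span],
    pool: Dict[str, List[List[str]]],
    prob: float,
    rng: random.Random,
) -> List[str]:
    """Build a negative sample: with probability `prob`, replace each annotated
    entity by an entity of a *different* level drawn from other sentences.

    `pool` maps a level name to entity token lists collected from other sentences.
    Spans are assumed to be non-overlapping.
    """
    out: List[str] = []
    cursor = 0
    for start, end, level in sorted(spans):
        out.extend(tokens[cursor:start])  # copy the text before the entity
        candidates = [e for lvl, ents in pool.items() if lvl != level for e in ents]
        if candidates and rng.random() < prob:
            out.extend(rng.choice(candidates))
        else:
            out.extend(tokens[start:end])
        cursor = end
    out.extend(tokens[cursor:])  # copy the text after the last entity
    return out
```

With `prob=1.0` every entity is swapped for one of another category, so "work at height has risk of falling" can become a sentence whose activity slot holds a risk factor, which the contrastive loss should push away from the original.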
Further, the BERT encoder encodes the contextual information of the text using the pre-trained BERT model proposed by Devlin et al., whose core structure is a stack of N identical Transformer encoder layers. It extracts the feature representation x_j from sentence X_j as input to the subsequent entity tagger and relation-specific entity taggers. As shown in FIG. 5, for a given input token, BERT's input representation consists of three parts: (1) token embedding; (2) segment embedding; (3) position embedding.
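BERT's input representation for each token is the element-wise sum of its token, segment, and position embeddings. The toy sketch below shows only that summation; the 2-dimensional tables and the function name `input_embeddings` are hypothetical, whereas in the real model these tables are learned and much wider.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def input_embeddings(
    tokens: Sequence[str],
    segments: Sequence[int],
    tok_emb: Dict[str, Vector],
    seg_emb: Dict[int, Vector],
    pos_emb: Sequence[Vector],
) -> List[Vector]:
    """Return one input vector per token: token + segment + position embedding."""
    rows: List[Vector] = []
    for pos, (tok, seg) in enumerate(zip(tokens, segments)):
        t, s, p = tok_emb[tok], seg_emb[seg], pos_emb[pos]
        rows.append([a + b + c for a, b, c in zip(t, s, p)])
    return rows


if __name__ == "__main__":
    # [CLS] always sits at position 0; its final hidden state is h[CLS].
    print(input_embeddings(
        ["[CLS]", "fall", "[SEP]"], [0, 0, 0],
        {"[CLS]": [1.0, 0.0], "fall": [0.0, 1.0], "[SEP]": [1.0, 1.0]},
        {0: [0.5, 0.5]},
        [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    ))
```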
Further, as shown in FIG. 6, a [CLS] token is placed at the beginning of each sentence as the start token; after BERT processing, it yields the aggregate sequence representation h[CLS] of the safety knowledge text. As shown in FIG. 5, in the first training step the h[CLS] produced by the BERT encoder is output to the contrastive learning branch, while in the second step the BERT embeddings are passed to the entity tagger and the relation-specific entity taggers to extract knowledge triplets.
Further, the mapping head g(·) maps the aggregate sequence representation h[CLS] of the augmented safety knowledge text into the space where the contrastive loss is applied. The projection z[CLS] is obtained from h[CLS] by a nonlinear mapping; as shown in FIG. 7, this mapping is implemented as a two-layer multilayer perceptron, i.e., two fully connected layers with batch normalization and a ReLU activation function.
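The mapping head can be sketched as below. The patent names the components but not their order or sizes; this sketch assumes the order fully connected layer, batch normalization, ReLU, fully connected layer, uses training-mode batch statistics without learnable scale and shift, and runs on plain lists with hypothetical toy weights.

```python
import math
from typing import List

Matrix = List[List[float]]


def linear(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """Fully connected layer. x: (batch, d_in), w: (d_in, d_out), b: (d_out,)."""
    return [[sum(row[k] * w[k][j] for k in range(len(w))) + b[j] for j in range(len(b))]
            for row in x]


def batch_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each feature over the batch (training-mode statistics, no affine part)."""
    n, d = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(d)]
    variances = [sum((row[j] - means[j]) ** 2 for row in x) / n for j in range(d)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(d)] for row in x]


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def projection_head(h_cls: Matrix, w1: Matrix, b1: List[float],
                    w2: Matrix, b2: List[float]) -> Matrix:
    """z[CLS] = FC2(ReLU(BN(FC1(h[CLS])))) for a batch of h[CLS] vectors."""
    return linear(relu(batch_norm(linear(h_cls, w1, b1))), w2, b2)


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0]]  # a toy batch of two 2-d h[CLS] vectors
    print(projection_head(h, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0], [2.0]], [0.0]))
```

The head is only used for the contrastive branch; per Step 2.3, it is excluded from the parameters updated during triplet extraction training.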
Further, the entity labeler aims to identify all possible entities in the input security knowledge text by decoding the encoded vectors generated by the BERT encoder.
Further, as shown in the entity tagger part of Fig. 5, the entity tagger uses two identical binary classifiers that assign a binary tag (0/1) to each token, indicating whether the token is the start position or the end position of an entity; the span delimited by a start tag 1 and an end tag 1 is an entity. The entity tagger operates on each token as follows:
$$p_i^{\mathrm{start}_s} = \sigma\left(W_{\mathrm{start}}\, x_i + b_{\mathrm{start}}\right)$$
$$p_i^{\mathrm{end}_s} = \sigma\left(W_{\mathrm{end}}\, x_i + b_{\mathrm{end}}\right)$$
where $p_i^{\mathrm{start}_s}$ and $p_i^{\mathrm{end}_s}$ denote the probabilities that the i-th token in the input sequence is the start and end position of an entity, respectively. If a probability exceeds a certain threshold, the corresponding token is assigned tag 1; otherwise it is assigned tag 0. $x_i$ is the encoded representation of the i-th token in the input sequence, i.e., $x_i = h_N[i]$. $W_{\mathrm{start}}$ and $W_{\mathrm{end}}$ are trainable weights, $b_{\mathrm{start}}$ and $b_{\mathrm{end}}$ are biases, and $\sigma$ denotes the sigmoid function.
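The following sketch applies the two sigmoid classifiers to every token and turns the 0/1 tags into entity spans. The threshold of 0.5 and the rule of pairing each start tag with the nearest end tag at or after it (the heuristic used in the original CasRel paper) are assumptions, and `w_start`/`w_end` are written as weight vectors so that each classifier produces one scalar per token.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def decode_spans(starts: List[int], ends: List[int]) -> List[Tuple[int, int]]:
    """Pair each start tag with the nearest end tag at or after it; unmatched starts are dropped."""
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def tag_entities(hidden: List[List[float]], w_start: List[float], b_start: float,
                 w_end: List[float], b_end: float, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """hidden[i] plays the role of x_i = h_N[i]; returns inclusive (start, end) token spans."""
    starts = [i for i, x in enumerate(hidden) if sigmoid(dot(w_start, x) + b_start) > threshold]
    ends = [i for i, x in enumerate(hidden) if sigmoid(dot(w_end, x) + b_end) > threshold]
    return decode_spans(starts, ends)
```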
Still further, for a given input text representation x, the entity tagger optimizes the following likelihood function to identify the span of entity s:
$$p_\theta(s \mid x) = \prod_{t \in \{\mathrm{start}_s,\ \mathrm{end}_s\}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{I\{y_i^{t} = 1\}} \left(1 - p_i^{t}\right)^{I\{y_i^{t} = 0\}}$$
where L is the length of the text, and $I\{z\} = 1$ if z is true and 0 otherwise. $y_i^{\mathrm{start}_s}$ is the binary tag marking the i-th token of x as an entity start position, and $y_i^{\mathrm{end}_s}$ is the binary tag marking it as an entity end position. The parameters are $\theta = \{W_{\mathrm{start}}, W_{\mathrm{end}}, b_{\mathrm{start}}, b_{\mathrm{end}}\}$.
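Taking the logarithm turns this product into a sum, so maximizing log p_θ(s|x) is the same as minimizing the binary cross-entropy of the start and end classifiers. A small sketch, assuming the probabilities and gold tags are given as plain lists keyed by `"start"` and `"end"`:

```python
import math
from typing import Dict, List


def tagger_log_likelihood(probs: Dict[str, List[float]], labels: Dict[str, List[int]]) -> float:
    """log of prod_t prod_i p^{I[y=1]} * (1 - p)^{I[y=0]} over t in {start, end}."""
    total = 0.0
    for t in ("start", "end"):
        for p, y in zip(probs[t], labels[t]):
            # Exactly one indicator is 1 for each token, so one log term is added per token.
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


if __name__ == "__main__":
    probs = {"start": [0.9, 0.2], "end": [0.1, 0.8]}
    labels = {"start": [1, 0], "end": [0, 1]}
    print(tagger_log_likelihood(probs, labels))
```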
Further, the relation-specific entity tagger simultaneously identifies the relations associated with each entity found by the entity tagger and the entities corresponding to those relations. For example, as shown in Fig. 5, under the relation "implementation", the start and end tokens of the entity "unbelted" are both tagged 1; under the relation "cause", the start and end tokens of the entity "fall" are both tagged 1.
Further, the relation-specific entity tagger operates on each token as follows:
$$p_i^{\mathrm{start}_o} = \sigma\left(W_{\mathrm{start}}^{r}\left(x_i + v_{\mathrm{sub}}^{k}\right) + b_{\mathrm{start}}^{r}\right)$$
$$p_i^{\mathrm{end}_o} = \sigma\left(W_{\mathrm{end}}^{r}\left(x_i + v_{\mathrm{sub}}^{k}\right) + b_{\mathrm{end}}^{r}\right)$$
where $p_i^{\mathrm{start}_o}$ and $p_i^{\mathrm{end}_o}$ denote the probabilities that the i-th token in the input sequence is the start and end position of an entity, respectively, and $v_{\mathrm{sub}}^{k}$ is the encoded representation vector of the k-th entity detected by the entity tagger. $W_{\mathrm{start}}^{r}$ and $W_{\mathrm{end}}^{r}$ are trainable weights, $b_{\mathrm{start}}^{r}$ and $b_{\mathrm{end}}^{r}$ are biases, and $\sigma$ denotes the sigmoid function.
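A sketch of the relation-specific tagger for one detected entity. Following the original CasRel formulation, it assumes that the encoded vector of the k-th entity is added to every token representation x_i before the relation-specific classifiers, and that this entity vector is the average of the encoded vectors of the entity's start and end tokens (so it has the same dimension as x_i). The per-relation parameter layout, the 0.5 threshold and the relation names are hypothetical. A relation whose tagger fires on no token yields an empty list, i.e. a null object.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: relation -> (w_start, b_start, w_end, b_end)
RelParams = Dict[str, Tuple[List[float], float, List[float], float]]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: List[float], x: List[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def subject_vector(hidden: List[List[float]], span: Tuple[int, int]) -> List[float]:
    """Assumed entity vector: mean of the start-token and end-token encodings."""
    s, e = span
    return [(a + b) / 2 for a, b in zip(hidden[s], hidden[e])]


def tag_objects(hidden: List[List[float]], span: Tuple[int, int], params: RelParams,
                threshold: float = 0.5) -> Dict[str, List[Tuple[int, int]]]:
    v_sub = subject_vector(hidden, span)
    fused = [[a + b for a, b in zip(x, v_sub)] for x in hidden]  # x_i + entity vector
    results = {}
    for rel, (w_start, b_start, w_end, b_end) in params.items():
        starts = [i for i, x in enumerate(fused) if sigmoid(dot(w_start, x) + b_start) > threshold]
        ends = [i for i, x in enumerate(fused) if sigmoid(dot(w_end, x) + b_end) > threshold]
        objs = []
        for s in starts:  # pair each start with the nearest end at or after it
            following = [e for e in ends if e >= s]
            if following:
                objs.append((s, following[0]))
        results[rel] = objs  # empty list = null object for this relation
    return results
```

Running the same decoder once per detected entity is what lets one sentence yield overlapping triples that share an entity.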
Still further, for relation r, the relation-specific entity tagger optimizes the following likelihood function to identify the span of entity o that has relation r with entity s, given the sentence representation x:
$$p_{\theta_r}(o \mid s, x) = \prod_{t \in \{\mathrm{start}_o,\ \mathrm{end}_o\}} \prod_{i=1}^{L} \left(p_i^{t}\right)^{I\{y_i^{t} = 1\}} \left(1 - p_i^{t}\right)^{I\{y_i^{t} = 0\}}$$
where $y_i^{\mathrm{start}_o}$ is the binary tag marking the i-th token of x as an entity start position, and $y_i^{\mathrm{end}_o}$ is the binary tag marking it as an entity end position. For a null object $o_\varnothing$, $y_i^{\mathrm{start}_{o_\varnothing}} = y_i^{\mathrm{end}_{o_\varnothing}} = 0$ for all i. The parameters are $\theta_r = \{W_{\mathrm{start}}^{r}, W_{\mathrm{end}}^{r}, b_{\mathrm{start}}^{r}, b_{\mathrm{end}}^{r}\}$.
Further, the loss function consists of a contrastive loss function and a triplet extraction loss function. Training proceeds in stages: first, unsupervised learning is performed on the generated positive and negative samples together with 80% of the original data; then supervised learning is performed on 80% of the original data.
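The staged schedule can be outlined as below. This is a structural sketch only: `make_pairs`, `contrastive_step` and `supervised_step` are hypothetical callables standing in for the augmentation module and the two optimization steps, the 80/20 split uses a seeded shuffle, the epoch counts are placeholders, and both stages are assumed to use the same 80% split. The remaining 20% is returned separately on the assumption that it serves as the test set.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(data: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed and keep 80% for training, 20% held out."""
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


def two_stage_training(
    data: Sequence[T],
    make_pairs: Callable[[List[T]], List],
    contrastive_step: Callable,
    supervised_step: Callable,
    epochs: Tuple[int, int] = (1, 1),
) -> Tuple[List[T], List[T]]:
    train, held_out = split_80_20(data)
    # Stage 1 (unsupervised): contrastive learning on positive/negative pairs built from the split.
    pairs = make_pairs(train)
    for _ in range(epochs[0]):
        for pair in pairs:
            contrastive_step(pair)
    # Stage 2 (supervised): triple-extraction training on the same split.
    for _ in range(epochs[1]):
        for example in train:
            supervised_step(example)
    return train, held_out
```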
Further, the contrastive loss function trains the CL-CasRel model so that positive sample pairs lie closer together in the representation space and negative sample pairs lie farther apart; the resulting loss adjusts the parameters of the BERT encoder and thereby benefits the extraction of safety knowledge triples.
$$\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$
where $\mathbb{1}_{[k \neq i]}$ is an indicator function equal to 1 if k ≠ i and 0 otherwise, $z_i$ and $z_j$ are projection representations, $\mathrm{sim}(\cdot,\cdot)$ denotes the similarity between two projection representations, 2N is the number of projections in a batch, and τ is the temperature coefficient.
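The contrastive loss can be sketched in its common NT-Xent form. Two assumptions are made: the similarity is cosine similarity, and every projection other than z_i in the batch, including the projections of the generated negative samples, appears in the denominator. The default temperature `tau=0.5` is a placeholder, since the patent gives no value.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nt_xent(z: List[Sequence[float]], positive_pairs: List[Tuple[int, int]], tau: float = 0.5) -> float:
    """Average of l_{i,j} over ordered positive pairs (i, j); every k != i is in the denominator."""
    losses = []
    for i, j in positive_pairs:
        numerator = math.exp(cosine(z[i], z[j]) / tau)
        denominator = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(len(z)) if k != i)
        losses.append(-math.log(numerator / denominator))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Views 0/1 and 2/3 are positive pairs; each pair is listed in both directions.
    z = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
    print(nt_xent(z, [(0, 1), (1, 0), (2, 3), (3, 2)]))
```

The loss falls as each z_i moves toward its positive partner and away from the other projections, which is the behavior described above.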
Still further, the triplet extraction loss function is used for relation triple extraction, i.e., to identify all possible construction safety knowledge triples (entity s, relation r, entity o). The model is trained with Adam stochastic gradient descent to maximize the objective function J(Θ) over shuffled mini-batches.
Further, given an annotated sentence $x_j$ from the training set D and the set of possibly overlapping triples $T_j = \{(s, r, o)\}$ in $x_j$, the goal is to maximize the data likelihood of the training set D, where J(Θ) is expressed as:
$$J(\Theta) = \sum_{j=1}^{|D|} \left[ \sum_{s \in T_j} \log p_\theta(s \mid x_j) + \sum_{(r, o) \in T_j|s} \log p_{\theta_r}(o \mid s, x_j) + \sum_{r \in R \setminus T_j|s} \log p_{\theta_r}(o_\varnothing \mid s, x_j) \right]$$
where $s \in T_j$ denotes an entity s that appears in the triples of $T_j$; $T_j|s$ is the set of triples in $T_j$ headed by entity s; $(r, o) \in T_j|s$ is an (r, o) pair of a triple headed by s in $T_j$; R is the set of all possible relations; $R \setminus T_j|s$ denotes all relations except those headed by s; and $o_\varnothing$ denotes a "null" entity. The parameters are $\Theta = \{\theta, \{\theta_r\}_{r \in R}\}$, and $p_\theta(s \mid x)$ and $p_{\theta_r}(o \mid s, x)$ are defined in the formulas above.
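To make the three terms of J(Θ) concrete, the sketch below computes one sentence's contribution from tagger probabilities. It assumes a hypothetical data layout in which `gold[s][r]` lists the gold object spans of relation r for entity s (the (r, o) pairs of T_j|s) and `obj_probs[s]` holds the relation-specific tagger outputs for every r in R. Relations absent from `gold[s]` are scored against all-zero tags, i.e. the null entity o_∅. Summing this quantity over all sentences in D gives J(Θ).

```python
import math
from typing import Dict, List, Tuple

Span = Tuple[int, int]
Probs = Dict[str, List[float]]  # {"start": [...], "end": [...]}


def log_lik(probs: Probs, labels: Dict[str, List[int]]) -> float:
    """log prod_t prod_i p^{I[y=1]} * (1 - p)^{I[y=0]} for t in {start, end}."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for t in ("start", "end")
        for p, y in zip(probs[t], labels[t])
    )


def sentence_objective(
    length: int,
    subj_probs: Probs,
    subj_labels: Dict[str, List[int]],
    obj_probs: Dict[Span, Dict[str, Probs]],
    gold: Dict[Span, Dict[str, List[Span]]],
) -> float:
    """Contribution of one sentence x_j to J(Theta) under the hypothetical layout above."""
    total = 0.0
    for s, rel_to_objs in gold.items():
        total += log_lik(subj_probs, subj_labels)  # log p_theta(s | x_j)
        for r, probs in obj_probs[s].items():
            # Start from all-zero tags (the null object); mark every gold object of relation r.
            labels = {"start": [0] * length, "end": [0] * length}
            for o_start, o_end in rel_to_objs.get(r, []):
                labels["start"][o_start] = 1
                labels["end"][o_end] = 1
            total += log_lik(probs, labels)
    return total
```

All gold objects of one relation are tagged in the same sequence, so a relation with several objects contributes a single log-likelihood term per entity.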
The construction safety knowledge graph is constructed as follows:
the safety knowledge entities extracted from the text data, together with their relations, are imported into a Neo4j graph database and stored, yielding the construction safety knowledge graph.
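A minimal sketch of the import step, assuming each extracted triple is available as `(head, head_level, relation, tail, tail_level)`, with the framework levels used as node labels and the relation names as relationship types. The label and type names shown (`RiskFactor`, `CAUSES`, ...) are hypothetical. `MERGE` is used so that re-importing the same knowledge does not create duplicate nodes or edges; because Cypher does not accept labels or relationship types as query parameters, they are checked against a conservative ASCII identifier pattern before being interpolated.

```python
import re
from typing import Dict, Tuple

# Assumed triple format: (head, head_level, relation, tail, tail_level)
Triple = Tuple[str, str, str, str, str]

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _identifier(name: str) -> str:
    """Labels and relationship types are interpolated, so only allow safe identifiers."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def triple_to_cypher(triple: Triple) -> Tuple[str, Dict[str, str]]:
    """Build a parameterised MERGE statement for one knowledge triple."""
    head, head_level, rel, tail, tail_level = triple
    query = (
        f"MERGE (h:{_identifier(head_level)} {{name: $head}}) "
        f"MERGE (t:{_identifier(tail_level)} {{name: $tail}}) "
        f"MERGE (h)-[:{_identifier(rel)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


if __name__ == "__main__":
    q, params = triple_to_cypher(("unbelted", "RiskFactor", "CAUSES", "fall", "PotentialRisk"))
    print(q)
    print(params)
```

With the official `neo4j` Python driver, each returned `(query, params)` pair could be executed via `session.run(query, params)` inside an open session; entity names stay in parameters, so they may contain any characters, including Chinese text.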
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

1. A construction worker safety knowledge extraction and knowledge graph construction method, characterized by comprising the following steps:
S1, acquiring construction safety knowledge text data to form an original data set, constructing a safety knowledge framework, and defining the levels included in the framework, the sub-levels corresponding to each level, and the attribute relations between the levels;
S2, dividing part of the data in the original data set into a training set and a test set, classifying all terms in each piece of text data in the training set and the test set according to the levels and sub-levels defined in the safety knowledge framework, and determining the attribute relations between the levels, so as to obtain the level, sub-level and inter-level attribute relations corresponding to each piece of text data;
S3, constructing a prediction model; training the prediction model with the text data of the training set from step S2 as input and the corresponding levels, sub-levels and attribute relations as output; testing the prediction model with the test set; and using the trained prediction model to predict the text data in the remaining original data set, so as to obtain the level, sub-level and inter-level attribute relations of each term in each piece of text data in the remaining original data set, thereby extracting the safety knowledge.
2. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 1, wherein in step S3, the prediction model is an improved CasRel model obtained by combining the CasRel model with a contrastive learning framework, i.e., a CL-CasRel model.
3. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 2, wherein the contrastive learning framework comprises a data augmentation module, a BERT encoder, a projection head and a contrastive loss function.
4. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 3, wherein the CL-CasRel model comprises a data augmentation module and a BERT encoder; the data augmentation module converts data into positive and negative samples; the output features of the BERT encoder have two branches, namely the BERT embeddings and the aggregate sequence representation of the safety knowledge text; the BERT embeddings are used for safety knowledge extraction, and the aggregate sequence representation is used for contrastive learning to obtain a contrastive loss, which is used to correct the parameters of the BERT encoder.
5. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 4, wherein safety knowledge extraction with the CL-CasRel model comprises the following steps:
S31, generating positive and negative samples from the text data in the training set to form a positive sample set and a negative sample set;
S32, dividing the data in the positive and negative sample sets into two parts and inputting each part into the BERT encoder, which outputs the aggregate sequence representation of the safety knowledge text for each part;
S33, computing a contrastive loss from the aggregate sequence representations obtained for the two parts of the data, following the contrastive learning approach;
S34, feeding the contrastive loss back to the BERT encoder to correct its parameters, updating the encoder iteratively until the loss stabilizes.
6. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 1, wherein after step S3, the levels, sub-levels and inter-level attribute relations obtained for all text data by the safety knowledge extraction of step S3 are imported into a graph database to obtain the construction worker safety knowledge graph.
7. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 6, wherein in step S1, the levels include construction resources, construction activities, construction locations, risk factors, potential risks and preventive measures.
8. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 7, wherein in step S1, the construction resource has "implements", "has" and "implements" relations with the construction activity, the preventive measure and the risk factor, respectively; the construction activity has "located at", "has" and "requires" relations with the construction location, the risk factor and the preventive measure, respectively; the risk factor is located at the construction location and causes the potential risk; the potential risk requires the preventive measure; and the preventive measure is located at the construction location.
9. The construction worker safety knowledge extraction and knowledge graph construction method according to claim 7, wherein in step S1, the collected construction safety knowledge text data are obtained from the "Construction Law of the People's Republic of China", the "Regulations on the Administration of Production Safety of Construction Projects", the "Unified Standard for Construction Quality Inspection of Building Engineering", the "Technical Specification for Construction Work at Heights", the "Regulations for the Safe Use of Construction Machinery", the "Standard for Construction Safety Inspection", the "Regulations for Construction Safety Operation" or the "Standard for Subway Engineering Safety Evaluation", and from the production safety accident reports published by the Ministry of Housing and Urban-Rural Development and the provincial departments of housing and urban-rural development.
CN202310175037.7A 2023-02-28 2023-02-28 Construction worker safety knowledge extraction and knowledge graph construction method Pending CN116069951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310175037.7A CN116069951A (en) 2023-02-28 2023-02-28 Construction worker safety knowledge extraction and knowledge graph construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310175037.7A CN116069951A (en) 2023-02-28 2023-02-28 Construction worker safety knowledge extraction and knowledge graph construction method

Publications (1)

Publication Number Publication Date
CN116069951A true CN116069951A (en) 2023-05-05

Family

ID=86176796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310175037.7A Pending CN116069951A (en) 2023-02-28 2023-02-28 Construction worker safety knowledge extraction and knowledge graph construction method

Country Status (1)

Country Link
CN (1) CN116069951A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473431A (en) * 2023-12-22 2024-01-30 青岛民航凯亚系统集成有限公司 Airport data classification and classification method and system based on knowledge graph

Similar Documents

Publication Publication Date Title
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN113392986B (en) Highway bridge information extraction method based on big data and management maintenance system
CN109871955B (en) Aviation safety accident causal relation extraction method
CN106815293A (en) System and method for constructing knowledge graph for information analysis
CN108447534A (en) A kind of electronic health record data quality management method based on NLP
CN113468888A (en) Entity relation joint extraction method and device based on neural network
CN112434535B (en) Element extraction method, device, equipment and storage medium based on multiple models
CN112559734B (en) Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium
CN114168745A (en) Knowledge graph construction method for production process of ethylene oxide derivative
CN113791757B (en) Software requirement and code mapping method and system
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN116205482A (en) Important personnel risk level assessment method and related equipment
KR102291193B1 (en) Automatic device and method for recommending risk and safety information based on artificial intelligence
CN111666373A (en) Chinese news classification method based on Transformer
CN116069951A (en) Construction worker safety knowledge extraction and knowledge graph construction method
CN115809833A (en) Intelligent monitoring method and device for capital construction project based on portrait technology
CN112257425A (en) Power data analysis method and system based on data classification model
CN116502646A (en) Semantic drift detection method and device, electronic equipment and storage medium
CN115017879A (en) Text comparison method, computer device and computer storage medium
CN114911893A (en) Method and system for automatically constructing knowledge base based on knowledge graph
CN117151222B (en) Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
CN117077631A (en) Knowledge graph-based engineering emergency plan generation method
CN117390198A (en) Method, device, equipment and medium for constructing scientific and technological knowledge graph in electric power field
CN116523042A (en) Combined extraction method and system for power grid dispatching entity relationship
CN116226371A (en) Digital economic patent classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination