CN115238685A - Combined extraction method for building engineering change events based on position perception - Google Patents
- Publication number
- CN115238685A (application CN202211166342.1A)
- Authority
- CN
- China
- Prior art keywords
- engineering change
- sentence
- character
- text
- event
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/08—Construction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Economics (AREA)
- Evolutionary Computation (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a position-aware joint extraction method for building engineering change events. The method comprises the following steps: acquiring a plurality of construction engineering change texts, and defining the arguments and trigger words of engineering change events; preprocessing the engineering change texts and labeling them at character granularity according to the arguments and trigger words; obtaining a prototype representation of the engineering change event; constructing a character feature encoding module that strengthens the characters at the boundary positions of arguments and trigger words, yielding domain-knowledge-enhanced character features; constructing a sentence feature encoding module to obtain change-semantics-aware sentence features; constructing a feature aggregation module to obtain deep character features with global context; and constructing a sequence labeling module to produce a structured expression of the engineering change event. By fusing domain-knowledge semantics and sentence-level features into the character features and by exploiting prior label knowledge, the invention helps improve the extraction of construction engineering change events.
Description
Technical Field
The invention belongs to the fields of natural language processing and construction engineering, and in particular relates to a position-aware joint extraction method for construction engineering change events.
Background
Because building projects are long-running, complex and dynamic, changes in construction requirements, construction environment, construction progress, construction quality and the like during a project easily lead to changes in engineering quantities. Engineering change is a key link in the construction process: the intent of each change must be evaluated promptly, change instructions must be strictly executed, and change data must be recorded effectively. Otherwise, engineering changes not only cause economic losses to the parties involved in the building project but also seriously affect building quality and safety. Effective management of engineering change events is therefore a prerequisite for the smooth completion of a construction project.
Engineering change events usually appear in unstructured documents and contain descriptions of the change targets and their associated properties, called the key elements of the engineering change event, such as the change object, the site it belongs to and the change mode. Extracting these key elements allows change requirements to be captured quickly and change events to be stored completely and in structured form, which raises the level of engineering management.
Currently, most event extraction methods operate at the sentence level. For example, a prior invention discloses an automatic extraction and classification method and system for building construction process constraints, which extracts process constraints from building code clauses; however, it ignores the case in which a text contains multiple sentences and the event elements are scattered across the context. Zheng S et al. studied document-level events in the financial domain, relying on gold-standard entity mention labels. In the building domain, however, there is no fully mature, general-purpose entity recognition model, so event extraction cannot be performed by classification using entity category information (Zheng S, Cao W, Xu W, et al. Doc2EDAG: An end-to-end document-level framework for Chinese financial event extraction [J]. arXiv preprint arXiv:1904.07535, 2019.). In addition, although construction engineering change events are unavoidable, they remain a few-shot problem compared with news events and the like, which makes extraction difficult.
Disclosure of Invention
The invention aims to use natural language processing technology to intelligently extract the key elements of engineering change events in the building domain and express them in structured form. The invention integrates the argument roles of engineering change events into the argument category labels, strengthens the character features at argument boundary positions, and, combined with the prototype representation of engineering change events, strengthens the features of sentences at different positions, thereby achieving joint extraction of document-level engineering change events.
The purpose of the invention is realized by at least one of the following technical solutions.
A combined extraction method for building engineering change events based on position perception comprises the following steps:
S1: acquiring a plurality of construction engineering change texts, analyzing them, determining the elements that constitute an engineering change event, and defining the arguments and trigger words of the engineering change event;
S2: preprocessing the engineering change texts and labeling them at character granularity according to the arguments and trigger words of the engineering change event;
S3: obtaining a prototype representation of the engineering change event from the label information of the annotated engineering change texts;
S4: constructing a character feature encoding module that uses the element semantics of the engineering change event to strengthen the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features;
S5: constructing a sentence feature encoding module that uses the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, obtaining change-semantics-aware sentence features;
S6: constructing a feature aggregation module that fuses the sentence features with the character features to obtain deep character features with global context;
S7: constructing a sequence labeling module that learns the label dependency information of the deep character features, obtains the optimal label sequence of the engineering change text, and produces the structured expression of the engineering change event.
Further, in step S1, the elements constituting the engineering change event include the building component, building site, building floor, building space, attribute, numerical attribute value, object attribute target, and the change mode applied to the building component;
the roles of the arguments of the engineering change event are defined as building component, building site, building floor, building space, attribute, numerical attribute value and object attribute target;
the trigger word of the engineering change event is a word expressing the change mode.
Further, in step S2, each engineering change text contains a plurality of sentences; the text is split into sentences according to its punctuation marks, with each line holding a single sentence;
the number of sentences and the length of each sentence in every preprocessed engineering change text are obtained, and the text is labeled at character granularity with the three-tag 'BIO' sequence labeling scheme.
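As a hedged illustration of the sentence preprocessing in step S2, the Python sketch below splits a change text into single sentences, one per line, and reports the sentence count and lengths. The set of sentence-ending punctuation marks and the function names are assumptions for illustration; the method itself only states that splitting follows the punctuation of the text.

```python
import re

# Assumed sentence-ending marks (Chinese and ASCII). Inside a character class
# "!" and "?" are literal, so no escaping is needed.
SENTENCE_PATTERN = re.compile(r"[^。！？；!?;]+[。！？；!?;]?")


def split_sentences(text):
    """Split an engineering change text into single sentences."""
    pieces = (s.strip() for s in SENTENCE_PATTERN.findall(text))
    return [s for s in pieces if s]  # drop whitespace-only fragments


def sentence_stats(text):
    """Return (number of sentences, list of sentence lengths in characters)."""
    sentences = split_sentences(text)
    return len(sentences), [len(s) for s in sentences]


if __name__ == "__main__":
    text = "吊顶高度修改。5座41层！"  # hypothetical change text
    print("\n".join(split_sentences(text)))  # each line holds one sentence
    print(sentence_stats(text))
```

A regular expression keeps each delimiter attached to its sentence, so sentence lengths count the closing punctuation mark as one character.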
Further, labeling the engineering change text at character granularity with the 'BIO' scheme, in which B marks the first character of an element, I marks each subsequent character of that element, and O marks characters outside any element, specifically comprises:
labeling the words in the engineering change text whose category is an argument as arguments, with the argument's role as the label; and labeling the words in the engineering change text that express the change mode as trigger words.
Further, step S3 specifically includes the following steps:
S3.1: analyzing the importance of the elements constituting the engineering change event, and assigning a weight to each element determined in step S1;
S3.2: for each engineering change text, obtaining the word vector of each argument and trigger word labeled in step S2, and calculating the semantic representation $e$ of the engineering change event contained in the text as the weighted combination of these word vectors:
$$e=\sum_{k=1}^{K}\lambda_k v_k$$
where $v_k$ is the word vector of the $k$-th element labeled in the engineering change text (an argument or a trigger word), $\lambda_k$ is the weight of that argument or trigger word, and $K$ is the number of different element types;
S3.3: calculating the semantic representations of the engineering change events for all engineering change texts acquired in step S1, and averaging them to obtain the prototype representation $P$ of the engineering change event:
$$P=\frac{1}{N}\sum_{n=1}^{N}e_n$$
where $e_n$ is the semantic representation of the $n$-th engineering change text and $N$ is the number of engineering change texts acquired in step S1.
Further, step S4 includes the steps of:
S4.1: inputting the $i$-th sentence $s_i=(c_{i,1},c_{i,2},\dots,c_{i,T})$ of the engineering change text, consisting of $T$ characters, into a word vector model to obtain the vector $x_{i,t}$ of each character in $s_i$, where $x_{i,t}$ denotes the vector of the $t$-th character of the $i$-th sentence, $t=1,\dots,T$;
S4.2: extracting the hidden feature $h_{i,t}$ of each character in $s_i$ through an encoding layer;
S4.3: segmenting the $i$-th sentence $s_i$ into words with a word segmentation tool, and fusing the semantic information of each word into the hidden features of the characters in that word with character-position-dependent weights to obtain domain-knowledge-enhanced character features: the enhanced feature $\hat{h}_{i,t}$ of the $t$-th character, which is the $p$-th character of the $j$-th word, is obtained by fusing the position-weighted word vector $\alpha_{i,j,p}\,w_{i,j}$ into its hidden feature $h_{i,t}$,
where $w_{i,j}$ is the semantic vector of the $j$-th word of $s_i$, $p$ indexes the characters that make up the $j$-th word, and $\alpha_{i,j,p}$ is the position weight of the $p$-th character of the $j$-th word of $s_i$;
S4.4: repeating steps S4.1 to S4.3 for all sentences in the engineering change text to obtain the domain-knowledge-enhanced character feature of each character in all sentences of the engineering change text.
In step S4.3, the position weight $\alpha_{i,j,p}$ is computed with $\mathrm{softmax}(\cdot)$, the normalized exponential function, applied after $\mathrm{Normalization}(\cdot)$, max-min normalization, over the character positions of the $j$-th word; $n_{i,j}$ denotes the number of characters in the $j$-th word of the $i$-th sentence $s_i$.
Further, step S5 includes the steps of:
S5.1: establishing an encoding layer capable of extracting local features of sentences, and learning from the hidden features $h_{i,t}$ (step S4.2) of the characters of the $i$-th sentence $s_i$ to obtain the semantic representation $r_i$ of the $i$-th sentence;
S5.2: according to the position order of the $i$-th sentence in the document, concatenating a position vector $\mathrm{pos}_i$ to obtain the sentence representation $u_i$ of the $i$-th sentence:
$$u_i=[\,r_i;\mathrm{pos}_i\,]$$
S5.3: calculating the correlation between the prototype representation of the engineering change event and the sentence representation, so as to strengthen the features of event sentences in the document that contain event arguments or trigger words and suppress the features of irrelevant non-event sentences, obtaining the change-semantics-aware sentence feature $g_i$ of the $i$-th sentence:
$$g_i=\beta_i\,u_i$$
where $\beta_i$ is the relevance between the sentence representation $u_i$ of the $i$-th sentence and the prototype representation $P$ of the engineering change event, obtained with an attention mechanism;
S5.4: repeating steps S5.1 to S5.3 for all sentences in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
Further, in step S6, for all sentences in the engineering change text, the change-semantics-aware sentence feature of each sentence is fused into the domain-knowledge-enhanced character features of that sentence to obtain deep character features $z_{i,t}$ with global context:
$$z_{i,t}=F(\hat{h}_{i,t},\,g_i)$$
where $z_{i,t}$ is the deep character feature of the $t$-th character in the $i$-th sentence and $F(\cdot)$ denotes a feature fusion method.
Further, step S7 includes the steps of:
S7.1: inputting the deep character features of all characters in the engineering change text into a conditional random field (CRF) model, learning the dependencies among the labels assigned in step S2 to all characters in the engineering change text, and obtaining the optimal label sequence of the engineering change text to be extracted;
S7.2: extracting the words of the corresponding label categories according to the optimal label sequence of the engineering change text, filling them into an engineering change event expression template, and producing the structured expression of the engineering change event.
Compared with the prior art, the invention at least has the following beneficial effects:
1. Through the character feature encoding module, the invention fuses the semantics of words carrying domain knowledge into the character features, which helps improve the recognition accuracy of engineering change event arguments and trigger words in the building domain; position weights strengthen the features of boundary characters within words, reducing boundary recognition errors for arguments and trigger words;
2. Through the sentence feature encoding module, the invention uses an attention mechanism to strengthen the features of event sentences containing arguments and trigger words and to suppress the features of non-event sentences; at the same time, sentence-level features are fused into character-level features to build deep character features with global context;
3. By constructing a prototype representation of the engineering change event, the invention lets the model use prior label knowledge, which helps extract few-shot engineering change events.
Drawings
FIG. 1 is a flowchart of the combined extraction method for construction engineering change events according to the present invention;
FIG. 2 is a schematic diagram of engineering change text labeling and structured expression in an embodiment of the method of the present invention;
FIG. 3 is a schematic diagram of the character feature encoding module in an embodiment of the method of the present invention;
FIG. 4 is a schematic diagram of the sentence feature encoding module in an embodiment of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples, but the present invention is not limited thereto.
Example 1:
A combined extraction method for building engineering change events based on position perception, as shown in FIG. 1, comprises the following steps:
S1: acquiring a plurality of construction engineering change texts, analyzing them, determining the elements that constitute an engineering change event, and defining the arguments and trigger words of the engineering change event;
the elements constituting the engineering change event include the building component, building site, building floor, building space, attribute, numerical attribute value, object attribute target, and the change mode applied to the building component;
the key elements constituting the engineering change event are determined to be of 8 types, with the following meanings:
element 1: the building component that is determined or planned to be modified;
element 2: the main project or supporting project to which the building component belongs;
element 3: the floor where the building component is located;
element 4: the spatial position of the building component;
element 5: the building attribute of the component that needs to be modified;
element 6: the required new value of a numerical building attribute;
element 7: the required target of an object-type building attribute;
element 8: the specific modification applied to the building component.
It is understood that the engineering change event elements include, but are not limited to, elements 1 to 8.
Defining roles of arguments of the engineering change event, wherein the roles comprise building components, building sites, building floors, building spaces, attributes, numerical attribute values and object attribute targets;
the trigger word of the engineering change event is a word for expressing a change mode.
S2: preprocessing the engineering change texts and labeling them at character granularity according to the arguments and trigger words of the engineering change event;
each engineering change text contains a plurality of sentences; the text is split into sentences according to its punctuation marks, with each line holding a single sentence;
the number of sentences and the length of each sentence in every preprocessed engineering change text are obtained, and the text is labeled at character granularity with the three-tag 'BIO' sequence labeling scheme.
Labeling the engineering change text at character granularity with the 'BIO' scheme specifically comprises:
labeling the words in the engineering change text whose category is an argument as arguments, with the argument's role as the label; and labeling the words in the engineering change text that express the change mode as trigger words.
In this embodiment, as shown in fig. 2, an engineering change text is marked with arguments and trigger words.
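The character-level BIO labeling of step S2 can be sketched as follows. This is a minimal illustration, assuming the annotations are given as character spans `(start, end, role)` with an exclusive end; the role names (`Component`, `Attribute`, `Trigger`) and the example sentence are hypothetical and not taken from the patent's data.

```python
def bio_tags(text, spans):
    """Character-level BIO tags for one sentence.

    spans: list of (start, end, role) with end exclusive; role is an argument
    role or the trigger label (hypothetical names such as "Component").
    """
    tags = ["O"] * len(text)          # O: character outside any element
    for start, end, role in spans:
        tags[start] = "B-" + role     # B: first character of the element
        for k in range(start + 1, end):
            tags[k] = "I-" + role     # I: remaining characters of the element
    return tags


if __name__ == "__main__":
    sentence = "将吊顶高度修改"  # hypothetical: "change the ceiling height"
    spans = [(1, 3, "Component"), (3, 5, "Attribute"), (5, 7, "Trigger")]
    print(list(zip(sentence, bio_tags(sentence, spans))))
```

Folding the argument role into the tag (for example `B-Component`) is what lets a single sequence labeler recognize both the element boundaries and the element type.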
S3: obtaining a prototype representation of the engineering change event according to the label information of the annotated engineering change texts, which specifically comprises the following steps:
S3.1: analyzing the importance of the elements constituting the engineering change event, and assigning a weight to each element determined in step S1;
S3.2: for each engineering change text, obtaining the word vector of each argument and trigger word labeled in step S2, and calculating the semantic representation of the engineering change event contained in the text according to the element weights, as follows:
$$e=\sum_{k=1}^{K}\lambda_k v_k$$
where $v_k$ is the word vector of the $k$-th element in the engineering change text, i.e. a labeled argument or trigger word, $\lambda_k$ is the weight of that argument or trigger word, and $K$ is the number of different element types; in this embodiment, the word vectors are obtained by looking them up in a word2vec word vector model;
In this embodiment, as shown in FIG. 2, the labeled words expressing the engineering change event are:
(modification, Building 5, floor 41, logistics area, ceiling, height);
By analyzing the information expressed in the engineering change text, element 1 is determined to be an element that must be stated explicitly, so the weight of the argument 'building component' is set to the first level; elements 2 to 4 are position information of the building component, so the weights of the arguments 'building site', 'building floor' and 'building space' are set to the second level; elements 5 to 8 may not be stated explicitly in change-intention texts or may appear only in drawings, so the weights of the arguments 'attribute', 'numerical attribute value' and 'object attribute target' and of the trigger word are set to the third level;
S3.3: calculating the semantic representations of the engineering change events for all engineering change texts acquired in step S1, and averaging them to obtain the prototype representation of the engineering change event:
$$P=\frac{1}{N}\sum_{n=1}^{N}e_n$$
where $e_n$ is the semantic representation of the $n$-th engineering change text and $N$ is the number of engineering change texts acquired in step S1.
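Steps S3.2 and S3.3 can be sketched in plain Python as below, assuming $e$ is the weighted sum $\sum_k \lambda_k v_k$ of the labeled word vectors and that each text has at most one labeled word per element type. The numeric values 3.0, 2.0 and 1.0 for the first, second and third weight levels, and the role names, are hypothetical; the embodiment only fixes which roles share a level.

```python
# Hypothetical numeric values for the three weight levels of the embodiment.
LEVEL_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0}
# Hypothetical role names mapped to their weight level.
ROLE_LEVEL = {
    "Component": 1,
    "Site": 2, "Floor": 2, "Space": 2,
    "Attribute": 3, "NumericValue": 3, "ObjectTarget": 3, "Trigger": 3,
}


def event_representation(elements):
    """e = sum_k lambda_k * v_k over the labeled elements of one change text.

    elements: list of (role, word_vector) pairs, e.g. word2vec lookups.
    """
    dim = len(elements[0][1])
    e = [0.0] * dim
    for role, vec in elements:
        lam = LEVEL_WEIGHT[ROLE_LEVEL[role]]
        for d in range(dim):
            e[d] += lam * vec[d]
    return e


def prototype(representations):
    """P = (1/N) * sum_n e_n, the element-wise mean over N change texts."""
    n = len(representations)
    return [sum(column) / n for column in zip(*representations)]


if __name__ == "__main__":
    e1 = event_representation([("Component", [1.0, 0.0]), ("Trigger", [0.0, 1.0])])
    e2 = event_representation([("Floor", [1.0, 1.0])])
    print(e1, e2, prototype([e1, e2]))
```

Averaging over the whole training set gives one fixed vector $P$, which is how prior label knowledge enters the later attention step even when few change events are available.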
S4: constructing a character feature encoding module, and using the element semantics of the engineering change event to strengthen the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features, which comprises the following steps:
S4.1: inputting the $i$-th sentence $s_i=(c_{i,1},c_{i,2},\dots,c_{i,T})$ of the engineering change text, consisting of $T$ characters, into a word vector model to obtain the vector $x_{i,t}$ of each character in $s_i$, where $x_{i,t}$ denotes the vector of the $t$-th character of the $i$-th sentence, $t=1,\dots,T$; in this embodiment, each character vector is obtained with the bidirectional language model BERT;
S4.2: in this embodiment, extracting the hidden feature $h_{i,t}$ of each character in $s_i$ through a Bi-LSTM;
S4.3: segmenting the $i$-th sentence $s_i$ into words with a word segmentation tool, and fusing the semantic information of each word into the hidden features of the characters in that word with character-position-dependent weights to obtain domain-knowledge-enhanced character features: the enhanced feature $\hat{h}_{i,t}$ of the $t$-th character, which is the $p$-th character of the $j$-th word, is obtained by fusing the position-weighted word vector $\alpha_{i,j,p}\,w_{i,j}$ into its hidden feature $h_{i,t}$,
where $w_{i,j}$ is the semantic vector of the $j$-th word of $s_i$, $p$ indexes the characters that make up the $j$-th word, and $\alpha_{i,j,p}$ is the position weight of the $p$-th character of the $j$-th word of $s_i$;
the position weight $\alpha_{i,j,p}$ is obtained by max-min normalization followed by softmax over the positions $p=1,\dots,n_{i,j}$, where $n_{i,j}$ is the number of characters in the $j$-th word of $s_i$.
In the invention, because the sequence labeling task must identify a run of consecutive characters, boundary recognition errors occur easily; in addition, conventional sequence labeling methods based on character granularity cannot effectively use the semantics of the words themselves, which easily causes category recognition errors.
Therefore, in step S4.3, word semantic information is introduced and the character position weights are designed so that the characters at the boundary positions within a word fuse more word information, thereby strengthening the character features at the boundary positions of arguments and trigger words.
In this embodiment, fig. 3 is a schematic diagram of a character feature encoding module.
S4.4: repeating steps S4.1 to S4.3 for all sentences in the engineering change text to obtain the domain-knowledge-enhanced character feature of each character in all sentences of the engineering change text.
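A minimal sketch of the position weights and word-to-character fusion of step S4.3 follows. The embodiment states that max-min normalization and softmax are applied and that boundary characters should receive more word information, but not which raw score is normalized; this sketch assumes that score is the distance of character $p$ from the centre of the word, and assumes additive fusion $h_{i,t}+\alpha_{i,j,p}w_{i,j}$ with matching dimensions. Both choices, and the function names, are illustrative assumptions.

```python
import math


def position_weights(n):
    """Weights alpha_p for the n characters of one word (p = 1..n).

    Assumed raw score: distance of character p from the word centre, so the
    first and last characters (the word boundaries) get the largest weight.
    """
    centre = (n + 1) / 2
    raw = [abs(p - centre) for p in range(1, n + 1)]
    lo, hi = min(raw), max(raw)
    # Max-min normalization; for n <= 2 all scores are equal, so use zeros.
    norm = [(r - lo) / (hi - lo) if hi > lo else 0.0 for r in raw]
    exps = [math.exp(x) for x in norm]   # softmax over the word's positions
    total = sum(exps)
    return [x / total for x in exps]


def fuse_word(hidden, word_vec):
    """Assumed additive fusion: h_hat_t = h_t + alpha_p * w_j per character."""
    alphas = position_weights(len(hidden))
    return [[h + a * w for h, w in zip(h_t, word_vec)]
            for h_t, a in zip(hidden, alphas)]


if __name__ == "__main__":
    print(position_weights(3))  # boundary characters weighted above the middle
    print(fuse_word([[0.0, 0.0], [0.0, 0.0]], [2.0, 4.0]))
```

Because softmax keeps the weights of a word summing to one, a long word does not inject more total word information than a short one; only its distribution across characters changes.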
S5: constructing a sentence feature encoding module, and using the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, obtaining change-semantics-aware sentence features, which comprises the following steps:
S5.1: establishing an encoding layer capable of extracting local features of sentences; in this embodiment, a convolutional neural network (CNN) learns from the hidden features $h_{i,t}$ (step S4.2) of the characters of the $i$-th sentence $s_i$ to obtain the semantic representation $r_i$ of the $i$-th sentence;
S5.2: according to the position order of the $i$-th sentence in the document, concatenating a position vector $\mathrm{pos}_i$, for which this embodiment uses the position vector of the Transformer model, to obtain the sentence representation of the $i$-th sentence:
$$u_i=[\,r_i;\mathrm{pos}_i\,]$$
S5.3: calculating the correlation between the prototype representation of the engineering change event and the sentence representation, so as to strengthen the features of event sentences in the document that contain event arguments or trigger words and suppress the features of irrelevant non-event sentences, obtaining the change-semantics-aware sentence feature of the $i$-th sentence:
$$g_i=\beta_i\,u_i$$
where $\beta_i$ is the relevance between the sentence representation $u_i$ of the $i$-th sentence and the prototype representation $P$ of the engineering change event, obtained with an attention mechanism; in this embodiment it is calculated as follows:
$$\beta_i=\frac{\exp\big(\mathrm{score}(P,u_i)\big)}{\sum_{m=1}^{M}\exp\big(\mathrm{score}(P,u_m)\big)}$$
where $u_m$ is the feature vector of the $m$-th sentence in the engineering change text and $M$ is the number of sentences in the engineering change text; the attention score is calculated as:
$$\mathrm{score}(P,u_i)=\frac{P\,u_i^{\mathsf{T}}}{\sqrt{d}}$$
where $d$ is the dimension of the sentence feature vector, $\mathrm{score}$ denotes the attention score, and $\mathsf{T}$ denotes transposition.
S5.4: repeating steps S5.1 to S5.3 for all sentences in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
In this embodiment, fig. 4 is a schematic diagram of a sentence feature encoding module.
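Steps S5.2 and S5.3 can be sketched as below. The sketch assumes the sinusoidal position vector of the Transformer model for $\mathrm{pos}_i$, a prototype $P$ that already has the same dimension as $u_i=[r_i;\mathrm{pos}_i]$, scaled dot-product scores $P u_i^{\mathsf T}/\sqrt{d}$ normalized with softmax across the $M$ sentences, and $g_i=\beta_i u_i$. The function names are hypothetical, and the CNN of step S5.1 is replaced by precomputed vectors $r_i$.

```python
import math


def sinusoidal_position(pos, dim):
    """Transformer-style sinusoidal position vector for sentence index pos."""
    vec = []
    for k in range(dim):
        angle = pos / (10000 ** (2 * (k // 2) / dim))
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec


def change_aware_sentences(sentence_reprs, prototype, pos_dim):
    """Return (g, beta) with g_i = beta_i * u_i and u_i = [r_i ; pos_i].

    Assumes len(prototype) == len(r_i) + pos_dim.
    """
    us = [r + sinusoidal_position(i, pos_dim) for i, r in enumerate(sentence_reprs)]
    d = len(us[0])
    scores = [sum(p * x for p, x in zip(prototype, u)) / math.sqrt(d) for u in us]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    betas = [e / total for e in exps]
    return [[b * x for x in u] for b, u in zip(betas, us)], betas


if __name__ == "__main__":
    g, betas = change_aware_sentences([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0, 0.0, 0.0], 2)
    print(betas)  # the sentence closer to the prototype gets the larger weight
```

Scaling each sentence by its relevance, rather than mixing sentences together, keeps one feature per sentence so that step S6 can still attach it to that sentence's own characters.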
S6: constructing a feature aggregation module, and fusing sentence features and character features to obtain deep character features with global context;
For all sentences in the engineering change text, this embodiment fuses the change-semantics-aware sentence feature into the domain-knowledge-enhanced character features of the sentence through a gating mechanism, obtaining deep character features $z_{i,t}$ with global context:
$$z_{i,t}=\mathrm{Gate}\odot\hat{h}_{i,t}+(1-\mathrm{Gate})\odot g_i$$
where $z_{i,t}$ is the deep character feature of the $t$-th character in the $i$-th sentence and $\mathrm{Gate}$ is the gating weight, calculated as:
$$\mathrm{Gate}=\mathrm{Sigmoid}\big(W[\,\hat{h}_{i,t};g_i\,]+b\big)$$
where $W$ and $b$ are trainable parameters and $\mathrm{Sigmoid}(\cdot)$ is the squashing (logistic sigmoid) function.
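The gating fusion of step S6 can be sketched as follows, assuming the common gated-fusion form $z=\mathrm{Gate}\odot\hat{h}+(1-\mathrm{Gate})\odot g$ with $\mathrm{Gate}=\mathrm{Sigmoid}(W[\hat{h};g]+b)$, and assuming the character and sentence features share the same dimension $D$. In a trained model $W$ and $b$ would be learned; here they are passed in explicitly, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, the squashing function used for the gate."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_hat, g, W, b):
    """z = Gate * h_hat + (1 - Gate) * g, Gate = Sigmoid(W [h_hat ; g] + b).

    h_hat: enhanced character feature (length D); g: sentence feature (length D);
    W: D x 2D weight matrix; b: length-D bias.
    """
    concat = h_hat + g  # list concatenation, i.e. [h_hat ; g]
    gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + bias)
            for row, bias in zip(W, b)]
    # Element-wise interpolation between character and sentence features.
    return [a * h + (1.0 - a) * s for a, h, s in zip(gate, h_hat, g)]


if __name__ == "__main__":
    W = [[0.0] * 4, [0.0] * 4]
    print(gated_fusion([2.0, 0.0], [0.0, 2.0], W, [0.0, 0.0]))  # gate = 0.5
```

A per-dimension gate lets the model decide, character by character, how much document-level context to admit instead of adding the sentence feature with a fixed weight.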
S7: Construct a sequence labeling module that learns the label dependencies associated with the deep character features, obtains the optimal label sequence of the engineering change text, and produces a structured expression of the engineering change event. This comprises the following steps:
S7.1: Input the deep character features of all characters in the engineering change text into a conditional random field (CRF) model, learn the dependencies among the labels assigned to the characters in step S2, and obtain the optimal label sequence of the engineering change text to be extracted;
S7.2: According to the optimal label sequence of the engineering change text, extract the words of each label category and fill them into the engineering change event expression template, giving a structured expression of the engineering change event.
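Obtaining the optimal label sequence in S7.1 is the standard Viterbi search of a linear-chain CRF. This minimal sketch assumes the following:

- The per-character emission scores, computed from the deep character features, are already available.
- The label-transition matrix has already been learned.

CRF training is not shown, and the function name is hypothetical.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence of a linear-chain CRF.

    emissions[t][j]: score of label j at character t.
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each label so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            best_i = max(range(num_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    best = max(range(num_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Labels: 0 = O, 1 = B-x, 2 = I-x. A large negative O -> I transition forbids "O I-x".
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
print(viterbi_decode(emissions, transitions))
```

Picking the best label at each character on its own would yield "O I-x" here, which is an invalid sequence. The transition scores are what let the CRF layer use label dependencies to prefer "B-x I-x".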
In this embodiment, the engineering change event information expression template is shown in fig. 2.
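Step S7.2 turns the decoded tags back into words and slots them into the event template. The template of Fig. 2 is not reproduced here, so the following are assumptions:

- The field names, which are English stand-ins for the element types listed in claim 2.
- The dictionary layout of the filled template.
- The sample sentence, which is made up.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical template fields, one per element type listed in claim 2.
TEMPLATE_FIELDS = ["Component", "Site", "Floor", "Space",
                   "Attribute", "Value", "Target", "Trigger"]


def bio_to_spans(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (label, text) spans from character-level BIO tags.

    An I- tag that does not continue the currently open span is treated like O.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buf: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag.startswith("I-") and label == tag[2:]:
            buf.append(ch)  # continue the open span
            continue
        if label is not None:  # any other tag closes the open span
            spans.append((label, "".join(buf)))
        label, buf = (tag[2:], [ch]) if tag.startswith("B-") else (None, [])
    if label is not None:
        spans.append((label, "".join(buf)))
    return spans


def fill_template(spans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Place each extracted word into the template field named by its label."""
    event: Dict[str, List[str]] = {field: [] for field in TEMPLATE_FIELDS}
    for label, text in spans:
        if label in event:
            event[label].append(text)
    return event


# Made-up change sentence: "the 3rd-floor beam cross-section is changed to 300".
chars = list("三层梁截面改为300")
tags = ["B-Floor", "I-Floor", "B-Component", "B-Attribute", "I-Attribute",
        "B-Trigger", "I-Trigger", "B-Value", "I-Value", "I-Value"]
print(fill_template(bio_to_spans(chars, tags)))
```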
In this embodiment, the method of the present invention is compared with commonly used word-feature-based methods on engineering change text data from a real building project.
Table 1. Experimental results of the method of the invention and other classical methods on the dataset
Metric | BiLSTM-CRF | BERT-BiLSTM-CRF | The method of the invention |
---|---|---|---|
Micro Recall | 52.84 | 71.67 | 76.11 |
The experimental results show that, on a small-scale real engineering change text dataset, the method achieves the highest micro recall (76.11). This is 4.44 points above BERT-BiLSTM-CRF (71.67) and 23.27 points above BiLSTM-CRF (52.84).
Example 2:
This embodiment differs from Embodiment 1 as follows:
S4.1: Input a sentence consisting of T characters into a word vector model to obtain the vector of each character in the sentence; in this embodiment, each character vector is obtained with the dynamic word vector model ELMo;
S4.2: Extract the hidden feature of each character in the sentence through an encoding layer; in this embodiment, the hidden feature of each character is extracted with a bidirectional gated recurrent unit (Bi-GRU).
Example 3:
This embodiment differs from Embodiment 1 as follows:
S6: Construct a feature aggregation module that fuses the sentence features with the character features to obtain deep character features with global context.
In this embodiment, the change-semantics-aware sentence features are fused into the domain-knowledge-enhanced character features of the sentence by concatenation, obtaining deep character features with global context, as follows:
The above operation is repeated for all sentences in the engineering change text.
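The concatenation variant of Embodiment 3 amounts to appending the sentence feature to each character feature of that sentence. This is a minimal sketch; the function name is hypothetical.

```python
from typing import List

Vector = List[float]


def concat_fusion(char_feats: List[Vector], sent_feat: Vector) -> List[Vector]:
    """Append the sentence feature to every character feature of that sentence."""
    return [c + sent_feat for c in char_feats]


print(concat_fusion([[1.0, 2.0], [3.0, 4.0]], [9.0]))
```

Unlike the gate of Embodiment 1, concatenation introduces no fusion parameters. The trade-off is a larger feature size for the layer that follows: twice the size when character and sentence features have equal dimension.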
Example 4:
This embodiment differs from Embodiment 1 as follows:
S5.1: Establish an encoding layer capable of extracting local features of the sentence; in this embodiment, the convolutional neural network GCN learns from the hidden features of the sentence's characters obtained in step S4.2 to produce the semantic representation of the sentence;
S5.3: Calculate the correlation between the engineering change event prototype representation and the sentence representation. This strengthens the features of event sentences in the document that contain event arguments or trigger words and suppresses the features of irrelevant non-event sentences, yielding the change-semantics-aware sentence features:
where the weight term is the relevance of the sentence feature to the engineering change event prototype representation; in this embodiment it is calculated with an attention mechanism as follows:
where the two symbols denote, respectively, the feature vector of an individual sentence in the engineering change text and the number of sentences contained in the engineering change text. The attention score is calculated as follows:
where the three weight terms are trainable parameters, T denotes transposition, and tanh() is the hyperbolic tangent function.
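Embodiment 4 mentions three trainable parameters, a transpose, and tanh(). This is consistent with additive attention, but the formula itself is lost, so this sketch assumes:

- The score has the form `v^T tanh(W_s s_i + W_e e)`.
- The scores are normalized with a softmax over the sentences.

All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, v: Vector) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def additive_scores(sentences: List[Vector], prototype: Vector,
                    w_s: Matrix, w_e: Matrix, v: Vector) -> List[float]:
    """Hypothetical score: score_i = v^T tanh(W_s s_i + W_e e) for each sentence s_i."""
    proj_e = matvec(w_e, prototype)  # the prototype projection is shared by all sentences
    scores = []
    for s in sentences:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_s, s), proj_e)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    return scores


def softmax(xs: List[float]) -> List[float]:
    top = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Compared with the scaled dot product of Embodiment 1, this form lets the sentence and prototype vectors differ in size. The cost is three extra parameter tensors.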
The above are only specific embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (10)
1. A combined extraction method for building engineering change events based on position perception, characterized by comprising the following steps:
S1: acquire a plurality of building engineering change texts, analyze them, determine the elements that constitute an engineering change event, and define the arguments and trigger words of the engineering change event;
S2: preprocess the engineering change texts and label them at character granularity according to the arguments and trigger words of the engineering change event;
S3: obtain a prototype representation of the engineering change event from the label information of the labeled engineering change texts;
S4: construct a character feature encoding module that uses the element semantics of the engineering change event to strengthen the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features;
S5: construct a sentence feature encoding module that uses the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, obtaining change-semantics-aware sentence features;
S6: construct a feature aggregation module that fuses the sentence features with the character features to obtain deep character features with global context;
S7: construct a sequence labeling module that learns the label dependencies of the deep character features, obtains the optimal label sequence of the engineering change text, and produces a structured expression of the engineering change event.
2. The combined extraction method for building engineering change events based on position perception according to claim 1, wherein in step S1, the elements constituting the engineering change event comprise a building component, a building site, a building floor, a building space, an attribute, a numerical attribute value, an object attribute target, and the change mode of the building component;
the argument roles of the engineering change event are defined as building component, building site, building floor, building space, attribute, numerical attribute value, and object attribute target;
the trigger word of the engineering change event is a word expressing the change mode.
3. The combined extraction method for building engineering change events based on position perception according to claim 2, wherein in step S2, the engineering change text comprises a plurality of sentences and is split into sentences according to its punctuation marks, each line being a single sentence;
the number of sentences and the sentence lengths of each preprocessed engineering change text are obtained, and the engineering change text is labeled at character granularity using the three-tag "BIO" (Begin, Inside, Outside) sequence labeling scheme.
4. The combined extraction method for building engineering change events based on position perception according to claim 3, wherein labeling the engineering change text at character granularity with the "BIO" sequence labeling scheme specifically comprises:
labeling the words of the engineering change text whose category is an argument as arguments, with the role of the word as the label; and labeling the words expressing the change mode in the engineering change text as trigger words.
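The character-level BIO labeling of claim 4 can be sketched as converting annotated word spans into one tag per character. This sketch assumes spans are given as `(start, end_exclusive, role)` character offsets. The English role names are hypothetical stand-ins for the roles of claim 2, with "Trigger" marking change-mode words, and the sample sentence is made up.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Character-level BIO tags for a text; spans are (start, end_exclusive, role)."""
    tags = ["O"] * len(text)
    for start, end, role in spans:
        tags[start] = f"B-{role}"
        for k in range(start + 1, end):
            tags[k] = f"I-{role}"
    return tags


# Made-up change sentence: "the 3rd-floor beam cross-section is changed to 300".
text = "三层梁截面改为300"
spans = [(0, 2, "Floor"), (2, 3, "Component"), (3, 5, "Attribute"),
         (5, 7, "Trigger"), (7, 10, "Value")]
print(list(zip(text, spans_to_bio(text, spans))))
```

Tagging per character rather than per word avoids depending on a word segmenter at labeling time. This matches the character granularity required by claim 3.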
5. The combined extraction method for building engineering change events based on position perception according to claim 4, wherein step S3 specifically comprises the following steps:
S3.1: analyze the importance of the elements constituting the engineering change event and assign a weight to each element determined in step S1;
S3.2: for each engineering change text, obtain the word vectors of the arguments and trigger words labeled in step S2, and compute from these word vectors and their weights the semantic representation e of the engineering change event contained in the text, as follows:
where the first term is the word vector of an element in the engineering change text (a labeled argument or trigger word), the second term is the weight of that argument or trigger word, and the third is the number of distinct element types;
S3.3: compute the semantic representations of the engineering change events for all engineering change texts acquired in step S1, and average them to obtain the prototype representation of the engineering change event:
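Claim 5 can be sketched as a weighted sum per text (S3.2) followed by an element-wise average over texts (S3.3). This sketch makes several assumptions:

- The per-text representation is a plain weighted sum of the labeled element vectors. Whether the original also normalizes by the number of element types cannot be recovered from this text.
- The role names and the weight values in `WEIGHTS` are made up.

```python
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical importance weights per element type (S3.1).
WEIGHTS: Dict[str, float] = {"Component": 0.5, "Attribute": 0.3, "Trigger": 1.0}


def event_representation(elements: List[Tuple[str, Vector]],
                         weights: Dict[str, float]) -> Vector:
    """S3.2: weighted sum of the word vectors of the labeled arguments/trigger words of one text."""
    rep = [0.0] * len(elements[0][1])
    for role, vec in elements:
        rep = [r + weights[role] * x for r, x in zip(rep, vec)]
    return rep


def prototype(representations: List[Vector]) -> Vector:
    """S3.3: element-wise mean of the per-text event representations."""
    n = len(representations)
    return [sum(column) / n for column in zip(*representations)]


e1 = event_representation([("Component", [2.0, 0.0]), ("Trigger", [0.0, 1.0])], WEIGHTS)
e2 = event_representation([("Component", [4.0, 2.0])], WEIGHTS)
print(prototype([e1, e2]))
```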
6. The combined extraction method for building engineering change events based on position perception according to claim 1, wherein step S4 comprises the following steps:
S4.1: input the i-th sentence of the engineering change text, consisting of T characters, into a word vector model to obtain the vector of each character in the i-th sentence, where the t-th vector represents the t-th character of the i-th sentence, t = 1 to T;
S4.3: segment the i-th sentence into words with a word segmentation tool, and fuse the semantic information of each word of the i-th sentence into the hidden features of the characters in that word, using a different position weight for each character, to obtain domain-knowledge-enhanced character features:
where the first term is the semantic vector of the j-th word in the i-th sentence, p denotes the p-th character of the j-th word in the i-th sentence, and the last term is the position weight corresponding to the p-th character of the j-th word in the i-th sentence;
S4.4: repeat steps S4.1 to S4.3 for all sentences in the engineering change text to obtain the domain-knowledge-enhanced character features of every character in every sentence.
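Step S4.3 can be sketched as adding each word's semantic vector, scaled by a per-character position weight, to the hidden features of the characters in that word. The position-weight formula of claim 7 is not reproduced in this text, so this sketch assumes:

- An additive fusion.
- A hypothetical `boundary_weight` that gives full weight to a word's first and last characters, in line with S4's aim of strengthening boundary characters. It can be swapped for the real formula.

```python
from typing import Callable, List

Vector = List[float]


def boundary_weight(p: int, word_len: int) -> float:
    """Hypothetical position weight: full weight at a word's first and last character."""
    return 1.0 if p in (0, word_len - 1) else 0.5


def fuse_word_semantics(hidden: List[Vector], words: List[str], word_vecs: List[Vector],
                        weight: Callable[[int, int], float] = boundary_weight) -> List[Vector]:
    """Add each word's semantic vector, scaled by a position weight, to its characters.

    hidden[t] is the hidden feature of the t-th character of the sentence, words is the
    segmentation of that sentence, and word_vecs[j] is the semantic vector of words[j].
    """
    if sum(len(w) for w in words) != len(hidden):
        raise ValueError("segmentation does not cover the sentence")
    fused, t = [], 0
    for word, vec in zip(words, word_vecs):
        for p in range(len(word)):
            lam = weight(p, len(word))
            fused.append([h + lam * x for h, x in zip(hidden[t], vec)])
            t += 1
    return fused


# One four-character word: boundary characters get the full word vector, inner ones half.
print(fuse_word_semantics([[0.0]] * 4, ["截面尺寸"], [[2.0]]))
```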
7. The combined extraction method for building engineering change events based on position perception according to claim 6, wherein the character position weight is calculated as follows:
8. The combined extraction method for building engineering change events based on position perception according to claim 1, wherein step S5 comprises the following steps:
S5.1: establish an encoding layer capable of extracting local features of sentences, and learn from the hidden features of the characters of the i-th sentence obtained in step S4.2 to obtain the semantic representation of the i-th sentence;
S5.2: according to the position of the i-th sentence in the document, concatenate a position vector to obtain the sentence representation of the i-th sentence:
S5.3: calculate the correlation between the prototype representation of the engineering change event and the sentence representation, strengthening the features of event sentences in the document that contain event arguments or trigger words and suppressing the features of irrelevant non-event sentences, to obtain the change-semantics-aware sentence feature of the i-th sentence:
where the weight term is the relevance, obtained with an attention mechanism, between the sentence representation of the i-th sentence and the prototype representation of the engineering change event;
S5.4: repeat steps S5.1 to S5.3 for all sentences in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
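Claim 8 does not fix the local-feature encoder of S5.1 or the form of the position vector in S5.2. This sketch assumes:

- For S5.1, a 1-D convolution over the character features followed by max-pooling over time.
- For S5.2, a sinusoidal encoding of the sentence index, in the style used by Transformers, concatenated onto the result.

Both choices and all function names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def conv_max_pool(hidden: List[Vector], filters: List[List[Vector]]) -> Vector:
    """S5.1 (assumed form): 1-D convolution over character features, max-pooled over time.

    Each filter is a window of k weight vectors and contributes one output dimension.
    Requires len(hidden) >= k for every filter.
    """
    pooled = []
    for window in filters:
        k = len(window)
        responses = [
            sum(sum(w * x for w, x in zip(window[o], hidden[t + o])) for o in range(k))
            for t in range(len(hidden) - k + 1)
        ]
        pooled.append(max(responses))
    return pooled


def position_vector(index: int, dim: int) -> Vector:
    """Sinusoidal encoding of the sentence index (sin on even, cos on odd dimensions)."""
    return [math.sin(index / 10000 ** (i / dim)) if i % 2 == 0
            else math.cos(index / 10000 ** ((i - 1) / dim))
            for i in range(dim)]


def sentence_representation(hidden: List[Vector], filters: List[List[Vector]],
                            index: int, pos_dim: int = 4) -> Vector:
    """S5.2: concatenate the local-feature vector with the position vector."""
    return conv_max_pool(hidden, filters) + position_vector(index, pos_dim)


# Three one-dimensional character features and one width-2 filter of ones.
print(sentence_representation([[1.0], [2.0], [3.0]], [[[1.0], [1.0]]], index=0))
```

Appending the position vector lets the later attention step tell otherwise similar sentences apart by their place in the document. This is the position awareness referred to in the method's title.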
9. The combined extraction method for building engineering change events based on position perception according to claim 1, wherein in step S6, for all sentences in the engineering change text, the change-semantics-aware sentence features are fused into the domain-knowledge-enhanced character features of the sentence to obtain deep character features with global context, as follows:
10. The combined extraction method for building engineering change events based on position perception according to any one of claims 1 to 9, wherein step S7 comprises the following steps:
S7.1: input the deep character features of all characters in the engineering change text into a conditional random field (CRF) model, learn the dependencies among the labels assigned to the characters in step S2, and obtain the optimal label sequence of the engineering change text to be extracted;
S7.2: according to the optimal label sequence of the engineering change text, extract the words of each label category and fill them into the engineering change event expression template, giving a structured expression of the engineering change event.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211166342.1A CN115238685B (en) | 2022-09-23 | 2022-09-23 | Combined extraction method for building engineering change events based on position perception |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115238685A true CN115238685A (en) | 2022-10-25 |
CN115238685B CN115238685B (en) | 2023-03-21 |
Family
ID=83667029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211166342.1A Active CN115238685B (en) | 2022-09-23 | 2022-09-23 | Combined extraction method for building engineering change events based on position perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115238685B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112183030A (en) * | 2020-10-10 | 2021-01-05 | 深圳壹账通智能科技有限公司 | Event extraction method and device based on preset neural network, computer equipment and storage medium |
CN112528676A (en) * | 2020-12-18 | 2021-03-19 | 南开大学 | Document-level event argument extraction method |
KR20210124938A (en) * | 2020-11-26 | 2021-10-15 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Event extraction method, device, electronic equipment and storage medium |
CN113591483A (en) * | 2021-04-27 | 2021-11-02 | 重庆邮电大学 | Document-level event argument extraction method based on sequence labeling |
CN114298053A (en) * | 2022-03-10 | 2022-04-08 | 中国科学院自动化研究所 | Event joint extraction system based on feature and attention mechanism fusion |
CN114818721A (en) * | 2022-06-30 | 2022-07-29 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
Non-Patent Citations (4)
Title |
---|
Chao Shen et al., "Joint Event Extraction Based on CNN-BiGRU and Attention Mechanism," 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML) * |
Yubo Chen, "Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks," Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics * |
Yang Denghui et al., "Chinese Event Extraction Method Based on the RBBLC Model," Journal of Nanjing Normal University (Engineering and Technology Edition) * |
Shi Lei, "Research on Public Health Emergency Event Extraction Based on BERT-BiLSTM-CRF," Natural Science Journal of Harbin Normal University * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115577112A (en) * | 2022-12-09 | 2023-01-06 | 成都索贝数码科技股份有限公司 | Event extraction method and system based on type perception gated attention mechanism |
CN115577112B (en) * | 2022-12-09 | 2023-04-18 | 成都索贝数码科技股份有限公司 | Event extraction method and system based on type perception gated attention mechanism |
CN117094397A (en) * | 2023-10-19 | 2023-11-21 | 北京大数据先进技术研究院 | Fine granularity event information extraction method, device and product based on shorthand |
CN117094397B (en) * | 2023-10-19 | 2024-02-06 | 北京大数据先进技术研究院 | Fine granularity event information extraction method, device and product based on shorthand |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Teng et al. | Context-sensitive lexicon features for neural sentiment analysis | |
CN115238685B (en) | Combined extraction method for building engineering change events based on position perception | |
CN109657947B (en) | Enterprise industry classification-oriented anomaly detection method | |
CN110287323B (en) | Target-oriented emotion classification method | |
CN112001186A (en) | Emotion classification method using graph convolution neural network and Chinese syntax | |
CN111488931A (en) | Article quality evaluation method, article recommendation method and corresponding devices | |
CN110321563A (en) | Text emotion analysis method based on mixing monitor model | |
CN112837184A (en) | Project management system suitable for building engineering | |
CN109101490B (en) | Factual implicit emotion recognition method and system based on fusion feature representation | |
CN112597302B (en) | False comment detection method based on multi-dimensional comment representation | |
CN113742733B (en) | Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type | |
CN107688870A (en) | A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input | |
CN113360582B (en) | Relation classification method and system based on BERT model fusion multi-entity information | |
CN112256866A (en) | Text fine-grained emotion analysis method based on deep learning | |
CN109522412A (en) | Text emotion analysis method, device and medium | |
CN107818173B (en) | Vector space model-based Chinese false comment filtering method | |
CN114997288A (en) | Design resource association method | |
CN112215629B (en) | Multi-target advertisement generating system and method based on construction countermeasure sample | |
CN114880427A (en) | Model based on multi-level attention mechanism, event argument extraction method and system | |
CN115017879A (en) | Text comparison method, computer device and computer storage medium | |
CN114547303A (en) | Text multi-feature classification method and device based on Bert-LSTM | |
CN111191029B (en) | AC construction method based on supervised learning and text classification | |
CN112347252A (en) | Interpretability analysis method based on CNN text classification model | |
CN112287119A (en) | Knowledge graph generation method for extracting relevant information of online resources | |
CN116258204A (en) | Industrial safety production violation punishment management method and system based on knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||