CN115238685A - Combined extraction method for building engineering change events based on position perception - Google Patents


Info

Publication number
CN115238685A
Authority
CN
China
Prior art keywords
engineering change
sentence
character
text
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211166342.1A
Other languages
Chinese (zh)
Other versions
CN115238685B (en
Inventor
刘发贵
吴怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202211166342.1A
Publication of CN115238685A
Application granted
Publication of CN115238685B
Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08Construction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a combined extraction method for building engineering change events based on position perception. The method comprises the following steps: acquiring a plurality of building engineering change texts, and defining the arguments and trigger words of engineering change events; preprocessing the engineering change texts, and labeling them at character granularity according to the arguments and trigger words of the engineering change event; obtaining a prototype representation of the engineering change event; constructing a character feature encoding module that strengthens the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features; constructing a sentence feature encoding module to obtain change-semantics-aware sentence features; constructing a feature aggregation module to obtain deep character features with global context; and constructing a sequence labeling module to produce a structured expression of the engineering change event. By fusing domain-knowledge semantics and sentence-level features into the character features and by exploiting prior label knowledge, the invention helps improve the extraction of building engineering change events.

Description

Combined extraction method for building engineering change events based on position perception
Technical Field
The invention belongs to the field of natural language processing and construction, and particularly relates to a combined extraction method of construction engineering change events based on position perception.
Background
Because building projects are long-running, complex, and dynamic, changes in construction requirements, construction environment, construction progress, construction quality, and the like readily lead to changes in engineering quantities during a project. Engineering change is a key link in the construction process: the intent of a change must be evaluated promptly, change instructions must be strictly executed, and change data must be recorded effectively. Otherwise, an engineering change not only causes economic loss to the parties participating in the building project but also seriously affects building quality and safety. Effective management of engineering change events is therefore a necessary condition for the smooth completion of a construction project.
Engineering change events usually appear in unstructured documents and contain descriptions of the change target and its associated attributes, called the key elements of the engineering change event, such as the change object, the place it belongs to, and the change mode. Extracting these key elements allows change requirements to be acquired quickly and change events to be stored comprehensively and in structured form, which raises the level of engineering management.
At present, most event extraction methods perform recognition at the sentence level. The invention "Automatic extraction and classification method and system for building construction process constraints" extracts the process constraints in building code articles. However, it ignores the case in which a text contains multiple sentences and the event elements are scattered across the context. Zheng S et al. studied document-level events in the financial field, relying on entity mentions labeled with known standard entities. In the building field, however, there is no fully mature and universal entity recognition model, so event extraction cannot be realized as classification over entity category information (Zheng S, Cao W, Xu W, et al. Doc2EDAG: An end-to-end document-level framework for Chinese financial event extraction [J]. arXiv preprint arXiv:1904.07535, 2019.). In addition, although building engineering change events are unavoidable, they remain a small-sample problem compared with news events and the like, which makes extraction difficult.
Disclosure of Invention
The invention aims to intelligently extract key elements of engineering change events in the building field by using a natural language processing technology and perform structured expression. The invention integrates the argument roles of the engineering change events into the argument class labels, strengthens the character characteristics of the boundary positions of the arguments, simultaneously strengthens the sentence characteristics of different positions by combining the prototype representation of the engineering change events, and realizes the joint extraction of the document-level engineering change events.
The purpose of the invention is realized by at least one of the following technical solutions.
A combined extraction method for building engineering change events based on position perception comprises the following steps:
S1: acquiring a plurality of building engineering change texts, analyzing them, determining the elements that constitute an engineering change event, and defining the arguments and trigger words of the engineering change event;
S2: preprocessing the engineering change texts, and labeling them at character granularity according to the arguments and trigger words of the engineering change event;
S3: obtaining a prototype representation of the engineering change event from the label information of the labeled engineering change texts;
S4: constructing a character feature encoding module that uses the element semantics of engineering change events to strengthen the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features;
S5: constructing a sentence feature encoding module that uses the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, obtaining change-semantics-aware sentence features;
S6: constructing a feature aggregation module that fuses the sentence features and the character features to obtain deep character features with global context;
S7: constructing a sequence labeling module, learning the label dependency information corresponding to the deep character features, obtaining the optimal label sequence of the engineering change text, and producing a structured expression of the engineering change event.
Further, in step S1, elements constituting the project change event include a building component, a building site, a building floor, a building space, an attribute, a numerical attribute value, an object attribute target, and a change mode for the building component;
defining roles of arguments of the engineering change event, including building components, building sites, building floors, building spaces, attributes, numerical attribute values and object attribute targets;
the trigger word of the engineering change event is a word for expressing a change mode.
Further, in step S2, the engineering change text includes a plurality of sentences; the engineering change text is split into sentences according to its punctuation marks, so that each line is a single sentence;
obtaining the number of sentences and the sentence lengths of each preprocessed engineering change text; and labeling the engineering change text at character granularity using the three-tag 'BIO' sequence labeling scheme.
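The sentence preprocessing of step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the set of sentence-ending punctuation marks, the function name `split_sentences`, and the sample text are all assumptions, since the source only says that the text is split "according to punctuation".

```python
import re

# Assumed sentence-ending punctuation (Chinese and Western); the source does
# not list the marks it uses.
SENTENCE_END = "。！？；!?;"

def split_sentences(text: str) -> list[str]:
    """Split a change text so that each returned entry is a single sentence."""
    # Put a line break after every sentence-ending mark, then split on lines.
    marked = re.sub(f"([{re.escape(SENTENCE_END)}])", r"\1\n", text)
    return [line.strip() for line in marked.splitlines() if line.strip()]

# Hypothetical engineering change text made up for illustration.
text = "第五座41层后勤区吊顶高度修改。其余楼层不变。"
sentences = split_sentences(text)

# Number of sentences and per-sentence length in characters, as required by S2.
num_sentences = len(sentences)
sentence_lengths = [len(s) for s in sentences]
```

Keeping the punctuation mark attached to its sentence leaves character offsets contiguous, which simplifies the later character-granularity labeling.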
Further, labeling the engineering change text at character granularity with the three-tag 'BIO' sequence labeling scheme specifically comprises:
labeling the words of the engineering change text whose category is an argument as arguments, the label being the role of the word; and labeling the words that express a change mode in the engineering change text as trigger words.
Further, step S3 specifically includes the following steps:
S3.1: analyzing the importance of the elements constituting the engineering change event, and assigning a weight to each element determined in step S1;
S3.2: for each engineering change text, obtaining the word vectors of the arguments and trigger words labeled in step S2, and calculating the semantic representation $e$ of the engineering change event contained in the engineering change text from these word vectors and their weights:
$$e = \sum_{k=1}^{K} w_k \, v_k$$
where $v_k$ is the word vector of the $k$-th element in the engineering change text, i.e., of a labeled argument or trigger word, $w_k$ is the weight of that argument or trigger word, and $K$ is the number of different element types;
S3.3: calculating the semantic representations of the engineering change events of all the engineering change texts acquired in step S1, and averaging them to obtain the prototype representation $\bar{e}$ of the engineering change event:
$$\bar{e} = \frac{1}{N} \sum_{n=1}^{N} e^{(n)}$$
where $e^{(n)}$ is the semantic representation of the engineering change event of the $n$-th engineering change text and $N$ is the number of engineering change texts acquired in step S1.
Further, step S4 includes the steps of:
S4.1: inputting the $i$-th sentence $S_i$ of the engineering change text, which consists of $T$ characters, into a word vector model to obtain each character vector $x_t$ of $S_i$, where $x_t$ denotes the vector of the $t$-th character of the $i$-th sentence $S_i$, $t = 1, \dots, T$;
S4.2: extracting the hidden feature $h_t$ of each character of the $i$-th sentence $S_i$ through an encoding layer;
S4.3: segmenting the $i$-th sentence $S_i$ with a word segmentation tool to obtain the words of $S_i$, and fusing the semantic information of each word into the hidden features of each character of that word with different character position weights, to obtain the domain-knowledge-enhanced character features $\tilde{h}_t$:
$$\tilde{h}_t = \mathrm{Fuse}\left(h_t,\ \lambda_{j,p}\, u_j\right)$$
where $h_t$ is the hidden feature of the character, $u_j$ is the semantic vector of the $j$-th word of the $i$-th sentence $S_i$, $p$ indicates that the character is the $p$-th character of the $j$-th word, $\lambda_{j,p}$ is the position weight of the $p$-th character of the $j$-th word, and $\mathrm{Fuse}(\cdot)$ denotes the fusion of the weighted word semantics into the character feature;
S4.4: repeating steps S4.1 to S4.3 for all sentences in the engineering change text to obtain the domain-knowledge-enhanced character features of every character in every sentence of the engineering change text.
Further, the character position weight $\lambda_{j,p}$ is calculated from the position $p$ of the character within the $j$-th word and the word length $L_j$:
$$\lambda_{j,p} = \mathrm{softmax}\big(\mathrm{Normalization}(p,\ L_j)\big)$$
where $\mathrm{softmax}(\cdot)$ denotes the normalized exponential function, $\mathrm{Normalization}(\cdot)$ denotes min-max normalization, and $L_j$ is the number of characters in the $j$-th word of the $i$-th sentence $S_i$.
Further, step S5 includes the steps of:
S5.1: establishing an encoding layer capable of extracting local features of a sentence, and learning from the hidden features of each character of the $i$-th sentence $S_i$ obtained in step S4.2 to obtain the semantic representation of the $i$-th sentence $S_i$;
S5.2: concatenating a position vector $pos_i$ according to the position of the $i$-th sentence in the document, to obtain the sentence representation $s_i$ of the $i$-th sentence:
$$s_i = \left[\mathrm{Enc}(h_1, \dots, h_T)\,;\ pos_i\right]$$
where $\mathrm{Enc}(\cdot)$ denotes the encoding layer of step S5.1 applied to the hidden features $h_1, \dots, h_T$ of the characters, and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation;
S5.3: calculating the correlation between the prototype representation of the engineering change event and the sentence representation, strengthening the features of event sentences that contain event arguments or trigger words in the document, and suppressing the features of irrelevant non-event sentences, to obtain the change-semantics-aware sentence features $\tilde{s}_i$ of the $i$-th sentence:
$$\tilde{s}_i = \alpha_i \, s_i$$
where $\alpha_i$ is the correlation, obtained with an attention mechanism, between the sentence representation of the $i$-th sentence and the prototype representation of the engineering change event;
S5.4: repeating steps S5.1 to S5.3 for all sentences in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
Further, in step S6, for all sentences in the engineering change text, the change-semantics-aware sentence features are fused into the domain-knowledge-enhanced character features of the characters in each sentence, to obtain the deep character features $g_{i,t}$ with global context:
$$g_{i,t} = f\left(\tilde{h}_{i,t},\ \tilde{s}_i\right)$$
where $g_{i,t}$ is the deep character feature of the $t$-th character in the $i$-th sentence, $\tilde{h}_{i,t}$ is the domain-knowledge-enhanced feature of that character, $\tilde{s}_i$ is the change-semantics-aware sentence feature of the $i$-th sentence, and $f(\cdot)$ is a feature fusion method.
Further, step S7 includes the steps of:
S7.1: inputting the deep character features of all characters in the engineering change text into a conditional random field model, learning the dependencies among the labels assigned in step S2 to the characters of the engineering change text, and obtaining the optimal label sequence of the engineering change text to be extracted;
S7.2: extracting the words of the corresponding label categories according to the optimal label sequence of the engineering change text, filling them into an engineering change event expression template, and producing the structured expression of the engineering change event.
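The decoding and template filling of step S7 can be illustrated with a small sketch. It covers only inference: Viterbi decoding over given emission and transition scores, followed by collecting the labeled words from the BIO sequence. The label set, the hand-set scores, the penalty of -10 for invalid transitions, and the function names `viterbi` and `extract_spans` are assumptions for illustration; in the described method these scores would come from the trained model and conditional random field.

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Return the highest-scoring label index sequence.

    emissions: (T, L) per-character label scores.
    transitions: (L, L) score of moving from label a to label b.
    """
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions      # rows: previous, cols: current
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def extract_spans(chars: str, tags: list[str]) -> dict[str, list[str]]:
    """Collect the words of each label category from a BIO tag sequence."""
    spans: dict[str, list[str]] = {}
    current, role = "", None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if role:
                spans.setdefault(role, []).append(current)
            current, role = ch, tag[2:]
        elif tag.startswith("I-") and role == tag[2:]:
            current += ch
        else:
            if role:
                spans.setdefault(role, []).append(current)
            current, role = "", None
    if role:
        spans.setdefault(role, []).append(current)
    return spans

# Hypothetical label set and scores for the four characters of "吊顶修改".
LABELS = ["O", "B-COMPONENT", "I-COMPONENT", "B-TRIGGER", "I-TRIGGER"]
chars = "吊顶修改"
emissions = np.array([
    [0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.2],   # greedy choice here would be I-TRIGGER
    [0.0, 0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0],
])
transitions = np.zeros((5, 5))
for inside, allowed in ((2, (1, 2)), (4, (3, 4))):
    for prev in range(5):
        if prev not in allowed:
            transitions[prev, inside] = -10.0   # I-X may only follow B-X or I-X

tags = [LABELS[k] for k in viterbi(emissions, transitions)]
event = extract_spans(chars, tags)   # filled "template": role -> extracted words
```

At the second character the emission scores alone favor I-TRIGGER, but the transition scores rule that label out after B-COMPONENT, which is the kind of label dependency the sequence labeling module is meant to capture.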
Compared with the prior art, the invention at least has the following beneficial effects:
1. Through the character feature encoding module, the invention fuses the semantics of words carrying domain knowledge into the character features, which helps improve the recognition accuracy of engineering change event arguments and trigger words in the building domain; the position weights enhance the features of the boundary characters within words, reducing boundary recognition errors for arguments and trigger words;
2. Through the sentence feature encoding module, the invention uses an attention mechanism to strengthen the features of event sentences containing arguments and trigger words and to suppress the features of non-event sentences; the sentence-level features are then fused into the character-level features to construct deep character features with global context;
3. By constructing a prototype representation of the engineering change event, the invention lets the model exploit prior label knowledge, which benefits the extraction of small-sample engineering change events.
Drawings
Fig. 1 is a flowchart of a combined extraction method for a construction engineering change event according to the present invention;
FIG. 2 is a diagram of engineering change text labeling and structured representation in an embodiment of the method of the present invention;
FIG. 3 is a diagram of a character encoding module in an embodiment of the method of the present invention;
FIG. 4 is a diagram of sentence encoding modules in an embodiment of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples, but the present invention is not limited thereto.
Example 1:
A combined extraction method for building engineering change events based on position perception, as shown in Fig. 1, includes the following steps:
S1: acquiring a plurality of building engineering change texts, analyzing them, determining the elements that constitute an engineering change event, and defining the arguments and trigger words of the engineering change event;
the elements forming the engineering change event comprise building components, a building site, building floors, building spaces, attributes, numerical attribute values, object attribute targets and change modes of the building components;
the key elements constituting an engineering change event are determined to comprise 8 types, with the following meanings for elements 1 to 8:
element 1: the building component whose change has been decided or is planned;
element 2: the main project or supporting project to which the building component belongs;
element 3: the floor on which the building component is located;
element 4: the spatial position of the building component;
element 5: the attribute of the building component that needs to be changed;
element 6: the numerical requirement for a numerical attribute that needs to be changed;
element 7: the target requirement for an object attribute that needs to be changed;
element 8: the specific mode of change applied to the building component.
It is understood that the engineering change event elements include, but are not limited to, elements 1 to 8.
Defining roles of arguments of the engineering change event, wherein the roles comprise building components, building sites, building floors, building spaces, attributes, numerical attribute values and object attribute targets;
the trigger word of the engineering change event is a word for expressing a change mode.
S2: preprocessing the engineering change texts, and labeling them at character granularity according to the arguments and trigger words of the engineering change event;
the engineering change text comprises a plurality of sentences and is split into sentences according to its punctuation marks, so that each line is a single sentence;
obtaining the number of sentences and the sentence lengths of each preprocessed engineering change text; and labeling the engineering change text at character granularity using the three-tag 'BIO' sequence labeling scheme.
Labeling the engineering change text at character granularity with the three-tag 'BIO' sequence labeling scheme specifically comprises:
labeling the words of the engineering change text whose category is an argument as arguments, the label being the role of the word; and labeling the words that express a change mode in the engineering change text as trigger words.
In this embodiment, as shown in fig. 2, an engineering change text is marked with arguments and trigger words.
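Character-granularity BIO labeling can be sketched as below. The sentence, the character offsets, and the role names (`SITE`, `FLOOR`, and so on) are hypothetical stand-ins chosen to mirror the element types of step S1; they are not taken from Fig. 2.

```python
def bio_tag(sentence: str, spans: list[tuple[int, int, str]]) -> list[str]:
    """Label every character with B-<role>, I-<role>, or O.

    Each span is (start, end, role) with `end` exclusive.
    """
    tags = ["O"] * len(sentence)
    for start, end, role in spans:
        tags[start] = f"B-{role}"            # first character of the word
        for k in range(start + 1, end):
            tags[k] = f"I-{role}"            # remaining characters of the word
    return tags

# Hypothetical sentence and annotation spans, for illustration only.
sentence = "第五座41层后勤区吊顶高度修改"
spans = [
    (0, 3, "SITE"),         # 第五座
    (3, 6, "FLOOR"),        # 41层
    (6, 9, "SPACE"),        # 后勤区
    (9, 11, "COMPONENT"),   # 吊顶
    (11, 13, "ATTRIBUTE"),  # 高度
    (13, 15, "TRIGGER"),    # 修改
]
tags = bio_tag(sentence, spans)
```

Because the role is part of the tag (for example `B-FLOOR`), a single tagging pass yields both the argument boundaries and their roles, which is what allows arguments and trigger words to be extracted jointly.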
S3: obtaining a prototype representation of the engineering change event from the label information of the labeled engineering change texts, specifically comprising the following steps:
S3.1: analyzing the importance of the elements constituting the engineering change event, and assigning a weight to each element determined in step S1;
S3.2: for each engineering change text, obtaining the word vectors of the arguments and trigger words labeled in step S2, and calculating the semantic representation $e$ of the engineering change event contained in the engineering change text from these word vectors and their weights:
$$e = \sum_{k=1}^{K} w_k \, v_k$$
where $v_k$ is the word vector of the $k$-th element in the engineering change text, i.e., of a labeled argument or trigger word, $w_k$ is the weight of that argument or trigger word, and $K$ is the number of different element categories; in this embodiment, the word vectors are obtained by querying a word2vec word vector model;
in this embodiment, as shown in fig. 2, the labeled engineering change event expression words are:
(modification, fifth seat, floor 41, logistics area, ceiling, height);
by analyzing the information expressed by the engineering change texts, element 1 is determined to be an element that must be made explicit, so the weight of the argument 'building component' is set to the first level; elements 2 to 4 are the position information of the building component, so the weights of the arguments 'building site', 'building floor', and 'building space' are set to the second level; elements 5 to 8 may be left unspecified in change-intent texts or may exist only in drawings, so the weights of the arguments 'attribute', 'numerical attribute value', and 'object attribute target' and of the trigger word are set to the third level;
S3.3: calculating the semantic representations of the engineering change events of all the engineering change texts acquired in step S1, and averaging them to obtain the prototype representation $\bar{e}$ of the engineering change event:
$$\bar{e} = \frac{1}{N} \sum_{n=1}^{N} e^{(n)}$$
where $e^{(n)}$ is the semantic representation of the engineering change event of the $n$-th engineering change text and $N$ is the number of engineering change texts acquired in step S1.
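Step S3 can be sketched numerically as follows, assuming the per-text representation is a plain weighted sum of element word vectors and the prototype is their mean. The numeric values of the three weight levels, the two-dimensional toy vectors (standing in for word2vec lookups), and the names `LEVEL_WEIGHT`, `ROLE_LEVEL`, `event_representation`, and `prototype` are all assumptions; the source states only that three levels exist.

```python
import numpy as np

# Hypothetical numeric values for the three weight levels.
LEVEL_WEIGHT = {1: 3.0, 2: 2.0, 3: 1.0}

# Level of each element type, following the grading described in the embodiment.
ROLE_LEVEL = {
    "building_component": 1,
    "building_site": 2, "building_floor": 2, "building_space": 2,
    "attribute": 3, "numerical_attribute_value": 3,
    "object_attribute_target": 3, "trigger": 3,
}

def event_representation(elements: dict) -> np.ndarray:
    """Weighted sum of the word vectors of the labeled elements of one text."""
    dim = len(next(iter(elements.values())))
    e = np.zeros(dim)
    for role, vec in elements.items():
        e += LEVEL_WEIGHT[ROLE_LEVEL[role]] * np.asarray(vec, dtype=float)
    return e

def prototype(event_reps: list) -> np.ndarray:
    """Average the per-text event representations into one prototype vector."""
    return np.mean(np.stack(event_reps), axis=0)

# Toy word vectors for the labeled elements of two texts.
text_1 = {"building_component": [1.0, 0.0], "trigger": [0.0, 1.0]}
text_2 = {"building_component": [0.0, 1.0], "building_floor": [1.0, 1.0]}

e1 = event_representation(text_1)   # 3*[1,0] + 1*[0,1] = [3, 1]
e2 = event_representation(text_2)   # 3*[0,1] + 2*[1,1] = [2, 5]
proto = prototype([e1, e2])         # mean of the two = [2.5, 3.0]
```

The prototype is computed once from the labeled corpus and then reused as a fixed query when scoring sentences, which is how prior label knowledge enters the model in the small-sample setting.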
S4: constructing a character feature encoding module that uses the element semantics of engineering change events to strengthen the characters at the boundary positions of arguments and trigger words, obtaining domain-knowledge-enhanced character features, comprising the following steps:
S4.1: inputting the $i$-th sentence $S_i$ of the engineering change text, which consists of $T$ characters, into a word vector model to obtain each character vector $x_t$ of $S_i$, where $x_t$ denotes the vector of the $t$-th character of the $i$-th sentence $S_i$, $t = 1, \dots, T$; in this embodiment, each character vector $x_t$ is obtained using the bidirectional language model BERT;
S4.2: in this embodiment, the hidden feature $h_t$ of each character of the $i$-th sentence $S_i$ is extracted through a Bi-LSTM;
S4.3: segmenting the $i$-th sentence $S_i$ with a word segmentation tool to obtain the words of $S_i$, and fusing the semantic information of each word into the hidden features of each character of that word with different character position weights, to obtain the domain-knowledge-enhanced character features $\tilde{h}_t$:
$$\tilde{h}_t = \mathrm{Fuse}\left(h_t,\ \lambda_{j,p}\, u_j\right)$$
where $h_t$ is the hidden feature of the character, $u_j$ is the semantic vector of the $j$-th word of the $i$-th sentence $S_i$, $p$ indicates that the character is the $p$-th character of the $j$-th word, $\lambda_{j,p}$ is the position weight of the $p$-th character of the $j$-th word, and $\mathrm{Fuse}(\cdot)$ denotes the fusion of the weighted word semantics into the character feature;
the character position weight $\lambda_{j,p}$ is calculated from the position $p$ of the character within the $j$-th word and the word length $L_j$:
$$\lambda_{j,p} = \mathrm{softmax}\big(\mathrm{Normalization}(p,\ L_j)\big)$$
where $\mathrm{softmax}(\cdot)$ denotes the normalized exponential function, $\mathrm{Normalization}(\cdot)$ denotes min-max normalization, and $L_j$ is the number of characters in the $j$-th word of the $i$-th sentence $S_i$.
Because the sequence labeling task must identify a span of consecutive characters, boundary recognition errors occur easily; in addition, conventional sequence labeling methods based on character granularity cannot make effective use of the semantics of the words themselves, which easily causes category recognition errors.
Therefore, step S4.3 introduces the semantic information of words and designs the character position weights so that the features of the characters at the boundary positions of a word receive more of the word information, thereby enhancing the features of the characters at the boundary positions of arguments and trigger words.
In this embodiment, fig. 3 is a schematic diagram of a character feature encoding module.
S4.4: and repeating the steps S4.1-S4.3 for all sentences in the engineering change text to obtain the character characteristics of each character in all sentences in the engineering change text with enhanced domain knowledge.
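The idea behind the character position weights in step S4.3 can be illustrated with a sketch. It is one possible instantiation and not the formula of the invention: the source names only softmax and min-max normalization, so the quantity being normalized (distance of a character from the word center) and the additive fusion in `enhance` are assumptions, as are the function names.

```python
import numpy as np

def position_weights(word_len: int) -> np.ndarray:
    """Weights over the characters of one word that favor boundary characters.

    Assumed recipe: distance from the word center, min-max normalized,
    then passed through softmax.
    """
    p = np.arange(1, word_len + 1)
    dist = np.abs(p - (word_len + 1) / 2.0)          # 0 at the center
    span = dist.max() - dist.min()
    # Words of length 1 or 2 have equal distances; fall back to zeros.
    norm = (dist - dist.min()) / span if span > 0 else np.zeros_like(dist)
    exp = np.exp(norm - norm.max())                  # numerically stable softmax
    return exp / exp.sum()

def enhance(hidden: np.ndarray, word_vec: np.ndarray) -> np.ndarray:
    """Fuse a word vector into the hidden features of its characters.

    hidden: (L, d) hidden features of the L characters of one word.
    word_vec: (d,) semantic vector of that word.
    Additive fusion is an assumption made for this sketch.
    """
    lam = position_weights(hidden.shape[0])
    return hidden + lam[:, None] * word_vec[None, :]

w3 = position_weights(3)                             # boundary > middle
enhanced = enhance(np.zeros((3, 2)), np.ones(2))     # rows equal the weights
```

For a three-character word the first and last characters receive equal weights that are larger than the middle one, so more of the word vector flows into the boundary characters, which is the stated purpose of the position weights.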
S5: constructing a sentence feature encoding module that uses the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, obtaining change-semantics-aware sentence features, comprising the following steps:
S5.1: establishing an encoding layer capable of extracting local features of a sentence; in this embodiment, a convolutional neural network (CNN) learns from the hidden features of each character of the $i$-th sentence $S_i$ obtained in step S4.2 to obtain the semantic representation of the $i$-th sentence $S_i$;
S5.2: concatenating a position vector $pos_i$ according to the position of the $i$-th sentence in the document (in this embodiment, the position vector of the Transformer model is used) to obtain the sentence representation $s_i$ of the $i$-th sentence:
$$s_i = \left[\mathrm{CNN}(h_1, \dots, h_T)\,;\ pos_i\right]$$
where $h_1, \dots, h_T$ are the hidden features of the characters of $S_i$ and $[\,\cdot\,;\,\cdot\,]$ denotes concatenation;
S5.3: calculating the correlation between the prototype representation of the engineering change event and the sentence representation, strengthening the features of event sentences that contain event arguments or trigger words in the document, and suppressing the features of irrelevant non-event sentences, to obtain the change-semantics-aware sentence features $\tilde{s}_i$ of the $i$-th sentence:
$$\tilde{s}_i = \alpha_i \, s_i$$
where $s_i$ is the sentence representation of the $i$-th sentence and $\alpha_i$ is the correlation between $s_i$ and the prototype representation $\bar{e}$ of the engineering change event; in this embodiment, it is calculated with an attention mechanism as follows:
$$\alpha_i = \frac{\exp\big(score(s_i, \bar{e})\big)}{\sum_{m=1}^{M} \exp\big(score(s_m, \bar{e})\big)}$$
where $s_m$ is the feature vector of the $m$-th sentence in the engineering change text and $M$ is the number of sentences included in the engineering change text; the attention score is calculated as:
$$score(s_i, \bar{e}) = \frac{s_i^{\mathrm{T}}\, \bar{e}}{\sqrt{d}}$$
where $d$ is the dimension of the sentence feature vector, $score$ denotes the attention score, and $\mathrm{T}$ denotes the transpose.
S5.4: Repeat steps S5.1 to S5.3 for every sentence in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
In this embodiment, fig. 4 is a schematic diagram of a sentence feature encoding module.
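The weighting in step S5.3 can be illustrated with a minimal sketch. Because the formula images are not reproduced in this text, the sketch assumes a scaled dot-product score (sentence vector transposed times prototype, divided by the square root of the feature dimension), a softmax over all sentences of the text, and a sentence feature obtained by scaling each sentence vector by its weight. The function names and toy vectors are hypothetical and are not taken from the patent.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]


def change_aware_sentence_features(sentences, prototype):
    """Weight each sentence vector by its attention to the event prototype.

    Assumed form: score = s^T e / sqrt(d), weights = softmax(scores),
    feature_i = weight_i * s_i.
    """
    d = len(prototype)
    scores = [dot(s, prototype) / math.sqrt(d) for s in sentences]
    weights = softmax(scores)
    weighted = [[w * x for x in s] for w, s in zip(weights, sentences)]
    return weights, weighted


# Toy data: three sentence representations and one event prototype.
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototype = [1.0, 0.0]
weights, weighted = change_aware_sentence_features(sentences, prototype)
```

In this toy input the second sentence is orthogonal to the prototype, so it receives the smallest weight and its feature is suppressed relative to the other two.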
S6: Construct a feature aggregation module that fuses sentence features and character features to obtain deep character features with global context.
In this embodiment, for every sentence in the engineering change text, the change-semantics-aware sentence feature is fused into the domain-knowledge-enhanced character features of that sentence through a gating mechanism, giving deep character features with global context, as follows:
[formula image not reproduced]
where the result is the deep character feature of the t-th character in the i-th sentence and Gate is the gating weight, calculated as follows:
[formula image not reproduced]
where the two remaining symbols are trainable parameters and Sigmoid() is the squashing (logistic) function.
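A gating mechanism of this kind can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch assumes a common form: the gate is the sigmoid of a linear map (weight matrix W and bias b, standing in for the two trainable parameters) applied to the concatenation of the character feature and the sentence feature, and the fused feature is an element-wise interpolation between the two. It also assumes both features have the same dimension. All names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(char_feat, sent_feat, W, b):
    """Fuse one character feature with its sentence feature.

    Assumed form: gate = sigmoid(W [h; s] + b),
    fused = gate * h + (1 - gate) * s (element-wise).
    """
    concat = char_feat + sent_feat  # list concatenation [h; s]
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + bias)
        for row, bias in zip(W, b)
    ]
    fused = [g * h + (1.0 - g) * s for g, h, s in zip(gate, char_feat, sent_feat)]
    return gate, fused


# Toy data: with zero parameters the gate is exactly 0.5 in every dimension.
char_feat = [1.0, 2.0]
sent_feat = [3.0, 4.0]
W = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0]
gate, fused = gated_fusion(char_feat, sent_feat, W, b)
```

With untrained (zero) parameters the fused feature is the plain average of the two inputs; training would move the gate toward whichever source is more informative for each dimension.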
S7: Construct a sequence labeling module that learns the label dependency information corresponding to the deep character features, obtains the optimal label sequence of the engineering change text, and produces a structured expression of the engineering change event. This comprises the following steps:
S7.1: Input the deep character features of all characters in the engineering change text into a conditional random field (CRF) model, learn the dependencies among the labels assigned in step S2 to those characters, and obtain the optimal label sequence of the engineering change text to be extracted;
S7.2: Extract the words of each label category according to the optimal label sequence of the engineering change text, fill them into the engineering change event expression template, and thereby express the engineering change event in structured form.
In this embodiment, the engineering change event information expression template is shown in Fig. 2.
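Step S7.2 can be sketched as follows, assuming the optimal label sequence from the conditional random field is already available as one BIO tag per character. The CRF decoding itself is not shown. The label names (FLOOR, COMPONENT, TRIGGER), the sample text, and the dictionary used as a template are hypothetical stand-ins for the roles and template of the patent.

```python
def extract_spans(chars, tags):
    """Group characters into (label, text) spans from a BIO tag sequence."""
    spans = []
    current_label, current_chars = None, []
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new span begins; close any span that is still open.
            if current_label is not None:
                spans.append((current_label, "".join(current_chars)))
            current_label, current_chars = tag[2:], [ch]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_chars.append(ch)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current_label is not None:
                spans.append((current_label, "".join(current_chars)))
            current_label, current_chars = None, []
    if current_label is not None:
        spans.append((current_label, "".join(current_chars)))
    return spans


def fill_template(spans):
    """Collect extracted words under their label category."""
    event = {}
    for label, text in spans:
        event.setdefault(label, []).append(text)
    return event


chars = list("3F beam moved")
tags = (
    ["B-FLOOR", "I-FLOOR", "O"]
    + ["B-COMPONENT"] + ["I-COMPONENT"] * 3
    + ["O"]
    + ["B-TRIGGER"] + ["I-TRIGGER"] * 4
)
spans = extract_spans(chars, tags)
event = fill_template(spans)
```

Here `event` maps each label category to the list of words carrying that label, which is the shape a slot-based expression template would consume.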
In this embodiment, the method of the invention is compared with commonly used word-feature-based methods on engineering change text data from a real building project.
Table 1. Experimental results of the method of the invention and other classical methods on the data set
Metric          BiLSTM-CRF    BERT-BiLSTM-CRF    Method of the invention
Micro Recall    52.84         71.67              76.11
The experimental results show that the method of the invention achieves the highest micro recall (76.11) on a small-scale real engineering change text data set, exceeding BiLSTM-CRF (52.84) by 23.27 points and BERT-BiLSTM-CRF (71.67) by 4.44 points.
Example 2:
This embodiment differs from Embodiment 1 in that:
S4.1: A sentence consisting of T characters is input into a word vector model to obtain the vector of each character in the sentence; in this embodiment, each character vector is obtained using the dynamic word vector model ELMo.
S4.2: The hidden feature of each character in the sentence is extracted through a coding layer; in this embodiment, the hidden feature of each character is extracted by a bidirectional gated recurrent unit (Bi-GRU).
Example 3:
This embodiment differs from Embodiment 1 in that:
S6: Construct a feature aggregation module that fuses sentence features and character features to obtain deep character features with global context.
In this embodiment, the change-semantics-aware sentence feature is fused into the domain-knowledge-enhanced character features of the sentence by concatenation, giving deep character features with global context, as follows:
[formula image not reproduced]
The above operation is repeated for all sentences in the engineering change text.
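Fusion by concatenation can be sketched in a few lines, assuming the sentence feature is simply appended to every character feature of that sentence. The function name and toy vectors are hypothetical.

```python
def concat_fusion(char_feats, sent_feat):
    """Append the sentence feature to each character feature of the sentence."""
    return [h + sent_feat for h in char_feats]


char_feats = [[1.0, 2.0], [3.0, 4.0]]  # two characters, dimension 2
sent_feat = [9.0]                      # sentence feature, dimension 1
deep_feats = concat_fusion(char_feats, sent_feat)
```

Unlike the gating variant of Embodiment 1, concatenation has no trainable parameters of its own, but it enlarges the feature dimension passed to the sequence labeling module.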
Example 4:
This embodiment differs from Embodiment 1 in that:
S5.1: Establish a coding layer capable of extracting local features of sentences. In this embodiment, a graph convolutional network (GCN) learns the hidden features of the sentence characters from step S4.2 to obtain the semantic representation of the sentence.
S5.3: Calculate the correlation between the engineering change event prototype representation and the sentence representation, strengthening the features of event sentences in the document that contain event arguments or trigger words and suppressing the features of irrelevant non-event sentences, to obtain the change-semantics-aware sentence features:
[formula image not reproduced]
where the weighting term is the correlation between the sentence features and the engineering change event prototype representation. In this embodiment, it is calculated with an attention mechanism as follows:
[formula image not reproduced]
where the formula refers to the feature vector of each sentence in the engineering change text and to the number of sentences contained in the engineering change text. The attention score is calculated as follows:
[formula image not reproduced]
where the three additional symbols are trainable parameters, T denotes the transpose, and tanh() is the hyperbolic tangent function.
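A score built from three trainable parameters, a transpose, and tanh() matches the general shape of additive attention. The sketch below assumes that form, score = v^T tanh(W s + U e), followed by a softmax over the sentences of the text. The parameter names W, U, v and the toy values are hypothetical, since the formula images are not reproduced in this text.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def additive_score(s, e, W, U, v):
    """Assumed additive attention score: v^T tanh(W s + U e)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, s), matvec(U, e))]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def attention_weights(sentences, prototype, W, U, v):
    """Softmax-normalize the additive scores over all sentences."""
    scores = [additive_score(s, prototype, W, U, v) for s in sentences]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Toy parameters: identity matrices and an all-ones projection vector.
W = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
sentences = [[1.0, 0.0], [-1.0, 0.0]]
prototype = [1.0, 0.0]
alphas = attention_weights(sentences, prototype, W, U, v)
```

With these toy values the first sentence points in the same direction as the prototype and receives the larger weight, while the second cancels it out and is suppressed.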
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A combined extraction method for building engineering change events based on position perception, characterized by comprising the following steps:
S1: acquiring a plurality of building engineering change texts, analyzing them, determining the elements that constitute an engineering change event, and defining the arguments and trigger words of the engineering change event;
S2: preprocessing the engineering change text, and labeling the engineering change text at character granularity according to the arguments and trigger words of the engineering change event;
S3: obtaining a prototype representation of the engineering change event according to the label information annotated on the engineering change text;
S4: constructing a character feature encoding module, and using the element semantics of the engineering change event to strengthen the characters at the boundary positions of arguments and trigger words, to obtain domain-knowledge-enhanced character features;
S5: constructing a sentence feature encoding module, and using the prototype representation of the engineering change event to perceive the sentences in the engineering change document that contain event arguments and trigger words, to obtain change-semantics-aware sentence features;
S6: constructing a feature aggregation module, and fusing the sentence features and the character features to obtain deep character features with global context;
S7: constructing a sequence labeling module, learning the label dependency information corresponding to the deep character features, obtaining the optimal label sequence of the engineering change text, and producing a structured expression of the engineering change event.
2. The combined extraction method for building engineering change events based on position perception as claimed in claim 1, wherein in step S1, the elements constituting the engineering change event comprise a building component, a building site, a building floor, a building space, an attribute, a numerical attribute value, an object attribute target, and a change mode of the building component;
the roles of the arguments of the engineering change event are defined to comprise building component, building site, building floor, building space, attribute, numerical attribute value, and object attribute target;
the trigger word of the engineering change event is a word that expresses a change mode.
3. The combined extraction method for building engineering change events based on position perception as claimed in claim 2, wherein in step S2, the engineering change text comprises a plurality of sentences, and the engineering change text is split into sentences according to its punctuation marks, each line being a single sentence;
the number of sentences and the sentence lengths of each preprocessed engineering change text are obtained; and the engineering change text is labeled at character granularity using the "BIO" three-tag sequence labeling scheme.
4. The combined extraction method for building engineering change events based on position perception as claimed in claim 3, wherein labeling the engineering change text at character granularity using the "BIO" three-tag sequence labeling scheme specifically comprises:
labeling the words of the engineering change text whose categories belong to the arguments as arguments, the label of each such word being its role; and labeling the words of the engineering change text that express change modes as trigger words.
5. The combined extraction method for building engineering change events based on position perception as claimed in claim 4, wherein step S3 specifically comprises the following steps:
S3.1: analyzing the importance of the elements constituting the engineering change event, and assigning a weight to each element determined in step S1;
S3.2: for each engineering change text, obtaining the word vector of each argument and trigger word labeled in step S2, and calculating the semantic representation e of the engineering change event contained in the engineering change text according to the weights of the corresponding word vectors, as follows:
[formula image not reproduced]
where the terms of the formula are the word vector of each element in the engineering change text (a labeled argument or trigger word), the weight of that argument or trigger word, and the number of distinct element types;
S3.3: calculating the semantic representations of the engineering change events corresponding to all the engineering change texts acquired in step S1, and averaging them to obtain the prototype representation of the engineering change event:
[formula image not reproduced]
where the average is taken over the number of engineering change texts acquired in step S1.
6. The combined extraction method for building engineering change events based on position perception as claimed in claim 1, wherein step S4 comprises the following steps:
S4.1: inputting the i-th sentence of the engineering change text, consisting of T characters, into a word vector model to obtain the vector of each character in the i-th sentence, the character vectors being indexed by the character position t, t = 1 to T;
S4.2: extracting the hidden feature of each character in the i-th sentence through a coding layer;
S4.3: segmenting the i-th sentence with a word segmentation tool to obtain the words of the i-th sentence, and fusing the semantic information of each word into the hidden features of the characters in that word using different character position weights, to obtain the domain-knowledge-enhanced character features:
[formula image not reproduced]
where the terms of the formula are the semantic vector of the j-th word in the i-th sentence, the index p of the p-th character of the j-th word in the i-th sentence, and the position weight corresponding to the p-th character of the j-th word in the i-th sentence;
S4.4: repeating steps S4.1 to S4.3 for all sentences in the engineering change text to obtain the domain-knowledge-enhanced character features of every character in all sentences of the engineering change text.
7. The combined extraction method for building engineering change events based on position perception as claimed in claim 6, wherein the character position weight is calculated as follows:
[formula image not reproduced]
where softmax() denotes the normalized exponential function, Normalization() denotes min-max normalization, and the remaining term is the number of characters contained in the j-th word of the i-th sentence.
8. The combined extraction method for building engineering change events based on position perception as claimed in claim 1, wherein step S5 comprises the following steps:
S5.1: establishing a coding layer capable of extracting local features of sentences, and learning the hidden features of each character in the i-th sentence from step S4.2 to obtain the semantic representation of the i-th sentence;
S5.2: according to the position of the i-th sentence in the document, concatenating a position vector to obtain the sentence representation of the i-th sentence:
[formula image not reproduced]
S5.3: calculating the correlation between the prototype representation of the engineering change event and the sentence representation, strengthening the features of event sentences in the document that contain event arguments or trigger words and suppressing the features of irrelevant non-event sentences, to obtain the change-semantics-aware sentence feature of the i-th sentence:
[formula image not reproduced]
where the weighting term is the correlation, obtained with an attention mechanism, between the sentence representation of the i-th sentence and the prototype representation of the engineering change event;
S5.4: repeating steps S5.1 to S5.3 for all sentences in the engineering change text to obtain the change-semantics-aware sentence features of all sentences in the engineering change text.
9. The combined extraction method for building engineering change events based on position perception as claimed in claim 1, wherein in step S6, for all sentences in the engineering change text, the change-semantics-aware sentence features are fused into the domain-knowledge-enhanced character features of the sentences to obtain deep character features with global context, as follows:
[formula image not reproduced]
where the result is the deep character feature of the t-th character in the i-th sentence and the operator in the formula denotes a feature fusion method.
10. The combined extraction method for building engineering change events based on position perception according to any one of claims 1 to 9, wherein step S7 comprises the following steps:
S7.1: inputting the deep character features of all characters in the engineering change text into a conditional random field model, learning the dependencies among the labels assigned in step S2 to those characters, and obtaining the optimal label sequence of the engineering change text to be extracted;
S7.2: extracting the words of each label category according to the optimal label sequence of the engineering change text, filling them into an engineering change event expression template, and thereby expressing the engineering change event in structured form.
CN202211166342.1A 2022-09-23 2022-09-23 Combined extraction method for building engineering change events based on position perception Active CN115238685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211166342.1A CN115238685B (en) 2022-09-23 2022-09-23 Combined extraction method for building engineering change events based on position perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211166342.1A CN115238685B (en) 2022-09-23 2022-09-23 Combined extraction method for building engineering change events based on position perception

Publications (2)

Publication Number Publication Date
CN115238685A true CN115238685A (en) 2022-10-25
CN115238685B CN115238685B (en) 2023-03-21

Family

ID=83667029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211166342.1A Active CN115238685B (en) 2022-09-23 2022-09-23 Combined extraction method for building engineering change events based on position perception

Country Status (1)

Country Link
CN (1) CN115238685B (en)

Also Published As

Publication number Publication date
CN115238685B (en) 2023-03-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant