CN113779988A - Method for extracting process knowledge events in the communication field (Google Patents)
- Publication number: CN113779988A (application CN202111045480.XA)
- Authority: CN (China)
- Prior art keywords: event, communication field, information, sequence, model
- Prior art date: 2021-09-07
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (under G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
- G06F40/30 Semantic analysis (under G06F40/00 Handling natural language data)
- G06N3/045 Combinations of networks (under G06N3/04 Architecture, e.g. interconnection topology)
- G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs (under G06N3/04)
- G06N3/08 Learning methods (under G06N3/02 Neural networks)
Abstract
The invention discloses a method for extracting process-class knowledge events in the communication field, belonging to the field of information technology and comprising the following steps: S1: define the communication-field event extraction problem and select an extraction method; S2: preprocess the communication-field process-class knowledge data; S3: construct a hierarchical sequence labeling task; S4: obtain an enhanced semantic representation using a pre-trained model and a graph convolutional neural network; S5: obtain long-distance semantic dependency information of the semantic representation using a gated recurrent unit; S6: overcome the label bias problem arising in step S5 using a conditional random field; S7: extract events with a communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network. The method extracts semantic representations with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated recurrent unit, and overcomes the label bias problem with a conditional random field.
Description
Technical Field
The invention relates to the technical field of information, in particular to a method for extracting process knowledge events in the communication field.
Background
In recent years, with the rapid development of natural language processing (NLP) technology and the wide application of 5G technology in the communication field, how to extract process-class knowledge in the communication field using NLP technology has attracted increasing attention. Communication-field event extraction aims to extract specified event attributes from unstructured process-class knowledge texts; it is one of the important steps of text structuring and the basis for the broad application of knowledge graphs.
The current communication-field event extraction task generally faces high labeling costs and scarce labeled samples. Achieving high-quality event extraction with few labeled samples therefore has important value for the wide application of event extraction technology in the communication field. Rule-based event extraction methods are hampered by the uncertainty of language structure, which makes unified and complete rules difficult to formulate; traditional machine learning methods are mostly based on supervised learning and struggle with diversified event-element expressions and incomplete event elements (missed extractions and incomplete text descriptions). A method for extracting process-class knowledge events in the communication field is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: by extracting and organizing the "events" and "event relations" in the communication operation and maintenance process, the method presents the logic of fault occurrence more intuitively, which is an important prerequisite for subsequent troubleshooting and front-line handling of live-network faults.
The invention solves the above technical problem through the following technical scheme, which comprises the following steps:
S1: define the communication-field event extraction problem and select an extraction method;
S2: preprocess the communication-field process-class knowledge data;
S3: construct a hierarchical sequence labeling task;
S4: obtain an enhanced semantic representation using a pre-trained model and a graph convolutional neural network (GCN);
S5: obtain long-distance semantic dependency information of the semantic representation using a gated recurrent unit (GRU);
S6: overcome the label bias problem arising in step S5 using a conditional random field (CRF);
S7: run the data flow through the communication-field process-class knowledge event extraction model based on model transfer learning and the graph convolutional neural network.
Further, the problem definition in step S1 refers to determining which event elements are to be extracted from the communication-field event text corpus. After requirement analysis, the problem definition of event extraction is given as follows: first, identify whether relevant communication-field events exist in the text corpus; second, identify the related elements of those events; finally, determine the role each element plays. The pipeline extraction method is selected as the event extraction method.
Furthermore, the data preprocessing in step S2 refers to operations such as data cleaning, data deduplication, and text normalization, which address the non-standardized data, missing features, and labeling errors present in the original manually labeled data.
Further, the hierarchical sequence labeling in step S3 refers to the task of dividing the data, by event type, into 8 structured major categories with 30 sub-categories based on the event types and event elements in the data Schema using programmatic means, and performing sequence labeling with the BIO labeling strategy.
Furthermore, the pre-trained model in step S4 is obtained by running a self-supervised learning method on a massive corpus; it provides a model for other tasks to perform model transfer learning and, after being fine-tuned or frozen for the task, can serve as a feature extractor. The graph convolutional neural network applies message-passing and message-receiving mechanisms on a graph and mines the deep relationships among the graph nodes through convolution operations on the graph, thereby obtaining enhanced node semantic representations.
Furthermore, the gated recurrent unit (GRU) in step S5 is a simplified LSTM model with a reset gate and an update gate; the GRU has fewer parameters and higher efficiency. The long-distance semantic dependency is captured by the reset-gate and update-gate characteristics of the GRU: the reset gate determines how to combine the new input information with the previous memory, and the update gate defines how much of the previous memory is kept at the current time step.
Further, the conditional random field (CRF) in step S6 predicts the optimal label sequence y* = {y*_1, y*_2, ..., y*_n} according to K feature functions, the corresponding K weights, and an observation sequence x = {x_1, x_2, x_3, ..., x_n}.
Further, the data flow in step S7 is: the text corpus passes through the text input layer, the pre-trained model layer, the GCN layer, the GRU layer, the CRF layer, and the output layer to obtain the prediction result of communication-field process-class knowledge event extraction.
Compared with the prior art, the invention has the following advantages: the method for extracting process-class knowledge events in the communication field realizes semantic representation extraction with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated recurrent unit (GRU), and overcomes the label bias problem with a conditional random field (CRF).
Drawings
FIG. 1 is a schematic diagram of the masked prediction performed by the pre-trained model on the communication-field process-class knowledge corpus in the embodiment of the present invention;
FIG. 2 is a diagram of the multi-level semantic updating realized by the graph convolutional neural network (GCN) in the embodiment of the present invention;
FIG. 3 is a diagram of the data update process of the gated recurrent unit (GRU) at time t in the embodiment of the present invention;
FIG. 4 is an execution flow diagram of the communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network in the embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in FIGS. 1 to 4, the present embodiment provides a technical solution: a communication-field process-class knowledge event extraction method based on model transfer learning and a graph convolutional neural network, comprising the following steps:
s1: problem definition for event extraction
The communication-field event extraction problem can be described as follows: first, identify whether a relevant communication-field event exists in the text corpus; second, identify the related elements of the event; finally, determine the role each element plays. For the example sentence below, inputting it into the event extraction model requires extracting E1, A1, A2, A3, and A4. Here E1 is called the trigger word, and A1, A2, A3, and A4 are called event elements.
Example sentence: after 8 o'clock, Apple terminals (A1) in cell XX fail (A3) to access (E1) the 5G network (A2).
The trigger word in the example sentence is "access", which indicates an event of the software/hardware fault type (SoftHardwareFault); the extracted elements A1, A2, and A3 play the roles of, respectively, the fault location, the fault-related object, and the fault state in that event.
At present, there are two machine-learning approaches to event extraction: the pipeline approach and the joint approach. In the pipeline approach, trigger-word recognition and event-type determination are performed in the first stage and event-element recognition in the second stage; that is, E1 in the example sentence is extracted first and the event type is determined, and then A1, A2, A3, and A4 are extracted according to the E1 event schema. The joint approach extracts trigger words and event elements simultaneously, i.e., E1, A1, A2, A3, and A4 in the example sentence are extracted at the same time.
The pipeline approach suffers from error propagation: if the event type is judged incorrectly in the first stage, the event-element extraction in the second stage will also be wrong, so the joint approach usually performs better. In the communication field, however, event types and event elements are both extremely complex, and event trigger words often overlap with event elements; for example, when the trigger word is "access" and an event element is "access terminal", the model easily labels both occurrences of "access" as trigger words, causing the extraction task to fail. The invention therefore adopts the pipeline approach, models event trigger words and event elements separately, and extracts the trigger words and event elements contained in an event in turn. Experiments show that for communication-field event extraction tasks with complex contexts, the pipeline approach clearly outperforms the joint approach.
S2: data preprocessing for communication domain process class knowledge
There is much process-class knowledge in the communication field. Common process-class event types include: index degradation, software/hardware fault, data collection, check, configuration fault, external event, machine setting, machine operation, and so on. The training data set records in detail the type of each event and the trigger words and event elements contained in each type; "pair_ID" is the event-pair ID. Statistics of the data sets and examples of the training and validation data sets are shown in Table 1, Table 2, and Table 3, respectively.
Table 1 data set statistics
Table 2 training data set example
Table 3 verification data set example
id | Text |
15001 | But not connected to an antenna feeder |
15002 | Even more unfortunately the handset is mobile! |
15003 | But because the site is not configured with the adjacent region of the GERAN system |
15004 | A phenomenon of RRC establishment failure occurs |
15005 | Many sites cannot be adjusted |
Because communication-field process-class knowledge is generated in real time during equipment operation, a large amount of non-standardized data, missing features, and even labeling errors remain after manual cleaning and labeling. Data preprocessing is therefore required before the data are input into the model.
S21: data cleaning. The labeled communication-field knowledge extraction corpus contains some obvious labeling errors; such data are directly discarded during data cleaning.
S22: data deduplication. A device sometimes records the same device state repeatedly within a certain period, producing a large amount of duplicate data. Since large amounts of duplicate data distort the sample distribution, the duplicates are removed in the preprocessing stage.
S23: text normalization. Full-width and half-width inconsistencies in the text and symbols of the samples are normalized into a uniform format.
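A minimal sketch of how steps S21-S23 could be implemented is given below; the record layout and the helper names are illustrative assumptions rather than part of the patent.

```python
import unicodedata

def normalize_text(text: str) -> str:
    # S23: fold full-width characters (e.g. '：', 'Ａ') to their half-width forms
    return unicodedata.normalize("NFKC", text).strip()

def preprocess(records):
    # records: iterable of (sample_id, text, label) triples from the labeled corpus
    seen = set()
    cleaned = []
    for sample_id, text, label in records:
        if not text or label is None:      # S21: drop empty or obviously mislabeled samples
            continue
        text = normalize_text(text)
        if text in seen:                   # S22: drop duplicated device-state records
            continue
        seen.add(text)
        cleaned.append((sample_id, text, label))
    return cleaned
```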
S3: constructing hierarchical sequence annotation tasks
The sequence labeling problem is the most common problem in NLP, and most NLP problems can be converted into sequence labeling problems. "Sequence labeling" means that, for a one-dimensional linear input sequence x = {x_1, x_2, x_3, ..., x_n}, each element of the linear sequence is assigned a label from the label set Y = {y_1, y_2, y_3, ..., y_n}. The sequence labeling task is therefore essentially the problem of classifying each element of a linear sequence according to its context.
The invention treats event extraction as a sequence labeling task. The labeling strategy adopts the BIO scheme, where B marks the beginning of an event element, I marks a middle or ending token of an event element, and O marks an irrelevant token.
Based on the event types and event elements in the data Schema, the data are divided programmatically into 8 structured major categories with 30 sub-categories. The trigger words under the 8 categories are marked with A-H, the event elements under each category are marked with An-Hn, the starting position is marked with B, and the middle and ending positions are marked with I. The labeling specification is shown in Table 4.
Table 4: sequence annotation tag definition rules
Label (R) | Definition of |
B-A1 | Starting position of SoftHardwarreFault |
I-A1 | Middle position or end position of SoftHardwarreFault |
B-A2 | Subject start position |
I-A2 | Subject middle or end position |
B-A3 | Object/Object start position |
I-A3 | Object/Object intermediate position or end position |
B-A4 | State Start position |
I-A4 | State intermediate or end position |
B-A5 | Owner start position |
I-A5 | Middle of OwnerPosition or end position |
B-B1 | Starting position of CollectData |
I-B1 | CollectData intermediate or end bit |
B-B2 | Object/Object start position |
I-B2 | Object/Object middle position or end bit |
B-B3 | Source starting position |
I-B3 | Source middle or end bit |
... | ... |
The rules for labeling are as follows:
trigger_dic={'SoftHardwareFault':'A1','CollectData':'B1','Check':'C1','SettingFault':'D1','ExternalFault':'E1','SetMachine':'F1','Operate':'G1','IndexFault':'H1'}
a_dic={'Subject':'A2','Object':'A3','object':'A3','State':'A4','Owner':'A5'}
b_dic={'Object':'B2','object':'B2','Source':'B3'}
c_dic={'Object':'C2','object':'C2','Standard':'C3'}
d_dic={'Setting':'D2','Owner':'D3','Reference':'D4','State':'D5'}
e_dic={'State':'E2'}
f_dic={'Object':'F2','object':'F2','Network':'F3','InitialState':'F4','FinalState':'F5'}
g_dic={'Object':'G2','object':'G2','Owner':'G3'}
h_dic={'Index':'H2','Owner':'H3','State':'H4'}
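As an illustration of how these dictionaries map to BIO tags (the span-with-character-offsets record format and the to_bio helper are hypothetical, not from the patent), a single annotated event could be converted as follows:

```python
# Hypothetical sketch: turn one annotated event into a character-level BIO sequence
# using the label codes above (e.g. a 'SoftHardwareFault' trigger becomes B-A1/I-A1).
element_dics = {'A1': a_dic, 'B1': b_dic, 'C1': c_dic, 'D1': d_dic,
                'E1': e_dic, 'F1': f_dic, 'G1': g_dic, 'H1': h_dic}

def to_bio(text, event_type, spans):
    """spans: list of (start, end, role) character offsets; role is the trigger
    type itself or an element role such as 'Subject'."""
    tags = ['O'] * len(text)
    trigger_code = trigger_dic[event_type]
    role_dic = element_dics[trigger_code]
    for start, end, role in spans:
        code = trigger_code if role == event_type else role_dic.get(role)
        if code is None:
            continue                       # unknown role: leave the span as 'O'
        tags[start] = 'B-' + code
        for i in range(start + 1, end):
            tags[i] = 'I-' + code
    return tags

# e.g. to_bio("苹果终端接入5G网络失败", 'SoftHardwareFault',
#             [(4, 6, 'SoftHardwareFault'), (0, 4, 'Subject')])
# -> ['B-A2', 'I-A2', 'I-A2', 'I-A2', 'B-A1', 'I-A1', 'O', 'O', 'O', 'O', 'O', 'O']
```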
S4: Obtaining enhanced semantic representations using a pre-trained model and a graph convolutional neural network (GCN)
S41: the invention uses the tokenizer of the pre-trained model to tokenize the corpus. First, the text is segmented into units such as characters or words; this segmentation is required because the raw corpus must be input into the pre-trained model. For a given sentence x = {x_1, x_2, x_3, ..., x_n}, where x_i is the i-th character of the input sentence and n is the number of characters in the sentence, the tokenizer shipped with the pre-trained model is used at this layer; when processing Chinese, it tokenizes character by character. After tokenization, the sequence is padded with 0 at the end to a uniform length, giving the tokenization result ω_i ∈ R^m (i = 1, 2, ..., m), where ω_i is the i-th token of the sentence and m is the length of the tokenized sequence.
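As a sketch of this step (assuming the HuggingFace transformers library and the bert-base-chinese checkpoint; the patent specifies a BERT pre-trained model but not a concrete implementation):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

encoded = tokenizer(
    "苹果终端接入5G网络失败",
    padding="max_length",   # pad with 0 up to the uniform sequence length m
    truncation=True,
    max_length=64,
    return_tensors="pt",
)
# Chinese input is tokenized character by character; encoded["input_ids"] is the
# zero-padded token-id sequence that is fed to the pre-trained model layer.
```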
The invention obtains the semantic representation of the corpus text with a pre-trained model. The pre-trained model provides a model for other tasks to perform model transfer learning and, after being fine-tuned or frozen for the task, can serve as a feature extractor. The method uses the position encoding of characters as the input of the Transformer, randomly masks part of the tokens in the corpus, and then predicts the masked tokens from their context, so that the meaning of a masked token can be better understood from the corpus context. The masked prediction performed by the pre-trained model on the communication-field process-class knowledge corpus is shown in FIG. 1.
The method uses a graph convolutional neural network to enhance the node semantic representations of the pre-trained model.
S42: take the event trigger words and event elements in the corpus text as nodes, with adjacency relations between the nodes of each corpus, and construct a dynamic network topology graph. Through the message-passing mechanism, the feature information of each node is transformed and then propagated to its neighbor nodes, realizing the extraction and transformation of node features; through the message-receiving mechanism, the information propagated by the neighbor nodes around each target node is then aggregated:

H^(l+1) = σ(D^(-1/2) A D^(-1/2) H^(l) W^(l))

where A is the adjacency matrix of the target nodes, D is the degree matrix of the target nodes, H^(l) is the node semantic representation at layer l, H^(l+1) is the node semantic representation at layer l+1, W^(l) is the feature weight matrix of the layer-l target nodes, and σ is the Sigmoid activation function.
Deep relationships between the graph nodes can be mined through multi-layer convolution operations on the graph; the (l+1)-th layer convolution operation on the graph is shown in FIG. 2.
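A numpy sketch of one such propagation layer (toy adjacency and features; self-loops are added so every node keeps its own information, an assumption not spelled out in the text):

```python
import numpy as np

def gcn_layer(A, H, W):
    # sigma(D^-1/2 A D^-1/2 H W) with sigma the Sigmoid activation, as above
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    Z = D_inv_sqrt @ A @ D_inv_sqrt @ H @ W
    return 1.0 / (1.0 + np.exp(-Z))

# Toy graph: a trigger node connected to two event-element nodes, self-loops added
A = np.array([[1., 1., 1.],
              [1., 1., 0.],
              [1., 0., 1.]])
H = np.random.randn(3, 4)          # layer-l node representations H^(l)
W = np.random.randn(4, 4)          # layer-l weight matrix W^(l)
H_next = gcn_layer(A, H, W)        # layer-(l+1) node representations H^(l+1)
```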
S5: Obtaining long-distance semantic dependency information of semantic representations using gated recurrent units (GRUs)
A gated recurrent unit (GRU) is a simplified model of the LSTM; the GRU layer is added to the model to obtain long-distance semantic dependency information of the input vectors. Compared with a plain recurrent neural network, the GRU alleviates the vanishing- and exploding-gradient problems; compared with the LSTM, which has three gates (input, forget, and output), the GRU has only two (reset and update). The GRU does not maintain a separately controlled internal memory and has no output gate as in the LSTM, so a GRU network of the same structure has fewer parameters, is more efficient, and performs as well or better on many tasks. Because the event subjects appearing in communication-field events are numerous and complex, the feature information extracted by the pre-trained model can be further refined by the GRU, and relations between distant communication-field event elements can be captured.
As shown in FIG. 3, the GRU uses an update gate and a reset gate: the reset gate determines how the new input information is combined with the previous memory, and the update gate defines how much of the previous memory is kept at the current time step.
S51: at time step t, the information of the previous time step is aggregated with the information of the current time step using the update gate:

z_t = σ(W^(z) x_t + U^(z) h_{t-1})

where h_{t-1} is the hidden semantic output of time step t-1, x_t is the original semantic input of time step t, W^(z) and U^(z) are weight matrices, and σ is the Sigmoid activation function.

S52: at time step t, the information of the previous time step is aggregated with the information of the current time step using the reset gate:

r_t = σ(W^(r) x_t + U^(r) h_{t-1})

The reset gate uses different weight matrices from the update gate in this semantic aggregation.

S53: at time step t, the reset gate value is applied to decide how much of the previous information is retained or forgotten in the candidate state:

h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})

where tanh is the activation function, r_t is the output of the reset gate, and ⊙ denotes the Hadamard product.

S54: at the end of time step t, the update gate value decides how much of the information passed to the next unit comes from the previous hidden state and how much from the candidate state:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h'_t
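A numpy sketch of one GRU time step implementing S51-S54 (toy dimensions and random weights; ⊙ is element-wise multiplication):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)            # S51: update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)            # S52: reset gate
    h_cand = np.tanh(W @ x_t + r_t * (U @ h_prev))   # S53: candidate state
    return z_t * h_prev + (1 - z_t) * h_cand         # S54: new hidden state h_t

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
Wz, Wr, W = (rng.standard_normal((d_h, d_in)) for _ in range(3))
Uz, Ur, U = (rng.standard_normal((d_h, d_h)) for _ in range(3))
h_t = gru_step(rng.standard_normal(d_in), np.zeros(d_h), Wz, Uz, Wr, Ur, W, U)
```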
S6: Overcoming the label bias problem arising in step S5 using conditional random fields (CRF)

A conditional random field (CRF) can model the dependencies between labels and overcome the label bias problem. After the feature vector H carrying the contextual semantic dependencies is passed into a linear layer, an n × m score matrix P is obtained, where P_{i,j} is the score of the j-th label at the i-th position, m is the number of labels, and n is the maximum sentence length set by the model.

Input: the K feature functions of the model, the corresponding K weights, and an observation sequence x = {x_1, x_2, x_3, ..., x_n}.
S61: perform modeling initialization of the CRF and compute the non-normalized probability of each label at the initial position:

δ_1(l) = Σ_{k=1..K} w_k f_k(y_0 = start, y_1 = l, x),  l = 1, 2, ..., m

where i denotes the position in the label sequence, δ_1(l) is the non-normalized probability of label l at the initial position, w_k is the CRF model parameter of the k-th feature combination, and f_k is the feature function of the k-th feature combination;

S62: recurse over the positions i = 1, 2, ..., n-1 and obtain the maximum non-normalized probability of each label l = 1, 2, ..., m at position i+1:

δ_{i+1}(l) = max_{1≤j≤m} { δ_i(j) + Σ_{k=1..K} w_k f_k(y_i = j, y_{i+1} = l, x) }

where δ_{i+1}(l) is the maximum non-normalized probability over the paths ending with label l at position i+1, and δ_i(j) is the probability of label j at position i;

S63: record the path reaching the maximum non-normalized probability:

Ψ_{i+1}(l) = arg max_{1≤j≤m} { δ_i(j) + Σ_{k=1..K} w_k f_k(y_i = j, y_{i+1} = l, x) }

where Ψ_{i+1}(l) is the label value at position i for which δ_{i+1}(l) reaches its maximum;

S64: when i has traversed all n positions of the corpus sample, stop the recursion; the maximum non-normalized probability is max_{1≤l≤m} δ_n(l), and the end point of the optimal path is obtained at the same time:

y*_n = arg max_{1≤l≤m} δ_n(l);

S65: backtrack from the end point of the optimal path to obtain the whole optimal path:

y*_i = Ψ_{i+1}(y*_{i+1}),  i = n-1, n-2, ..., 1

Connecting the nodes on the optimal path yields the label sequence of the optimal path: y* = {y*_1, y*_2, ..., y*_n}.
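A compact numpy sketch of this Viterbi decoding, phrased over an emission-score matrix P and a transition-score matrix A rather than raw feature functions (an equivalent formulation for a linear-chain CRF; values are toy data):

```python
import numpy as np

def viterbi(P, A):
    """P: (n, m) per-position label scores; A: (m, m) scores A[j, l] = score(j -> l)."""
    n, m = P.shape
    delta = P[0].copy()                      # S61: initial scores of each label
    psi = np.zeros((n, m), dtype=int)
    for i in range(1, n):                    # S62/S63: recursion and path recording
        scores = delta[:, None] + A + P[i][None, :]
        psi[i] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    path = [int(delta.argmax())]             # S64: end point of the optimal path
    for i in range(n - 1, 0, -1):            # S65: backtracking
        path.append(int(psi[i][path[-1]]))
    return path[::-1]                        # optimal label index sequence y*

y_star = viterbi(np.random.randn(6, 5), np.random.randn(5, 5))
```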
S7: Data flow of the communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network
As shown in FIG. 4, the communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network mainly includes a text input layer, a BERT pre-trained model layer, a GCN layer, a GRU layer, a CRF layer, and an output layer.

S71: in the text input layer, perform data preprocessing on the raw corpus and tokenize the text corpus by Chinese character using the BERT pre-trained model, obtaining the tokenization result

x = {x_1, x_2, x_3, ..., x_n}

where x_i is the i-th character of the input sentence and n is the number of characters in the sentence; if a sentence is shorter than n, it is automatically padded with 0 at the end to the uniform length.

S72: obtain the semantic representation of the original text corpus through the BERT pre-trained model layer.

S73: input the semantic representation of the pre-trained model into the GRU layer to extract the key information of the communication-field event corpus.
S74: the dependencies between labels are modeled with a conditional random field (CRF), overcoming the label bias problem. For an input sequence x = {x_1, x_2, x_3, ..., x_n}, the LOSS score of the input label sequence y = {y_1, y_2, y_3, ..., y_n} against the target label sequence is calculated as:

score(x, y) = Σ_i A_{y_i, y_{i+1}} + Σ_i P_{i, y_i}

where A is the transition probability matrix, A_{i,j} is the transition score from label i to label j, P_{i,j} is the score of the j-th label at position i, and m is the maximum length of a single text corpus.
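A numpy sketch of the sequence score defined above (toy shapes; in the full model P comes from the linear layer after the GRU and A is learned):

```python
import numpy as np

def sequence_score(P, A, y):
    """P: (n, m) label scores per position; A: (m, m) transition scores; y: label ids."""
    emission = sum(P[i, y[i]] for i in range(len(y)))               # sum of P_{i, y_i}
    transition = sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))  # sum of A_{y_i, y_{i+1}}
    return emission + transition

P = np.random.randn(6, 5)       # 6 positions, 5 labels
A = np.random.randn(5, 5)
gold = [0, 2, 2, 1, 4, 3]
print(sequence_score(P, A, gold))
```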
During training, the maximum likelihood function over {x_i, y_i} is optimized, where λ and Θ are the regularization parameters and P(y_i | x_i) is the probability from the original sequence to the predicted sequence; the final sequence labels are obtained according to the maximum likelihood function.
S75: in the output layer, the output labels y = {y_1, y_2, y_3, ..., y_n} adjusted by the CRF layer are converted into BIO labels according to the sequence labeling tag definition rules of Table 4, thereby obtaining the event trigger words and event elements of the corpus text at inference time.
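Putting S71-S75 together, a skeletal PyTorch version of the layer stack might look as follows (a sketch under stated assumptions: HuggingFace transformers for BERT, the S42 propagation folded into a single GCN layer over a pre-normalized adjacency, a bidirectional GRU, and CRF decoding done separately with the Viterbi routine of S6; all hyperparameters are illustrative):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class EventExtractor(nn.Module):
    """Text input -> BERT -> GCN -> GRU -> per-position label scores P;
    the learned transition matrix A is used by an external CRF/Viterbi decoder."""
    def __init__(self, num_labels, hidden=256, bert_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        d = self.bert.config.hidden_size
        self.gcn_w = nn.Linear(d, d)                   # W^(l) of the GCN layer (S42)
        self.gru = nn.GRU(d, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_labels)  # emission matrix P
        self.trans = nn.Parameter(torch.randn(num_labels, num_labels))  # matrix A

    def forward(self, input_ids, attention_mask, adj):
        # adj: (batch, n, n) pre-normalized adjacency D^-1/2 A D^-1/2
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h = torch.sigmoid(adj @ self.gcn_w(h))         # one GCN propagation step
        h, _ = self.gru(h)                             # S5: long-distance dependencies
        return self.emit(h)                            # scores fed to the CRF layer
```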
To sum up, the method for extracting process-class knowledge events in the communication field of this embodiment realizes semantic representation extraction with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated recurrent unit (GRU), and overcomes the label bias problem with a conditional random field (CRF).
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A method for extracting process knowledge events in the communication field is characterized by comprising the following steps:
S1: defining the communication-field event extraction problem and selecting an extraction method;
S2: preprocessing the communication-field process-class knowledge data;
S3: constructing a hierarchical sequence labeling task;
S4: obtaining an enhanced semantic representation using a pre-trained model and a graph convolutional neural network;
S5: obtaining long-distance semantic dependency information of the semantic representation using a gated recurrent unit;
S6: overcoming the label bias problem arising in step S5 using a conditional random field;
S7: extracting events using a communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network to obtain the prediction result of communication-field process-class knowledge event extraction.
2. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, the specific process of defining the communication domain event extraction problem is as follows:
S11: identifying whether a relevant communication-field event exists in the text corpus;
S12: identifying the related elements of the relevant event;
S13: determining the role each element plays.
3. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, the event extraction method selects a pipeline extraction method, and the pipeline extraction method is adopted to model the event trigger words and the event elements separately, and sequentially extract the trigger words and the event elements included in the event.
4. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S2, the data preprocessing process is specifically as follows:
S21: data cleaning
The labeled communication-field knowledge extraction corpus contains some obviously mislabeled data, which are directly discarded;
S22: data deduplication
Duplicate data generated by recording the same device state within a certain period are de-duplicated;
S23: text normalization
Full-width and half-width inconsistencies in the text and symbols of the sample data are normalized into a uniform format.
5. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S3, the hierarchical sequence labeling refers to the task of dividing the data, by event type, into 8 structured major categories with 30 sub-categories based on the event types and event elements in the data Schema using programmatic means, and performing sequence labeling with the BIO labeling strategy.
6. The method for extracting process-class knowledge events in the communication field according to claim 5, wherein: b in the BIO labeling strategy represents the beginning of an event element, I represents a middle or ending word of the event element, and O represents an irrelevant word.
7. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S4 is as follows:
S41: obtaining the semantic representation of the corpus text using a pre-trained model;
S42: taking the event trigger words and event elements in the corpus text as nodes, with adjacency relations between the nodes of each corpus, and constructing a dynamic network topology graph; the feature information of each node is transformed and propagated to its neighbor nodes through the message-passing mechanism, realizing the extraction and transformation of node features, and the information propagated by the neighbor nodes around each target node is then aggregated through the message-receiving mechanism:

H^(l+1) = σ(D^(-1/2) A D^(-1/2) H^(l) W^(l))

where A is the adjacency matrix of the target nodes, D is the degree matrix of the target nodes, H^(l) is the node semantic representation at layer l, H^(l+1) is the node semantic representation at layer l+1, W^(l) is the feature weight matrix of the layer-l target nodes, and σ is the Sigmoid activation function.
8. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S5 is as follows:
S51: at time step t, the information of the previous time step is aggregated with the information of the current time step using the update gate:

z_t = σ(W^(z) x_t + U^(z) h_{t-1})

where h_{t-1} is the hidden semantic output of time step t-1, x_t is the original semantic input of time step t, W^(z) and U^(z) are weight matrices, and σ is the Sigmoid activation function;

S52: at time step t, the information of the previous time step is aggregated with the information of the current time step using the reset gate:

r_t = σ(W^(r) x_t + U^(r) h_{t-1});

S53: at time step t, the reset gate value is applied to decide how much of the previous information is retained or forgotten:

h'_t = tanh(W x_t + r_t ⊙ U h_{t-1})

where tanh is the activation function, r_t is the output of the reset gate, and ⊙ denotes the Hadamard product;

S54: at the end of time step t, the update gate value decides how much of the information passed to the next unit comes from the previous hidden state and how much from the candidate state:

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h'_t.
9. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S6, the conditional random field predicts the optimal label sequence y* = {y*_1, y*_2, ..., y*_n} according to K feature functions, the corresponding K weights, and an observation sequence x = {x_1, x_2, x_3, ..., x_n}.
10. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S7, the communication-field process-class knowledge event extraction model based on model transfer learning and a graph convolutional neural network includes a text input layer, a BERT pre-trained model layer, a GCN layer, a GRU layer, a CRF layer, and an output layer, and the specific working process of the model is as follows:

S71: in the text input layer, performing data preprocessing on the raw corpus and tokenizing the text corpus by Chinese character using the BERT pre-trained model to obtain the tokenization result:

x = {x_1, x_2, x_3, ..., x_n}

where x_i is the i-th character of the input sentence and n is the number of characters in the sentence; if a sentence is shorter than n, it is automatically padded with 0 at the end to the uniform length;

S72: obtaining the semantic representation of the original text corpus through the BERT pre-trained model layer;

S73: inputting the semantic representation of the pre-trained model into the GRU layer to extract the key information of the communication-field event corpus;

S74: modeling the dependencies between labels using the conditional random field to overcome the label bias problem; for an input sequence x = {x_1, x_2, x_3, ..., x_n}, calculating the LOSS score of the input label sequence y = {y_1, y_2, y_3, ..., y_n} against the target label sequence:

score(x, y) = Σ_i A_{y_i, y_{i+1}} + Σ_i P_{i, y_i}

where A is the transition probability matrix, A_{i,j} is the transition score from label i to label j, P_{i,j} is the score of the j-th label at position i, and m is the maximum length of a single text corpus;

during training, optimizing the maximum likelihood function over {x_i, y_i}, where λ and Θ are the regularization parameters and P(y_i | x_i) is the probability from the original sequence to the predicted sequence, the final sequence labels being obtained according to the maximum likelihood function;

S75: in the output layer, converting the output labels y = {y_1, y_2, y_3, ..., y_n} adjusted by the CRF layer into BIO labels and outputting the event trigger words and event elements.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202111045480.XA | 2021-09-07 | 2021-09-07 | Method for extracting process knowledge events in communication field

Publications (1)

Publication Number | Publication Date
---|---
CN113779988A (en) | 2021-12-10

Family ID: 78841673 (filed 2021-09-07, status: Pending)
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134757A (en) * | 2019-04-19 | 2019-08-16 | 杭州电子科技大学 | A kind of event argument roles abstracting method based on bull attention mechanism |
CN110232413A (en) * | 2019-05-31 | 2019-09-13 | 华北电力大学(保定) | Insulator image, semantic based on GRU network describes method, system, device |
CN112001185A (en) * | 2020-08-26 | 2020-11-27 | 重庆理工大学 | Emotion classification method combining Chinese syntax and graph convolution neural network |
CN112101028A (en) * | 2020-08-17 | 2020-12-18 | 淮阴工学院 | Multi-feature bidirectional gating field expert entity extraction method and system |
CN112149421A (en) * | 2020-09-23 | 2020-12-29 | 云南师范大学 | Software programming field entity identification method based on BERT embedding |
CN112148888A (en) * | 2020-09-18 | 2020-12-29 | 南京邮电大学 | Knowledge graph construction method based on graph neural network |
CN112765952A (en) * | 2020-12-28 | 2021-05-07 | 大连理工大学 | Conditional probability combined event extraction method under graph convolution attention mechanism |
CN112883714A (en) * | 2021-03-17 | 2021-06-01 | 广西师范大学 | ABSC task syntactic constraint method based on dependency graph convolution and transfer learning |
CN113157916A (en) * | 2021-03-10 | 2021-07-23 | 南京航空航天大学 | Civil aviation emergency extraction method based on deep learning |
CN113312483A (en) * | 2021-06-02 | 2021-08-27 | 郑州大学 | Text classification method based on self-attention mechanism and BiGRU |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114943221A (en) * | 2022-04-11 | 2022-08-26 | 哈尔滨工业大学(深圳) | Construction method of segment pointer interaction model and social sensing disaster monitoring method |
CN115860002A (en) * | 2022-12-27 | 2023-03-28 | 中国人民解放军国防科技大学 | Combat task generation method and system based on event extraction |
CN115860002B (en) * | 2022-12-27 | 2024-04-05 | 中国人民解放军国防科技大学 | Combat task generation method and system based on event extraction |
CN116049345A (en) * | 2023-03-31 | 2023-05-02 | 江西财经大学 | Document-level event joint extraction method and system based on bidirectional event complete graph |
CN116049345B (en) * | 2023-03-31 | 2023-10-10 | 江西财经大学 | Document-level event joint extraction method and system based on bidirectional event complete graph |
CN118277574A (en) * | 2024-06-04 | 2024-07-02 | 中国人民解放军国防科技大学 | Event extraction model and military event type prediction method |
CN118277574B (en) * | 2024-06-04 | 2024-09-03 | 中国人民解放军国防科技大学 | Event extraction model and military event type prediction method |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |