CN113779988A - Method for extracting process knowledge events in communication field - Google Patents

Method for extracting process knowledge events in communication field

Info

Publication number
CN113779988A
CN113779988A (application number CN202111045480.XA)
Authority
CN
China
Prior art keywords
event
communication field
information
sequence
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111045480.XA
Other languages
Chinese (zh)
Inventor
李飞 (Li Fei)
周源 (Zhou Yuan)
万飞 (Wan Fei)
王德玄 (Wang Dexuan)
夏献军 (Xia Xianjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kedaduochuang Cloud Technology Co ltd
Original Assignee
Kedaduochuang Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kedaduochuang Cloud Technology Co ltd filed Critical Kedaduochuang Cloud Technology Co ltd
Priority to CN202111045480.XA priority Critical patent/CN113779988A/en
Publication of CN113779988A publication Critical patent/CN113779988A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for extracting process knowledge events in the communication field, belonging to the technical field of information, comprising the following steps: S1: defining the communication-field event extraction problem and selecting an extraction method; S2: preprocessing the data of communication-field process knowledge; S3: constructing a hierarchical sequence labeling task; S4: obtaining an enhanced semantic representation using a pre-trained model and a graph convolutional neural network; S5: obtaining long-distance semantic dependency information of the semantic representation using a gated neural unit; S6: solving the label bias problem of step S5 using a conditional random field; S7: extracting events with a communication-field process knowledge event extraction model based on model transfer learning and a graph convolutional neural network. The method extracts semantic representations with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated neural unit, and overcomes the label bias problem with a conditional random field.

Description

Method for extracting process knowledge events in communication field
Technical Field
The invention relates to the technical field of information, in particular to a method for extracting process knowledge events in the communication field.
Background
In recent years, with the rapid development of natural language processing technology and the wide application of 5G in the communication field, extracting process-class knowledge in the communication field with natural language processing has attracted increasing attention. Communication-field event extraction aims to extract specified event attributes from unstructured process-class knowledge texts; it is one of the important steps of text structuring and the basis for the wide application of knowledge graphs.
Current communication-field event extraction tasks generally face high annotation costs and scarce labeled samples. Achieving high-quality event extraction with few labeled samples is therefore of great value for the wide application of event extraction technology in the communication field. For rule-based event extraction, the uncertainty of language structure makes it difficult to formulate unified and complete rules; traditional machine learning, mostly based on supervised learning, struggles with diversified event-element expressions and missing event elements (omitted extractions and incomplete text descriptions). A method for extracting process knowledge events in the communication field is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: by extracting and organizing the "events" and "event relations" in the communication operation-and-maintenance process, the method presents the logic of fault occurrence more intuitively, which is an important prerequisite for subsequent fault troubleshooting and for front-line handling of live-network faults.
The invention solves this technical problem through the following technical scheme, comprising the steps of:
S1: defining the communication-field event extraction problem and selecting an extraction method;
S2: preprocessing the data of communication-field process knowledge;
S3: constructing a hierarchical sequence labeling task;
S4: obtaining an enhanced semantic representation using a pre-trained model and a graph convolutional neural network (GCN);
S5: obtaining long-distance semantic dependency information of the semantic representations using a gated neural unit (GRU);
S6: overcoming the label bias problem present in step S5 using conditional random fields (CRF);
S7: extracting events with the communication-field process knowledge event extraction model based on model transfer learning and the graph convolutional neural network.
Further, the problem definition in step S1 specifies which event elements are to be extracted from the communication-field event text corpus; after requirement analysis, the problem definition of event extraction is: first, identify whether related communication-field events exist in the text corpus; second, identify the related elements of those events; finally, determine the role each element plays. The pipeline method is selected as the event extraction method.
Furthermore, the data preprocessing in step S2 refers to operations such as data cleaning, data deduplication, and text normalization, which solve the problems of non-standard data, missing features, and labeling errors in the original manually labeled data.
Further, the hierarchical sequence labeling in step S3 refers to programmatically dividing the data, based on the event types and event elements in the data Schema, into a structured hierarchy of 8 event categories with 30 hierarchical labels, and performing sequence labeling with a BIO labeling strategy.
Furthermore, the pre-trained model in step S4 is obtained by self-supervised learning on a massive corpus; it provides a basis for model transfer learning in other tasks, and can serve as a feature extractor after being fine-tuned or frozen for the task. The graph convolutional neural network applies message-passing and message-receiving mechanisms on a graph and mines deep relationships between graph nodes through convolution operations on the graph, thereby obtaining enhanced node semantic representations.
Furthermore, the gated neural unit (GRU) in step S5 is a simplified LSTM model with a reset gate and an update gate; the GRU has fewer parameters and higher efficiency. The long-distance semantic dependency is captured by the reset and update gates: the reset gate determines how new input information is combined with the previous memory, and the update gate defines how much of the previous memory is carried over to the current time step.
Further, the conditional random field (CRF) in step S6 predicts the optimal label sequence $y^* = \{y_1^*, y_2^*, \ldots, y_n^*\}$ according to K feature functions, the corresponding K weights, and an observation sequence $x = \{x_1, x_2, x_3, \ldots, x_n\}$.
Further, the data flow in step S7 is that the text corpus passes through a text input layer, a pre-training model layer, a GCN layer, a GRU layer, a CRF layer, and an output layer to obtain a prediction result of the process knowledge event extraction in the communication field.
Compared with the prior art, the invention has the following advantages: the method extracts semantic representations with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated neural unit (GRU), and overcomes the label bias problem with a conditional random field (CRF).
Drawings
FIG. 1 is a schematic diagram illustrating a masking prediction performed on a communication field process knowledge corpus by a pre-training model according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating multi-level semantic updating by the graph convolutional neural network (GCN) according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a gated neural unit (GRU) data update process at time t according to an embodiment of the present invention;
fig. 4 is an execution flow diagram of a communication domain process knowledge event extraction model based on model transfer learning and graph convolution neural network in the embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in fig. 1 to 4, the present embodiment provides a technical solution: a communication field process knowledge event extraction method based on model transfer learning and graph convolution neural network comprises the following steps:
s1: problem definition for event extraction
The communication-field event extraction problem can be described as follows: first, identify whether related communication-field events exist in the text corpus; second, identify the related elements of those events; finally, determine the role each element plays. As the example sentence below shows, feeding it into the event extraction model requires extracting E1, A1, A2, A3, and A4, where E1 is called a trigger and A1, A2, A3, and A4 are called event elements.
Example sentence: in cell XX after 8 o'clock, the Apple terminal (A1) fails (A3) to access (E1) the 5G network (A2).
The trigger in the example sentence is "access", which indicates an event of the software/hardware exception type (SoftHardwareFault); the extracted elements A1, A2, and A3 play the roles of fault location, fault-related object, and fault state in the software/hardware exception event, respectively.
At present, there are two machine-learning approaches to event extraction: the pipeline approach and the joint-learning approach. The pipeline approach performs trigger-word recognition and event-type determination in a first stage and event-element recognition in a second stage; that is, E1 in the example sentence is extracted first and the event type is determined, and then A1, A2, A3, and A4 are extracted according to the E1 event schema. The joint-learning approach extracts trigger words and event elements simultaneously, i.e., E1, A1, A2, A3, and A4 are extracted at once. The pipeline approach suffers from error propagation: if the event type is misjudged in the first stage, the event-element extraction in the second stage will also be wrong, so joint learning usually outperforms the pipeline approach. In the communication field, however, event types and event elements are both extremely complex, and event trigger words often overlap with event elements: for example, when the trigger word is "access" and an event element is "access terminal", the model easily labels both occurrences of "access" as event trigger words, causing the extraction task to fail. Therefore, the invention adopts the pipeline approach, modeling event trigger words and event elements separately and extracting the trigger words and event elements contained in an event in sequence. Experiments show that, in communication-field event extraction tasks with complex context, the pipeline approach clearly outperforms joint learning.
S2: data preprocessing for communication domain process class knowledge
There is much process-class knowledge in the communication field; common process-class event types include index degradation, software/hardware exception, data collection, verification, configuration fault, external event, machine adjustment, and machine operation. The training dataset identifies the type of each event and, in detail, the trigger words and event elements contained in each type; "pair_ID" is an event-pair ID. Statistics of the training and validation datasets are shown in Table 1, and examples are shown in Tables 2 and 3.
Table 1 data set statistics
[Table 1 appears only as an image in the original publication.]
Table 2 training data set example
[Table 2 appears only as an image in the original publication.]
Table 3 verification data set example
id Text
15001 But not connected to an antenna feeder
15002 Even more unfortunately the handset is mobile!
15003 But because the site is not configured with the adjacent region of the GERAN system
15004 A phenomenon of RRC establishment failure occurs
15005 Many sites cannot be adjusted
Because process knowledge in the communication field is generated in real time during equipment operation, large amounts of non-standard data, missing features, and even labeling errors remain after manual cleaning and annotation. Data preprocessing is therefore required before the data is fed into the model.
S21: and (6) data cleaning. The marked communication field knowledge extraction corpus text has part of obvious marking errors, and the part of data needs to be directly discarded in the data cleaning process.
S22: and (5) data deduplication. Sometimes, the device records the state of the same device within a certain time, so that a lot of repeated data is generated. The large amount of repeated data affects the sample distribution, so the repeated data is subjected to the deduplication operation in the preprocessing loop.
S23: and (5) text normalization. The problem of non-uniformity of all half angles of texts and symbols existing in the sample is uniformly processed.
S3: constructing hierarchical sequence annotation tasks
The sequence labeling problem is the most common problem in NLP, and most NLP problems can be converted into sequence labeling problems. "Sequence labeling" means that, for a one-dimensional linear input sequence $x = \{x_1, x_2, x_3, \ldots, x_n\}$, each element of the linear sequence is tagged with some label from the label set $y = \{y_1, y_2, y_3, \ldots, y_n\}$. The sequence labeling task is therefore essentially the problem of classifying each element of a linear sequence according to its context.
The invention treats event extraction as a sequence labeling task. The labeling strategy adopts the BIO scheme: B marks the beginning of an event element, I marks a middle or ending token of an event element, and O marks an irrelevant token.
Based on the event types and event elements in the data Schema, the data is programmatically divided into 8 categories with 30 hierarchical labels: the trigger words of the 8 categories are labeled A through H, the event elements under each category are labeled An through Hn, a starting position is labeled B, and middle and ending positions are labeled I. The labeling specification is shown in Table 4.
Table 4: sequence annotation tag definition rules
Label (R) Definition of
B-A1 Starting position of SoftHardwarreFault
I-A1 Middle position or end position of SoftHardwarreFault
B-A2 Subject start position
I-A2 Subject middle or end position
B-A3 Object/Object start position
I-A3 Object/Object intermediate position or end position
B-A4 State Start position
I-A4 State intermediate or end position
B-A5 Owner start position
I-A5 Middle of OwnerPosition or end position
B-B1 Starting position of CollectData
I-B1 CollectData intermediate or end bit
B-B2 Object/Object start position
I-B2 Object/Object middle position or end bit
B-B3 Source starting position
I-B3 Source middle or end bit
... ...
The rules for labeling are as follows:
trigger_dic={'SoftHardwareFault':'A1','CollectData':'B1','Check':'C1','SettingFault':'D1','ExternalFault':'E1','SetMachine':'F1','Operate':'G1','IndexFault':'H1'}
a_dic={'Subject':'A2','Object':'A3','object':'A3','State':'A4','Owner':'A5'}
b_dic={'Object':'B2','object':'B2','Source':'B3'}
c_dic={'Object':'C2','object':'C2','Standard':'C3'}
d_dic={'Setting':'D2','Owner':'D3','Reference':'D4','State':'D5'}
e_dic={'State':'E2'}
f_dic={'Object':'F2','object':'F2','Network':'F3','InitialState':'F4','FinalState':'F5'}
g_dic={'Object':'G2','object':'G2','Owner':'G3'}
h_dic={'Index':'H2','Owner':'H3','State':'H4'}
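To illustrate how these dictionaries drive the BIO labeling, the following Python sketch converts one annotated span into a tag sequence; the span format and the helper name bio_tags are assumptions for illustration, not the patent's code:

trigger_dic = {'SoftHardwareFault': 'A1'}  # subset of the dictionary above

def bio_tags(text, spans):
    # spans: list of (start, end, code) character spans, e.g. (2, 4, 'A1')
    tags = ['O'] * len(text)
    for start, end, code in spans:
        tags[start] = 'B-' + code          # B marks the element beginning
        for i in range(start + 1, end):
            tags[i] = 'I-' + code          # I marks middle/end characters
    return tags

# Hypothetical example: characters 2-3 form a SoftHardwareFault trigger
print(bio_tags("XX接入失败", [(2, 4, trigger_dic['SoftHardwareFault'])]))
# -> ['O', 'O', 'B-A1', 'I-A1', 'O', 'O']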
s4: obtaining enhanced semantic representations using a pre-trained model and a graph-convolution neural network (GCN)
S41: the invention uses a pre-training model to perform token processing on the corpus. Firstly, a word segmentation method is used for segmenting a text into word units such as a word or a phrase. Because the original corpus needs to be input into the pre-trained model, word segmentation is required. For a given sentence x ═ x1,x2,x3,...,xnIn which xiThe ith character representing the input sentence, n is the number of characters contained in the sentence, when the ith character is input into the layer, a word segmentation device provided with a pre-training model is used, and when the word segmentation device processes Chinese, the word segmentation is carried out by using the characters. After word segmentation, 0 is supplemented to the uniform length after the sequence, and a word segmentation result omega is obtainedi∈Rm(i=1,2,...,m),ωiIs the ith mark in the sentence, and m is the length of the sequence after the sentence is participated.
The invention obtains the semantic representation of the corpus text with a pre-trained model. The pre-trained model provides a basis for model transfer learning in other tasks, and can serve as a feature extractor after being fine-tuned or frozen for the task. The method uses character position encodings as Transformer input, randomly masks a portion of the words in the corpus, and then predicts the masked words from their context, so that the meaning of the masked words can be better understood from the corpus context. The masking prediction performed on the communication-field process knowledge corpus with the pre-trained model is shown in FIG. 1.
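A hedged illustration of this masked-word prediction, using the transformers fill-mask pipeline; the checkpoint is again an assumption, since the patent's model is trained on a proprietary communication-field corpus (FIG. 1):

from transformers import pipeline

fill = pipeline('fill-mask', model='bert-base-chinese')  # assumed checkpoint

# Predict the masked character from its context, as in FIG. 1
for cand in fill('终端接入5G网络失[MASK]。', top_k=3):
    print(cand['token_str'], round(cand['score'], 3))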
The method uses the graph convolution neural network to enhance the node semantic representation of the pre-training model.
S42: Event trigger words and event elements in the corpus text are taken as nodes, with adjacent-edge relations between the nodes of each corpus, and a dynamic network topology graph is constructed. Through a message-passing mechanism, the feature information of each node is transformed and propagated to its neighbor nodes, realizing the extraction and transformation of node feature information; a message-receiving mechanism then aggregates the information transmitted by the neighbor nodes around each target node:

$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

where $A$ denotes the adjacency matrix of the target nodes, $D$ denotes the degree matrix of the target nodes, $H^{(l)}$ is the node semantic representation at layer $l$, $H^{(l+1)}$ is the node semantic representation at layer $l+1$, $W^{(l)}$ is the feature weight matrix of the layer-$l$ target nodes, and $\sigma$ is the Sigmoid activation function.

Deep relationships between the nodes in the graph can be mined through multi-layer convolution operations on the graph; the layer-$(l+1)$ convolution operation on the graph is shown in FIG. 2.
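A minimal PyTorch sketch of one such graph-convolution layer, implementing the propagation rule above; this is an illustrative re-implementation under the stated symbols, not the patent's code:

import torch

def gcn_layer(A, H, W):
    # One GCN layer: H' = sigmoid(D^-1/2 A D^-1/2 H W)
    # A: (n, n) adjacency, H: (n, d_in) node features, W: (d_in, d_out)
    deg = A.sum(dim=1)
    d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt      # symmetric normalization
    return torch.sigmoid(A_hat @ H @ W)

A = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])  # 3-node toy graph
H = torch.randn(3, 8)
W = torch.randn(8, 4)
print(gcn_layer(A, H, W).shape)  # torch.Size([3, 4])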
S5: obtaining long-range semantically-dependent information of semantic representations using gated neural units (GRUs)
The gated neural unit (GRU) is a simplified LSTM model; the GRU layer is added to the model to obtain long-distance semantic dependency information of the input vectors. Compared with an ordinary recurrent neural network, the GRU alleviates the vanishing- and exploding-gradient problems; compared with the LSTM, which has three gates (input, forget, and output), the GRU has only two (reset and update). The GRU does not separately control and retain an internal memory and has no LSTM-style output gate, so a GRU of the same structure has fewer parameters and higher efficiency, with comparable effect on many tasks. Because the event subjects appearing in communication-field events are numerous and complex, the feature information extracted by the pre-trained model can be further refined by the GRU, capturing relations between distant event elements in the communication field.
As shown in FIG. 3, the GRU uses an update gate and a reset gate. The reset gate determines how new input information is combined with the previous memory; the update gate defines how much of the previous memory is carried over to the current time step.
S51: at time step t, the information of the last time step is aggregated with the information of the current time step using an update gate:
$$z_t = \sigma(W^{(z)} x_t + U^{(z)} h_{t-1})$$

where $h_{t-1}$ denotes the latent semantic output of time step $t-1$, $x_t$ denotes the original semantic input of time step $t$, $W^{(z)}$ and $U^{(z)}$ are weight matrices, and $\sigma$ is the Sigmoid activation function.
S52: at time step t, the information of the last time step is aggregated with the information of the current time step using a reset gate:
$$r_t = \sigma(W^{(r)} x_t + U^{(r)} h_{t-1})$$

Compared with the update gate, different weight matrices are used in the semantic aggregation.
S53: at time step t, the magnitude of the reset gating value is measured, and the previous information is determined to be reserved or forgotten:
$$h'_t = \tanh(W x_t + r_t \odot U h_{t-1})$$

where $\tanh$ is the activation function, $r_t$ is the output of the reset gate, and $\odot$ denotes the Hadamard product.
S54: at the end of time step t, the size of the update gating value is measured, and the information transmitted to the next unit is determined to be the hidden layer information or the information of the update gate of the previous time step:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$$
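The four equations of S51 to S54 can be checked with a small self-contained NumPy sketch; this is an illustrative re-derivation, not the patent's implementation:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)            # S51: update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)            # S52: reset gate
    h_cand = np.tanh(W @ x_t + r_t * (U @ h_prev))   # S53: candidate state
    return z_t * h_prev + (1 - z_t) * h_cand         # S54: new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
Wz, Wr, W = (rng.standard_normal((d_h, d_in)) for _ in range(3))
Uz, Ur, U = (rng.standard_normal((d_h, d_h)) for _ in range(3))
h = gru_step(rng.standard_normal(d_in), np.zeros(d_h), Wz, Uz, Wr, Ur, W, U)
print(h.shape)  # (3,)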
s6: overcoming the problem of label bias present in step S5 using Conditional Random Fields (CRF)
Conditional random fields (CRFs) can model the dependency between labels, overcoming the label bias problem. After the feature vector $H$ carrying the contextual semantic dependencies is passed through a linear layer, an $m \times n$ matrix $P$ is obtained, where $P_{i,j}$ is the score of the $j$-th label at the $i$-th position, $m$ is the number of labels, and $n$ is the maximum sentence length set by the model.
Inputting: k feature functions of the model, corresponding K weights, and an observation sequence x ═ x1,x2,x3,...,xn}
And (3) outputting: optimal marker sequence
Figure BDA0003251035590000081
S61: and (3) performing modeling initialization on the CRF, and solving the probability of each mark combination at the initial position:
$$\delta_1(l) = \sum_{k=1}^{K} w_k f_k(y_0 = \mathrm{start}, y_1 = l, x), \quad l = 1, 2, \ldots, m$$

$$\Psi_1(l) = \mathrm{start}$$

where $i$ denotes the position in the label sequence, $\delta_1(l)$ denotes the probability of each label combination at the initial position, $w_k$ is the CRF model parameter of the $k$-th label combination, $f_k$ is the feature function of the $k$-th label combination, and $\Psi_1(l)$ denotes the label value at which $\delta_1(l)$ reaches its maximum at the initial position;
s62: recursion is performed on i 1, 2.. n, and the maximum non-normalized probability of each marker l 1, 2.. n to position i is obtained:
$$\delta_{i+1}(l) = \max_{1 \le j \le m} \left\{ \delta_i(j) + \sum_{k=1}^{K} w_k f_k(y_i = j, y_{i+1} = l, x) \right\}$$

where $\delta_{i+1}(l)$ denotes the maximum unnormalized probability corresponding to each possible value of label $l$, and $\delta_i(j)$ denotes the probability of label combination $j$ at position $i$;
s63: recording the path of the maximum value of the unnormalized probability:
$$\Psi_{i+1}(l) = \arg\max_{1 \le j \le m} \left\{ \delta_i(j) + \sum_{k=1}^{K} w_k f_k(y_i = j, y_{i+1} = l, x) \right\}$$

where $\Psi_{i+1}(l)$ denotes the label value at position $i$ for which $\delta_{i+1}(l)$ reaches its maximum;
s64: and when i finishes traversing all n corpus samples, stopping the recursion process, wherein the maximum value of the non-normalized probability is as follows:
$$\max_{1 \le j \le m} \delta_n(j)$$

At the same time, the end point of the optimal path is obtained:

$$y_n^* = \arg\max_{1 \le j \le m} \delta_n(j)$$
s65: and backtracking the end point of the optimal path to obtain the whole optimal path:
$$y_i^* = \Psi_{i+1}(y_{i+1}^*), \quad i = n-1, n-2, \ldots, 1$$

where $y_i^*$ denotes the optimal label at the $i$-th position.

Connecting the nodes on the optimal path gives the label sequence of the optimal path:

$$y^* = \{y_1^*, y_2^*, \ldots, y_n^*\}$$
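Steps S61 to S65 amount to Viterbi decoding; the NumPy sketch below uses an emission matrix P and a transition matrix T in place of the weighted feature-function sums, a common equivalent parameterization assumed here for brevity:

import numpy as np

def viterbi_decode(P, T):
    # P: (n, m) per-position label scores; T[j, l]: transition score j -> l
    n, m = P.shape
    delta = P[0].copy()                      # S61: initialization
    psi = np.zeros((n, m), dtype=int)
    for i in range(1, n):                    # S62/S63: recursion, record paths
        scores = delta[:, None] + T + P[i][None, :]
        psi[i] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    best = [int(delta.argmax())]             # S64: end point of optimal path
    for i in range(n - 1, 0, -1):            # S65: backtracking
        best.append(int(psi[i][best[-1]]))
    return best[::-1]                        # optimal label sequence y*

rng = np.random.default_rng(0)
print(viterbi_decode(rng.standard_normal((5, 4)), rng.standard_normal((4, 4))))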
s7: communication field process knowledge event extraction model data flow based on model transfer learning and graph convolution neural network
As shown in FIG. 4, the communication-field process knowledge event extraction model based on model transfer learning and the graph convolutional neural network mainly comprises a text input layer, a BERT pre-trained model layer, a GCN layer, a GRU layer, a CRF layer, and an output layer.
S71: in a text input layer, performing data preprocessing on an original corpus, and performing token-based processing on the text corpus according to Chinese characters by using a BERT pre-training model to obtain a token-based word segmentation result:
$$x = \{x_1, x_2, x_3, \ldots, x_n\}$$

where $x_i$ denotes the $i$-th character of the input sentence and $n$ is the number of characters contained in the sentence; if a sentence is shorter than $n$, it is automatically padded with 0 at the end of the sequence to the same length.
S72: obtaining semantic representation of an original text corpus through a BERT pre-training model layer;
s73: semantic characterization of a pre-trained model
$X^{\mathrm{BERT}} = \{x_1^{\mathrm{BERT}}, x_2^{\mathrm{BERT}}, \ldots, x_n^{\mathrm{BERT}}\}$, is fed into the GRU layer to extract the key information of the communication-field event corpus:

$$H_t = \mathrm{GRU}(x_t^{\mathrm{BERT}}, H_{t-1})$$

where $x_t^{\mathrm{BERT}}$ is the input of the current GRU and $H_t$ is the hidden-layer state vector of the GRU.
S74: the dependence relationship between labels is modeled by using a Conditional Random Field (CRF), so that the label deviation problem is solved. For one input sequence: x ═ x1,x2,x3,...,xnCalculating an input label sequence y ═ y1,y2,y3,...,ynTo the target tag sequence
Figure BDA0003251035590000096
LOSS value score of (a):
Figure BDA0003251035590000097
wherein A is a transition probability matrix, Ai,jIs the conversion score, P, of label i to label ji,jAnd m is the maximum length of a single text corpus.
During training, the maximum likelihood function over $\{x_i, y_i\}$ is optimized:

$$L = \sum_i \log P(y_i \mid x_i) - \frac{\lambda}{2} \lVert \Theta \rVert^2$$

where $\lambda$ and $\Theta$ are regularization parameters and $P(y_i \mid x_i)$ is the probability from the original sequence to the predicted sequence. The final sequence labels are obtained according to the maximum likelihood function.
S75: in the output layer, the output label y adjusted according to the CRF layer is { y ═ y1,y2,y3,...,ynAnd the sequence labeling label definition rule defined in the table 4 converts the output label y into a BIO label, thereby obtaining an event trigger word and an event element of the corpus text in the inference process.
To sum up, the method for extracting process-class knowledge events in the communication field of this embodiment extracts semantic representations with a fusion model based on model transfer learning and a graph convolutional neural network, acquires long-distance dependency information of the semantic representation with a gated neural unit (GRU), and overcomes the label bias problem with a conditional random field (CRF).
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for extracting process knowledge events in the communication field is characterized by comprising the following steps:
S1: defining the communication-field event extraction problem and selecting an extraction method;
S2: preprocessing the data of communication-field process knowledge;
S3: constructing a hierarchical sequence labeling task;
S4: obtaining an enhanced semantic representation using a pre-trained model and a graph convolutional neural network;
S5: obtaining long-distance semantic dependency information of the semantic representation using a gated neural unit;
S6: solving the label bias problem present in step S5 using a conditional random field;
S7: extracting events with the communication-field process knowledge event extraction model based on model transfer learning and the graph convolutional neural network, to obtain the prediction result of communication-field process knowledge event extraction.
2. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, the specific process of defining the communication domain event extraction problem is as follows:
S11: identifying whether related communication-field events exist in the text corpus;
S12: identifying the related elements of the related events;
S13: determining the role each element plays.
3. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S1, the pipeline extraction method is selected as the event extraction method; the pipeline method models event trigger words and event elements separately, and sequentially extracts the trigger words and event elements contained in an event.
4. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S2, the data preprocessing process is specifically as follows:
s21: data cleansing
Extracting part of obviously labeled error data existing in the corpus text for the labeled communication field knowledge, and directly abandoning the part of data;
s22: data deduplication
Executing deduplication operation on duplicate data generated by recording the same equipment state within a certain time;
s23: text normalization
And processing the text and the symbol which are not uniform in all half angles in the sample data into a uniform format.
5. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S3, hierarchical sequence labeling refers to programmatically dividing the data, based on the event types and event elements in the data Schema, into a structured hierarchy of 8 event categories with 30 hierarchical labels, and performing sequence labeling with a BIO labeling strategy.
6. The method for extracting process-class knowledge events in the communication field according to claim 5, wherein: b in the BIO labeling strategy represents the beginning of an event element, I represents a middle or ending word of the event element, and O represents an irrelevant word.
7. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S4 is as follows:
s41: obtaining semantic representation of a corpus text by using a pre-training model;
S42: taking the event trigger words and event elements in the corpus text as nodes, with adjacent-edge relations between the nodes of each corpus, and constructing a dynamic network topology graph; transforming the feature information of each node and propagating it to the neighbor nodes through a message-passing mechanism, realizing the extraction and transformation of node feature information, and then aggregating the information transmitted by the neighbor nodes around each target node through a message-receiving mechanism:
$$H^{(l+1)} = \sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

where $A$ denotes the adjacency matrix of the target nodes, $D$ denotes the degree matrix of the target nodes, $H^{(l)}$ is the node semantic representation at layer $l$, $H^{(l+1)}$ is the node semantic representation at layer $l+1$, $W^{(l)}$ is the feature weight matrix of the layer-$l$ target nodes, and $\sigma$ is the Sigmoid activation function.
8. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: the specific process of step S5 is as follows:
S51: at time step t, aggregating the information of the previous time step with the information of the current time step using an update gate:
$$z_t = \sigma(W^{(z)} x_t + U^{(z)} h_{t-1})$$

where $h_{t-1}$ denotes the latent semantic output of time step $t-1$, $x_t$ denotes the original semantic input of time step $t$, $W^{(z)}$ and $U^{(z)}$ are weight matrices, and $\sigma$ is the Sigmoid activation function;
S52: at time step t, aggregating the information of the previous time step with the information of the current time step using a reset gate:
$$r_t = \sigma(W^{(r)} x_t + U^{(r)} h_{t-1});$$
S53: at time step t, the value of the reset gate determines how much previous information is retained or forgotten when forming the candidate state:
$$h'_t = \tanh(W x_t + r_t \odot U h_{t-1})$$

where $\tanh$ is the activation function, $r_t$ is the output of the reset gate, and $\odot$ denotes the Hadamard product;
S54: at the end of time step t, the value of the update gate determines whether the information passed to the next unit is the hidden-state information of the previous time step or the candidate state:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot h'_t$$
9. the method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S6, the conditional random field is generated according to K feature functions, corresponding K weights, and an observation sequence x ═ x1,x2,x3,...,xn}, predicting the optimal marker sequence
Figure FDA0003251035580000022
10. The method for extracting process-class knowledge events in the communication field according to claim 1, wherein: in step S7, the communication field process knowledge event extraction model based on model transfer learning and graph convolution neural network includes a text input layer, a BERT pre-training model layer, a GRU layer, a CRF layer, and an output layer, and the specific working process of the model is as follows:
s71: in a text input layer, performing data preprocessing on an original corpus, and performing token-based processing on the text corpus according to Chinese characters by using a BERT pre-training model to obtain a token-based word segmentation result:
$$x = \{x_1, x_2, x_3, \ldots, x_n\}$$

where $x_i$ denotes the $i$-th character of the input sentence and $n$ is the number of characters contained in the sentence; if the sentence is shorter than $n$, it is automatically padded with 0 at the end of the sequence to the same length;
s72: obtaining semantic representation of an original text corpus through a BERT pre-training model layer;
s73: will be pre-trainedSemantic representation of exercise models
$X^{\mathrm{BERT}} = \{x_1^{\mathrm{BERT}}, x_2^{\mathrm{BERT}}, \ldots, x_n^{\mathrm{BERT}}\}$, into the GRU layer to extract the key information of the communication-field event corpus:

$$H_t = \mathrm{GRU}(x_t^{\mathrm{BERT}}, H_{t-1})$$

where $x_t^{\mathrm{BERT}}$ is the input of the current GRU and $H_t$ is the hidden-layer state vector of the GRU;
s74: the conditional random field is used to model the dependency between tags, overcoming the tag bias problem, for an input sequence: x ═ x1,x2,x3,...,xnCalculating an input label sequence y ═ y1,y2,y3,...,ynTo the target tag sequence y ═
Figure FDA0003251035580000034
LOSS value score of (a):
Figure FDA0003251035580000035
wherein A is a transition probability matrix, Ai,jIs the conversion score, P, of label i to label ji,jRepresenting the score of the jth label in the ith mark, wherein m is the maximum length of a single text corpus;
during the training process, optimize { xi,yiThe maximum likelihood function of (c):
Figure FDA0003251035580000036
where λ and Θ are canonical parameters, P (y)i|xi) Obtaining a final sequence label according to a maximum likelihood function for the probability from the original sequence to the predicted sequence;
s75: in the output layer, the output label y adjusted according to the CRF layer is { y ═ y1,y2,y3,...,ynAnd converting the labels of the BIO, and outputting event trigger words and event elements.
CN202111045480.XA 2021-09-07 2021-09-07 Method for extracting process knowledge events in communication field Pending CN113779988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111045480.XA CN113779988A (en) 2021-09-07 2021-09-07 Method for extracting process knowledge events in communication field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111045480.XA CN113779988A (en) 2021-09-07 2021-09-07 Method for extracting process knowledge events in communication field

Publications (1)

Publication Number Publication Date
CN113779988A true CN113779988A (en) 2021-12-10

Family

ID=78841673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111045480.XA Pending CN113779988A (en) 2021-09-07 2021-09-07 Method for extracting process knowledge events in communication field

Country Status (1)

Country Link
CN (1) CN113779988A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134757A (en) * 2019-04-19 2019-08-16 杭州电子科技大学 A kind of event argument roles abstracting method based on bull attention mechanism
CN110232413A (en) * 2019-05-31 2019-09-13 华北电力大学(保定) Insulator image, semantic based on GRU network describes method, system, device
CN112101028A (en) * 2020-08-17 2020-12-18 淮阴工学院 Multi-feature bidirectional gating field expert entity extraction method and system
CN112001185A (en) * 2020-08-26 2020-11-27 重庆理工大学 Emotion classification method combining Chinese syntax and graph convolution neural network
CN112148888A (en) * 2020-09-18 2020-12-29 南京邮电大学 Knowledge graph construction method based on graph neural network
CN112149421A (en) * 2020-09-23 2020-12-29 云南师范大学 Software programming field entity identification method based on BERT embedding
CN112765952A (en) * 2020-12-28 2021-05-07 大连理工大学 Conditional probability combined event extraction method under graph convolution attention mechanism
CN113157916A (en) * 2021-03-10 2021-07-23 南京航空航天大学 Civil aviation emergency extraction method based on deep learning
CN112883714A (en) * 2021-03-17 2021-06-01 广西师范大学 ABSC task syntactic constraint method based on dependency graph convolution and transfer learning
CN113312483A (en) * 2021-06-02 2021-08-27 郑州大学 Text classification method based on self-attention mechanism and BiGRU

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943221A (en) * 2022-04-11 2022-08-26 哈尔滨工业大学(深圳) Construction method of segment pointer interaction model and social sensing disaster monitoring method
CN115860002A (en) * 2022-12-27 2023-03-28 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN115860002B (en) * 2022-12-27 2024-04-05 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN116049345A (en) * 2023-03-31 2023-05-02 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph
CN116049345B (en) * 2023-03-31 2023-10-10 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph
CN118277574A (en) * 2024-06-04 2024-07-02 中国人民解放军国防科技大学 Event extraction model and military event type prediction method
CN118277574B (en) * 2024-06-04 2024-09-03 中国人民解放军国防科技大学 Event extraction model and military event type prediction method

Similar Documents

Publication Publication Date Title
CN108920622B (en) Training method, training device and recognition device for intention recognition
CN106980683B (en) Blog text abstract generating method based on deep learning
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN109934261B (en) Knowledge-driven parameter propagation model and few-sample learning method thereof
CN113779988A (en) Method for extracting process knowledge events in communication field
CN110968660B (en) Information extraction method and system based on joint training model
CN113672708B (en) Language model training method, question-answer pair generation method, device and equipment
CN110555084B (en) Remote supervision relation classification method based on PCNN and multi-layer attention
CN117009490A (en) Training method and device for generating large language model based on knowledge base feedback
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN112560432A (en) Text emotion analysis method based on graph attention network
CN110275928B (en) Iterative entity relation extraction method
CN113254675B (en) Knowledge graph construction method based on self-adaptive few-sample relation extraction
CN113239143B (en) Power transmission and transformation equipment fault processing method and system fusing power grid fault case base
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN111145914A (en) Method and device for determining lung cancer clinical disease library text entity
CN116402352A (en) Enterprise risk prediction method and device, electronic equipment and medium
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN117436451A (en) Agricultural pest and disease damage named entity identification method based on IDCNN-Attention
CN115795060B (en) Entity alignment method based on knowledge enhancement
CN116756605A (en) ERNIE-CN-GRU-based automatic speech step recognition method, system, equipment and medium
CN116227603A (en) Event reasoning task processing method, device and medium
CN115934944A (en) Entity relation extraction method based on Graph-MLP and adjacent contrast loss
CN115687627A (en) Two-step lightweight text classification method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination