CN111597350A - Rail transit event knowledge graph construction method based on deep learning - Google Patents


Info

Publication number
CN111597350A
CN111597350A (application CN202010365826.3A)
Authority
CN
China
Prior art keywords
event
events
template
rail transit
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010365826.3A
Other languages
Chinese (zh)
Other versions
CN111597350B (en)
Inventor
黑新宏
彭伟
朱磊
赵钦
王一川
姬文江
姚燕妮
焦瑞
董林靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology filed Critical Xian University of Technology
Priority to CN202010365826.3A priority Critical patent/CN111597350B/en
Publication of CN111597350A publication Critical patent/CN111597350A/en
Application granted granted Critical
Publication of CN111597350B publication Critical patent/CN111597350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing a knowledge graph of rail transit events based on deep learning. Training data for an event recognition model are constructed by dictionary matching combined with manual labeling; a standard-event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; the events output by the event recognition model are unified with a word2vec model, cosine-similarity clustering and a logistic regression binary classification model; training data for an event relation model are constructed with a snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract the relations between events. The method improves the informatization of rail transit construction design engineering and reduces the workload of graph construction.

Description

Rail transit event knowledge graph construction method based on deep learning
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a method for constructing a knowledge graph of rail transit events based on deep learning.
Background
With the rapid development of internet technology, many industries have been deeply integrated with emerging artificial intelligence techniques and have obtained remarkable results. Urban rail transit, as a hallmark of urban modernization, plays an important role in promoting urban economic development. Rail transit construction is complex engineering, characterized by large scale, long construction periods and huge investment. The early design and planning stage of a rail transit construction project is the foundation of the later engineering; only with complete early design planning can the later construction be guaranteed. However, in the design and planning stage, the referenced design standards are of many kinds, the amount of information in each standard entry is huge, and the degree of informatization of the whole rail transit construction engineering is low, which makes it difficult to query the content of a given standard. The design stage also places extremely high demands on the professional ability of designers, making design work very challenging. Therefore, a knowledge graph is needed to represent rail transit design specification knowledge and to promote the informatization of rail transit construction engineering.
At present, most knowledge graphs are entity knowledge graphs with entities at their core, but entity information is separated from its specific context, so its semantic information is one-sided. Compared with an entity, an event can express semantic information more clearly, and most specification entries of rail transit design standards are expressed as events. The design specifications are therefore expressed here in the form of an event knowledge graph. Most traditional knowledge graph construction methods have a low degree of automation and are time-consuming and labor-intensive; the method for constructing a knowledge graph of rail transit events based on deep learning proposed here improves the degree of automation and reduces the workload.
Disclosure of Invention
The invention aims to provide a method for constructing a knowledge graph of rail transit events based on deep learning. Expressing the specifications through an event knowledge graph makes the expressed content richer and semantically more accurate, and deep learning addresses the low automation, time consumption and labor cost of traditional graph construction techniques.
The technical scheme adopted by the invention is as follows: training data for a rail transit event recognition model are constructed by event-trigger dictionary matching combined with manual labeling; a standard-event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; the events output by the event recognition model are unified with a word2vec model, cosine-similarity clustering and a logistic regression binary classification model; training data for an event relation model are constructed with a snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract the relations between events, forming a rail transit event knowledge graph. The event knowledge graph construction process comprises the following steps:
Step 1, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling.
Step 2, extract a training set from the rail transit design standard events and preprocess it; divide the texts in the training set by standard entry and label them with parts of speech.
Step 3, train the rail transit design specification event recognition model on the text processed in step 2 with the BERT-BiLSTM-CRF algorithm.
Step 4, construct event relation training data from the original text with the snowball algorithm.
Step 5, extract a training set from the rail transit design specification event relations generated in step 4 and preprocess it; divide the texts in the training set into event pairs.
Step 6, train the relation recognition model on the text processed in step 5 with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Step 7, preprocess the rail transit design specification and divide it into entries according to the specification.
Step 8, input the rail transit standard text preprocessed in step 7 into the event recognition model generated in step 3 and extract the events in the standard, each consisting of an event trigger word and event elements.
Step 9, unify the events identified in step 8.
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples.
Step 12, take events from the event database generated in step 10, form event pairs, input them into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs formed in step 12 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples.
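The extraction stage above (steps 7 to 13) can be sketched as a simple orchestration. This is an illustrative outline, not the patent's implementation; `build_event_graph` and the three injected model functions are hypothetical names standing in for the trained models:

```python
# Hypothetical orchestration sketch of steps 7-13 (all names are illustrative).
def build_event_graph(spec_text, recognize_events, unify_events, recognize_relation):
    """Run entry splitting (step 7), event extraction (step 8),
    unification (step 9) and relation linking (steps 12-13)."""
    # Step 7: split the specification into entries (one entry per line here).
    entries = [e.strip() for e in spec_text.split("\n") if e.strip()]
    # Step 8: the event recognition model extracts events from each entry.
    events = [ev for entry in entries for ev in recognize_events(entry)]
    # Step 9: merge events that refer to the same thing.
    events = unify_events(events)
    # Steps 12-13: pair events and keep the relations the relation model finds.
    triples = []
    for i, e1 in enumerate(events):
        for e2 in events[i + 1:]:
            rel = recognize_relation(e1, e2)
            if rel is not None:
                triples.append((e1["trigger"], rel, e2["trigger"]))
    return events, triples
```

The model functions would be the trained BERT-BiLSTM-CRF and BERT-BiLSTM-ATTENTION-SOFTMAX models; any callables with the same interface can be substituted for testing.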
In step 1, an event consists of an event trigger word and event elements. Because most event trigger words are fixed words, manual labeling is accelerated by dictionary matching when constructing the model training data; the dictionary can be expanded by means of a synonym forest.
In step 3, the BERT-BiLSTM-CRF algorithm is used to train the event recognition model. The whole model consists of three parts: a BERT layer, a BiLSTM layer and a CRF layer. The BERT pre-training model is used to obtain word vectors containing the contextual feature information of the specifications; the BiLSTM layer performs feature extraction, exploiting the sequence information of the whole text; and the CRF layer learns the constraint conditions of the sentence and filters out invalid prediction sequences.
In step 4, a semi-supervised snowball algorithm is used to construct the training set for the event relation recognition model. The snowball algorithm comprises the following specific steps:
Step 4.1, manually mark a small number of event relations and add each of them to an event relation table.
Step 4.2, use the existing event relation table to match, in the original text, the original sentences containing the events in the table, and generate templates. Each template is a five-tuple of the form <left>, event 1 type, <middle>, event 2 type, <right>, where len is a freely settable length, <left> is the vector representation of the len words to the left of event 1, <middle> is the vector representation of the words between event 1 and event 2, and <right> is the vector representation of the len words to the right of event 2. The event 1 type and the event 2 type are, for example, numerical-definition events.
Step 4.3, cluster the generated templates: templates whose similarity exceeds the threshold 0.7 are clustered into one class, a new template is generated by averaging, and the new template is added to a rule base that stores the templates. From step 4.2 the template format can be written as
P = <l, E1, m, E2, r>
where E1 and E2 denote the event 1 type and the event 2 type of template P, l is the vector representation of the three words to the left of E1, m is the vector representation of the words between E1 and E2, and r is the vector representation of the three words to the right of E2. The similarity between templates is calculated as in the following example. Template 1:
P1 = <l1, E1, m1, E2, r1>
Template 2:
P2 = <l2, E1', m2, E2', r2>
If the condition E1 = E1' && E2 = E2' is satisfied, i.e. the event 1 type E1 of template P1 is identical to the event 1 type E1' of template P2 and the event 2 type E2 of template P1 is identical to the event 2 type E2' of template P2, then the similarity of template P1 and template P2 can be calculated by
Sim(P1, P2) = mu1*(l1 . l2) + mu2*(m1 . m2) + mu3*(r1 . r2)
where mu1, mu2, mu3 are weights; because the middle vectors m1 and m2 have the greatest influence on the similarity result, mu2 > mu1 = mu3 can be set. If the condition E1 = E1' && E2 = E2' is not satisfied, the similarity of template P1 and template P2 is recorded as 0.
Step 4.4, first scan the original text with the event recognition model trained in step 3 to identify the event types contained in the text; then match the original text with the templates in the rule base generated in step 4.3, and convert the matched text into the five-tuple template form.
Step 4.5, calculate the similarity between each new template generated in step 4.4 and the templates in the rule base; discard templates whose similarity is below the threshold 0.7, and add the events in templates whose similarity exceeds 0.7 to the event relation table.
Step 4.6, repeat steps 4.2-4.5 until the original text has been fully processed.
In step 6, the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm is used to train the relation recognition model. The whole model consists of four parts: a BERT layer, a BiLSTM layer, an ATTENTION layer and a SOFTMAX layer. The BERT pre-training model is used to obtain word vectors containing the contextual feature information of the specifications; the BiLSTM layer performs feature extraction, exploiting the sequence information of the whole text; the ATTENTION layer calculates attention probabilities to highlight the importance of key words in the text; and the SOFTMAX layer generates the probability of each relation class, the class with the maximum probability being taken as the model prediction.
In step 9, there are texts in the standard that refer to the same event. To avoid a large amount of redundant information in the event database, an event unification algorithm is adopted, with the following specific steps:
Step 9.1, train a word2vec model on the original rail transit text.
Step 9.2, input the rail transit events into the word2vec model generated in step 9.1 to generate event vectors.
Step 9.3, calculate the similarity between events with the cosine function, and cluster events whose similarity value exceeds 0.8 into one class. The cosine function is as follows:
cos(A, B) = (A . B) / (|A| |B|) = (sum_i A_i B_i) / (sqrt(sum_i A_i^2) * sqrt(sum_i B_i^2))
Step 9.4, pair the new events generated in step 9.3 with all events in random combinations and calculate the similarity of each event pair.
Step 9.5, input each event pair and its similarity into the trained logistic regression binary classification model, and judge whether the events are similar. The logistic regression model is as follows:
h(x) = 1 / (1 + e^(-(w^T x + b)))
Step 9.6, according to the classification result of step 9.5: if the events are similar, discard one of them; if not, store both.
The beneficial effects of the invention are as follows:
Aiming at the complicated engineering information of the rail transit construction design stage, the defects of traditional knowledge graphs and the large workload of knowledge graph construction, the invention provides a method for constructing a knowledge graph of rail transit events based on deep learning. Training data for the rail transit event recognition model are constructed by event-trigger dictionary matching and manual labeling; a standard-event recognition model is trained with the BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; the events output by the event recognition model are unified with a word2vec model, cosine-similarity clustering and a logistic regression binary classification model; training data for the event relation model are constructed with the snowball algorithm; and a relation recognition model is trained with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract the relations between events, forming a rail transit event knowledge graph. The informatization of rail transit construction design engineering is improved and the workload of graph construction is reduced.
Drawings
FIG. 1 is a general flowchart of the method for constructing a knowledge graph of rail transit events based on deep learning according to the invention;
FIG. 2 shows the process of constructing the event training data set by dictionary matching and manual labeling;
FIG. 3 shows the process of building the specification event recognition model based on the BERT-BiLSTM-CRF algorithm;
FIG. 4 shows the process of unifying the events output by the event recognition model with the word2vec model, cosine-similarity clustering and the logistic regression binary classification model;
FIG. 5 shows the process of constructing the training data of the event relation model with the snowball algorithm;
FIG. 6 shows the process of building the relation recognition model based on the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, the method for constructing a knowledge graph of rail transit events based on deep learning specifically comprises the following steps:
Step 1, as shown in fig. 2, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling. The pseudo code of the training set labeling algorithm is as follows:
(Pseudo code reproduced only as an image in the original publication; not shown here.)
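As an illustrative stand-in for the labeling pseudo code above (the figure is not reproduced), the dictionary-matching pre-labeling of step 1 can be sketched as follows. `TRIGGER_DICT`, the `B-TRG`/`O` tag set and the function name are assumptions, not the patent's notation; in practice annotators then correct and complete the automatic labels:

```python
# Hypothetical trigger dictionary; in the patent it is expanded via a synonym forest.
TRIGGER_DICT = {"set", "install", "provide"}

def auto_label(tokens, trigger_dict=TRIGGER_DICT):
    """Pre-label a tokenized sentence: B-TRG for dictionary hits, O elsewhere.
    Manual labeling then only corrects these labels instead of starting from scratch."""
    return ["B-TRG" if tok in trigger_dict else "O" for tok in tokens]
```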
Step 2, extract a training set from the rail transit design standard events and preprocess it; divide the texts in the training set by standard entry and label them with parts of speech.
Step 3, as shown in fig. 3, train the rail transit design specification event recognition model on the text processed in step 2 with the BERT-BiLSTM-CRF algorithm. The pseudo code for constructing the event recognition model is as follows:
(Pseudo code reproduced only as an image in the original publication; not shown here.)
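The CRF layer's role, filtering out invalid label sequences through learned transition constraints, can be illustrated with a minimal Viterbi decoder. This is a didactic sketch in log-score space, not the patent's BERT-BiLSTM-CRF implementation:

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence given per-token emission scores
    (list of {label: score} dicts) and label-to-label transition scores,
    as the CRF decoding step would produce."""
    n = len(emissions)
    best = [dict() for _ in range(n)]   # best[i][y]: best score ending at i with y
    back = [dict() for _ in range(n)]   # backpointers for path recovery
    for y in labels:
        best[0][y] = emissions[0][y]
    for i in range(1, n):
        for y in labels:
            prev = max(labels, key=lambda p: best[i - 1][p] + transitions[(p, y)])
            best[i][y] = best[i - 1][prev] + transitions[(prev, y)] + emissions[i][y]
            back[i][y] = prev
    last = max(labels, key=lambda y: best[n - 1][y])
    path = [last]
    for i in range(n - 1, 0, -1):
        last = back[i][last]
        path.append(last)
    return path[::-1]
```

With the transition score from O to I set very low, the decoder never predicts an I tag directly after an O tag, which is the kind of sentence constraint the CRF layer learns.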
and 4, as shown in fig. 5, constructing an event relation recognition model training set by adopting a semi-supervised snowball algorithm on the original text. The snowball algorithm comprises the following specific steps:
step 4.1, manually marking a small number of event relations to form an event relation table; each event relationship is to an event relationship table.
Step 4.2, matching the original sentence containing the event in the event relation table in the original text by using the existing event relation table, and generating a template; the format of the template is five-tuple form, which is < left >, event 1 type, < middle >, event 2 type, < right > respectively; len is a length which can be set arbitrarily, < left > is a vector representation of len words on the left side of the event 1, < middle > is a vector representation of words between the event 1 and the event 2, and < right > is a vector representation of len words on the right side of the event; the event 1 type is a numerical definition event, and the event 2 type is a numerical definition event.
Step 4.3, cluster the generated templates: templates whose similarity exceeds the threshold 0.7 are clustered into one class, a new template is generated by averaging, and the new template is added to a rule base that stores the templates. From step 4.2 the template format can be written as
P = <l, E1, m, E2, r>
where E1 and E2 denote the event 1 type and the event 2 type of template P, l is the vector representation of the three words to the left of E1, m is the vector representation of the words between E1 and E2, and r is the vector representation of the three words to the right of E2. The similarity between templates is calculated as in the following example. Template 1:
P1 = <l1, E1, m1, E2, r1>
Template 2:
P2 = <l2, E1', m2, E2', r2>
If the condition E1 = E1' && E2 = E2' is satisfied, i.e. the event 1 type E1 of template P1 is identical to the event 1 type E1' of template P2 and the event 2 type E2 of template P1 is identical to the event 2 type E2' of template P2, then the similarity of template P1 and template P2 can be calculated by
Sim(P1, P2) = mu1*(l1 . l2) + mu2*(m1 . m2) + mu3*(r1 . r2)
where mu1, mu2, mu3 are weights; because the middle vectors m1 and m2 have the greatest influence on the similarity result, mu2 > mu1 = mu3 can be set. If the condition E1 = E1' && E2 = E2' is not satisfied, the similarity of template P1 and template P2 is recorded as 0.
Step 4.4, first scan the original text with the event recognition model trained in step 3 to identify the event types contained in the text; then match the original text with the templates in the rule base generated in step 4.3, and convert the matched text into the five-tuple template form.
Step 4.5, calculate the similarity between each new template generated in step 4.4 and the templates in the rule base; discard templates whose similarity is below the threshold 0.7, and add the events in templates whose similarity exceeds 0.7 to the event relation table.
Step 4.6, repeat steps 4.2-4.5 until the original text has been fully processed.
Step 5, extract a training set from the rail transit design specification event relations generated in step 4 and preprocess it; divide the text into event pairs.
Step 6, train the relation recognition model on the text processed in step 5 with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm. As shown in fig. 6, the pseudo code for constructing the event relation recognition model is as follows:
(Pseudo code reproduced only as an image in the original publication; not shown here.)
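As an illustrative stand-in for the relation-model pseudo code, the roles of the ATTENTION and SOFTMAX layers can be sketched in isolation. The vectors, class names and weights below are toy assumptions, not the trained model:

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (numerically stabilized)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(token_vecs, query):
    """ATTENTION layer sketch: weight token vectors by attention probabilities
    (dot-product scores against a query), highlighting the key words."""
    scores = [sum(t * q for t, q in zip(vec, query)) for vec in token_vecs]
    probs = softmax(scores)
    dim = len(token_vecs[0])
    return [sum(p * vec[d] for p, vec in zip(probs, token_vecs)) for d in range(dim)]

def predict_relation(token_vecs, query, class_weights, classes):
    """SOFTMAX layer sketch: score each relation class over the pooled
    representation and take the class with the maximum probability."""
    pooled = attention_pool(token_vecs, query)
    scores = [sum(w * x for w, x in zip(class_weights[c], pooled)) for c in classes]
    probs = softmax(scores)
    return classes[probs.index(max(probs))]
```

In the real model the token vectors come from the BERT and BiLSTM layers and the query and class weights are learned; here they are fixed for illustration.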
and 7, preprocessing the rail transit design specification to divide the items according to the specification.
And 8, inputting the rail transit standard text preprocessed in the step 7 into the event recognition model generated in the step 3, and extracting events in the standard, wherein the events comprise event trigger words and event elements.
Step 9, as shown in fig. 4, unify the events identified in step 8. There are texts in the specification that refer to the same event; to avoid a large amount of redundant information in the event database, an event unification algorithm is adopted, with the following specific steps:
Step 9.1, train a word2vec model on the original rail transit text.
Step 9.2, input the rail transit events into the word2vec model generated in step 9.1 to generate event vectors.
Step 9.3, calculate the similarity between events with the cosine function, and cluster events whose similarity value exceeds 0.8 into one class. The cosine function is as follows:
cos(A, B) = (A . B) / (|A| |B|) = (sum_i A_i B_i) / (sqrt(sum_i A_i^2) * sqrt(sum_i B_i^2))
Step 9.4, pair the new events generated in step 9.3 with all events in random combinations and calculate the similarity of each event pair.
Step 9.5, input each event pair and its similarity into the trained logistic regression binary classification model, and judge whether the events are similar. The logistic regression model is as follows:
h(x) = 1 / (1 + e^(-(w^T x + b)))
Step 9.6, according to the classification result of step 9.5: if the events are similar, discard one of them; if not, store both.
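Steps 9.3 to 9.6 can be sketched as a greedy clustering over event vectors. This illustration replaces the trained logistic regression classifier of step 9.5 with a fixed cosine threshold, and all names are assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity cos(A, B) = (A.B) / (|A| |B|)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def unify(events, vectors, threshold=0.8):
    """Greedy single-pass unification: keep one representative per group of
    events whose vector similarity exceeds the threshold (steps 9.3-9.6).
    `vectors` maps each event to its word2vec-derived event vector."""
    kept = []
    for ev in events:
        # Discard an event if it is too similar to one already kept.
        if all(cosine(vectors[ev], vectors[k]) <= threshold for k in kept):
            kept.append(ev)
    return kept
```

In the patent, the final keep/discard decision is made by the trained logistic regression binary classifier rather than a hard threshold.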
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples. For example, 'the track centre trackbed surface is used as an emergency evacuation channel' is stored in the graph database as <track centre trackbed surface, subject, used as> and <emergency evacuation channel, object, used as>.
Step 12, take events from the event database generated in step 10, form event pairs, input them into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs formed in step 12 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples. For example, the event relation between 'the track centre trackbed surface is used as an emergency evacuation channel' and 'the end vehicles of the train should be provided with special end doors and alighting facilities' is stored in the graph database as <used as, conditional relation, provided with>.
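The triple storage of steps 11 and 13 can be sketched with a minimal in-memory store; a production system would use a graph database, and the class and relation names below are illustrative:

```python
# Minimal in-memory stand-in (not the patent's graph database) for storing
# "event element-relation-event trigger" and "event trigger-relation-event trigger"
# triples and querying them by relation.
class TripleStore:
    def __init__(self):
        self.triples = []

    def add(self, head, relation, tail):
        """Store one (head, relation, tail) triple."""
        self.triples.append((head, relation, tail))

    def by_relation(self, relation):
        """Return all (head, tail) pairs connected by the given relation."""
        return [(h, t) for h, r, t in self.triples if r == relation]
```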
In summary, the method constructs training data for the rail transit event recognition model by event-trigger dictionary matching and manual labeling; trains a standard-event recognition model with the BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; unifies the events output by the event recognition model with a word2vec model, cosine-similarity clustering and a logistic regression binary classification model; constructs training data for the event relation model with the snowball algorithm; and trains a relation recognition model with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract the relations between events, forming a rail transit event knowledge graph. The informatization of rail transit construction design engineering is improved and the workload of graph construction is reduced.

Claims (7)

1. A method for constructing a knowledge graph of rail transit events based on deep learning, characterized in that: training data for a rail transit event recognition model are constructed by event-trigger dictionary matching combined with manual labeling; a standard-event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; the events output by the event recognition model are unified with a word2vec model, cosine-similarity clustering and a logistic regression binary classification model; training data for an event relation model are constructed with a snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract the relations between events, forming a rail transit event knowledge graph.
2. The method for constructing a knowledge graph of rail transit events based on deep learning according to claim 1, characterized by specifically comprising the following steps:
Step 1, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling.
Step 2, extract a training set from the rail transit design standard events and preprocess it; divide the texts in the training set by standard entry and label them with parts of speech.
Step 3, train the rail transit design specification event recognition model on the text processed in step 2 with the BERT-BiLSTM-CRF algorithm.
Step 4, construct event relation training data from the original text with the snowball algorithm.
Step 5, extract a training set from the rail transit design specification event relations generated in step 4 and preprocess it; divide the texts in the training set into event pairs.
Step 6, train the relation recognition model on the text processed in step 5 with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Step 7, preprocess the rail transit design specification and divide it into entries according to the specification.
Step 8, input the rail transit standard text preprocessed in step 7 into the event recognition model generated in step 3 and extract the events in the standard, each consisting of an event trigger word and event elements.
Step 9, unify the events identified in step 8.
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples.
Step 12, take events from the event database generated in step 10, form event pairs, input them into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs formed in step 12 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples.
3. The method for constructing a knowledge graph of rail transit events based on deep learning according to claim 2, wherein in step 1 an event consists of an event trigger word and event elements; because most event trigger words are fixed words, manual labeling is accelerated by dictionary matching when constructing the model training data, and the dictionary can be expanded by means of a synonym forest.
4. The method for constructing the rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 3 an event recognition model is trained with the BERT-BiLSTM-CRF algorithm; the whole model consists of three parts: a BERT layer, a BiLSTM layer, and a CRF layer. The BERT pre-training model produces word vectors that contain the contextual feature information of the specification; the BiLSTM layer performs feature extraction, exploiting the sequence information of the whole text; the CRF layer learns sentence-level constraints and filters out invalid predicted label sequences.
5. The method for constructing the rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 4 a semi-supervised snowball algorithm is used to build the training set of the event relation recognition model. The snowball algorithm proceeds as follows:
Step 4.1, manually label a small number of event relations; each labeled event relation is added to an event relation table.
Step 4.2, using the existing event relation table, match in the original text the original sentences that contain the events in the table, and generate templates; a template is a quintuple of the form < left >, event 1 type, < middle >, event 2 type, < right >; len is a length that can be set arbitrarily, < left > is the vector representation of the len words to the left of event 1, < middle > is the vector representation of the words between event 1 and event 2, and < right > is the vector representation of the len words to the right of event 2; the event 1 type and event 2 type are numerically defined event categories.
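The quintuple construction in step 4.2 can be sketched as follows, using raw words in place of word vectors for readability; the sentence, spans, and type codes are invented examples.

```python
# Sketch of building the snowball template quintuple:
# (<left>, event 1 type, <middle>, event 2 type, <right>).
def build_template(tokens, e1_span, e2_span, e1_type, e2_type, length=3):
    """e1_span/e2_span are (start, end) token indices, with event 1 first."""
    left = tokens[max(0, e1_span[0] - length):e1_span[0]]
    middle = tokens[e1_span[1]:e2_span[0]]
    right = tokens[e2_span[1]:e2_span[1] + length]
    return (left, e1_type, middle, e2_type, right)

tokens = ["若", "发生", "火灾", "则", "启动", "排烟", "系统"]
# event 1 = "发生 火灾" (tokens 1..3), event 2 = "启动 排烟" (tokens 4..6)
tpl = build_template(tokens, (1, 3), (4, 6), e1_type=1, e2_type=2)
print(tpl)  # (['若'], 1, ['则'], 2, ['系统'])
```

A real implementation would replace the word lists with their embedding vectors before comparing templates.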
Step 4.3, cluster the generated templates: templates whose mutual similarity exceeds the threshold 0.7 are grouped into one class, a new template is generated by averaging, and the new template is added to the rule base that stores the templates. From the format given in step 4.2, a template can be written as

P = (< left >, E1, < middle >, E2, < right >)

where E1 and E2 denote the event 1 type and event 2 type of template P, < left > is the vector representation of the len (here 3) words to the left of E1, < middle > is the vector representation of the words between E1 and E2, and < right > is the vector representation of the 3 words to the right of E2. The similarity between two templates, for example template P1 = (< left >1, E1, < middle >1, E2, < right >1) and template P2 = (< left >2, E′1, < middle >2, E′2, < right >2), is calculated as follows. If the condition E1 = E′1 && E2 = E′2 is satisfied, i.e. the event 1 type E1 of template P1 is identical to the event 1 type E′1 of template P2 and the event 2 type E2 of template P1 is identical to the event 2 type E′2 of template P2, then the similarity of template P1 and template P2 can be calculated as

sim(P1, P2) = μ1·(< left >1 · < left >2) + μ2·(< middle >1 · < middle >2) + μ3·(< right >1 · < right >2)

where μ1, μ2, μ3 are weights; because the < middle > context has the greatest influence on the calculated similarity between templates, one can set μ2 > μ1 = μ3. If the condition E1 = E′1 && E2 = E′2 is not satisfied, the similarity of template P1 and template P2 is recorded as 0.
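The weighted template similarity of step 4.3 can be sketched in a few lines; the dot-product combination, the context vectors, and the weights μ = (0.2, 0.6, 0.2) (chosen so the middle-context weight dominates) are illustrative assumptions, not values from the patent.

```python
# Sketch of weighted template similarity: dot products of the <left>,
# <middle>, <right> context vectors, weighted by mu1, mu2, mu3; similarity
# is 0 when the two templates' event types differ.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def template_similarity(p1, p2, mu=(0.2, 0.6, 0.2)):
    l1, e1a, m1, e2a, r1 = p1
    l2, e1b, m2, e2b, r2 = p2
    if e1a != e1b or e2a != e2b:  # event types must match
        return 0.0
    return mu[0] * dot(l1, l2) + mu[1] * dot(m1, m2) + mu[2] * dot(r1, r2)

p1 = ([1.0, 0.0], "A", [0.5, 0.5], "B", [0.0, 1.0])
p2 = ([1.0, 0.0], "A", [0.5, 0.5], "B", [0.0, 1.0])
p3 = ([1.0, 0.0], "A", [0.5, 0.5], "C", [0.0, 1.0])
print(template_similarity(p1, p2))  # ≈ 0.7 (0.2*1 + 0.6*0.5 + 0.2*1)
print(template_similarity(p1, p3))  # 0.0 (event 2 types differ)
```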
Step 4.4, first scan the original text with the event recognition model trained in step 3 to identify the event types contained in the text; then match the original text against the templates in the rule base generated in step 4.3, and convert each matched text into the quintuple form of a template;
step 4.5, similarity calculation is carried out on the new template generated in the step 4.4 and templates in the rule base, the template with the similarity smaller than the threshold value of 0.7 is discarded, and the event in the template with the similarity larger than the threshold value of 0.7 is added into the event relation table;
and 4.6, repeatedly executing the steps 4.2-4.5 until the original text processing is finished.
6. The method for constructing the rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 6 a relation recognition model is trained with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm; the whole model consists of four parts: a BERT layer, a BiLSTM layer, an ATTENTION layer, and a SOFTMAX layer. The BERT pre-training model produces word vectors that contain the contextual feature information of the specification; the BiLSTM layer performs feature extraction, exploiting the sequence information of the whole text; the ATTENTION layer calculates attention probabilities to highlight the importance of key words in the text; the SOFTMAX layer generates the probability of each relation class, and the class with the maximum probability is taken as the model's prediction.
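The ATTENTION and SOFTMAX layers of claim 6 can be sketched in pure Python; the per-token vectors stand in for BiLSTM outputs, and the query vector, dimensions, and class scores are invented examples.

```python
# Sketch of attention pooling plus softmax classification: attention weights
# over per-token feature vectors produce one sentence vector, and a softmax
# over class scores yields relation probabilities.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden, query):
    """Weight each token vector by softmax(h . q) and sum the weighted vectors."""
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    return [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)]

hidden = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # 3 tokens, feature dim 2
query = [1.0, 0.0]                              # assumed learned query vector
sent_vec = attention_pool(hidden, query)

class_scores = [0.3, 1.2, -0.5]                 # stand-in logits for 3 relations
probs = softmax(class_scores)
print(probs.index(max(probs)))  # 1: the maximum-probability class is predicted
```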
7. The method for constructing the rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 9, the specification text contains passages that refer to the same event; to avoid a great deal of redundant information in the event database, an event unification algorithm is adopted, with the following specific steps:
step 9.1, training a word2vec model by using the original track traffic text;
step 9.2, input each rail transit event into the word2vec model generated in step 9.1 to generate an event vector;
step 9.3, calculating the similarity between the events by utilizing the cosine function value, and clustering the events into a class according to the similarity value of more than 0.8; the cosine function is as follows:
cos(A, B) = (A · B) / (‖A‖ ‖B‖) = Σᵢ aᵢbᵢ / (√(Σᵢ aᵢ²) · √(Σᵢ bᵢ²))
step 9.4, take the new events generated in step 9.3, combine all events pairwise, and calculate the similarity between the event pairs;
and 9.5, inputting the similarity of the event pair and the event into a trained logistic regression two-classification model, and judging the similarity of the event. The logistic regression mathematical model is as follows:
h_θ(x) = 1 / (1 + e^(−θᵀx))
and 9.6, according to the classification result of step 9.5: if the events are judged similar, discard one of them; if they are not similar, store both.
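Steps 9.3, 9.5, and 9.6 can be sketched together; the event vectors are invented, and the logistic-regression weights theta are assumed values standing in for the trained parameters from step 9.5.

```python
# Sketch of the event-unification decision path: cosine similarity between
# word2vec event vectors (step 9.3), a sigmoid classifier on that similarity
# (step 9.5), and the keep/discard decision (step 9.6).
import math

def cosine(a, b):
    """Step 9.3: cosine similarity between two event vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def is_similar(sim, theta=(-4.0, 6.0)):
    """Step 9.5: logistic regression on (bias, similarity); theta is assumed."""
    z = theta[0] + theta[1] * sim
    return sigmoid(z) >= 0.5

e1 = [1.0, 2.0, 0.0]
e2 = [2.0, 4.0, 0.0]   # same direction as e1
e3 = [0.0, 0.0, 1.0]   # orthogonal to e1
print(round(cosine(e1, e2), 6))    # 1.0
print(is_similar(cosine(e1, e2)))  # True  -> step 9.6 discards one event
print(is_similar(cosine(e1, e3)))  # False -> step 9.6 stores both events
```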
CN202010365826.3A 2020-04-30 2020-04-30 Rail transit event knowledge graph construction method based on deep learning Active CN111597350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010365826.3A CN111597350B (en) 2020-04-30 2020-04-30 Rail transit event knowledge graph construction method based on deep learning

Publications (2)

Publication Number Publication Date
CN111597350A true CN111597350A (en) 2020-08-28
CN111597350B CN111597350B (en) 2023-06-02

Family

ID=72186939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010365826.3A Active CN111597350B (en) 2020-04-30 2020-04-30 Rail transit event knowledge graph construction method based on deep learning

Country Status (1)

Country Link
CN (1) CN111597350B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028077A1 (en) * 2016-08-11 2018-02-15 中兴通讯股份有限公司 Deep learning based method and device for chinese semantics analysis
CN107908671A (en) * 2017-10-25 2018-04-13 南京擎盾信息科技有限公司 Knowledge mapping construction method and system based on law data
CN110633409A (en) * 2018-06-20 2019-12-31 上海财经大学 Rule and deep learning fused automobile news event extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
洪文兴 et al., "Automatic construction of a case knowledge graph for judicial cases", Journal of Chinese Information Processing *
项威, "A survey of event knowledge graph construction techniques and applications", Computer and Modernization *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131401B (en) * 2020-09-14 2024-02-13 腾讯科技(深圳)有限公司 Concept knowledge graph construction method and device
CN112131401A (en) * 2020-09-14 2020-12-25 腾讯科技(深圳)有限公司 Method and device for constructing concept knowledge graph
CN112733874A (en) * 2020-10-23 2021-04-30 招商局重庆交通科研设计院有限公司 Suspicious vehicle discrimination method based on knowledge graph reasoning
CN112418696A (en) * 2020-11-27 2021-02-26 北京工业大学 Method and device for constructing urban traffic dynamic knowledge map
CN112418696B (en) * 2020-11-27 2024-06-18 北京工业大学 Construction method and device of urban traffic dynamic knowledge graph
CN112463989A (en) * 2020-12-11 2021-03-09 交控科技股份有限公司 Knowledge graph-based information acquisition method and system
CN112800762A (en) * 2021-01-25 2021-05-14 上海犀语科技有限公司 Element content extraction method for processing text with format style
CN113268591A (en) * 2021-04-17 2021-08-17 中国人民解放军战略支援部队信息工程大学 Air target intention evidence judging method and system based on affair atlas
CN113535979A (en) * 2021-07-14 2021-10-22 中国地质大学(北京) Method and system for constructing knowledge graph in mineral field
CN113546426B (en) * 2021-07-21 2023-08-22 西安理工大学 Security policy generation method for data access event in game service
CN113546426A (en) * 2021-07-21 2021-10-26 西安理工大学 Security policy generation method for data access event in game service
CN113987164A (en) * 2021-10-09 2022-01-28 国网江苏省电力有限公司电力科学研究院 Project studying and judging method and device based on domain event knowledge graph
CN115269931A (en) * 2022-09-28 2022-11-01 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof
CN115269931B (en) * 2022-09-28 2022-11-29 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof

Also Published As

Publication number Publication date
CN111597350B (en) 2023-06-02

Similar Documents

Publication Publication Date Title
CN111597350A (en) Rail transit event knowledge map construction method based on deep learning
CN109271631B (en) Word segmentation method, device, equipment and storage medium
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN111209401A (en) System and method for classifying and processing sentiment polarity of online public opinion text information
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN114036933B (en) Information extraction method based on legal documents
CN111783399A (en) Legal referee document information extraction method
CN112906397B (en) Short text entity disambiguation method
CN111832293B (en) Entity and relation joint extraction method based on head entity prediction
CN112084336A (en) Entity extraction and event classification method and device for expressway emergency
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN113204967B (en) Resume named entity identification method and system
CN111897917B (en) Rail transit industry term extraction method based on multi-modal natural language features
CN113239663B (en) Multi-meaning word Chinese entity relation identification method based on Hopkinson
CN113934909A (en) Financial event extraction method based on pre-training language and deep learning model
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN111597349B (en) Rail transit standard entity relation automatic completion method based on artificial intelligence
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN116432645A (en) Traffic accident named entity recognition method based on pre-training model
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN116522165B (en) Public opinion text matching system and method based on twin structure
CN116910272B (en) Academic knowledge graph completion method based on pre-training model T5
CN112651241A (en) Chinese parallel structure automatic identification method based on semi-supervised learning
Wu et al. One improved model of named entity recognition by combining BERT and BiLSTM-CNN for domain of Chinese railway construction
CN111522913A (en) Emotion classification method suitable for long text and short text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant