CN111597350A - Rail transit event knowledge map construction method based on deep learning - Google Patents
- Publication number
- CN111597350A (application number CN202010365826.3A)
- Authority
- CN
- China
- Prior art keywords
- event
- events
- template
- rail transit
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for constructing a knowledge graph of rail transit events based on deep learning. Training data for an event recognition model are constructed by dictionary matching combined with manual labeling; a standard event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; events output by the event recognition model are unified using a word2vec model, cosine similarity clustering, and a logistic regression binary classification model; training data for an event relation model are constructed with the Snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract relations between events. The method improves the informatization of rail transit construction design engineering and reduces the workload of knowledge graph construction.
Description
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a method for constructing a rail transit event knowledge graph based on deep learning.
Background
With the rapid development of internet technology, many industries have been deeply integrated with emerging artificial intelligence technologies and have achieved remarkable results. Urban rail transit, as a hallmark of urban modernization, plays an important role in promoting urban economic development. Rail transit construction is a complex engineering undertaking characterized by large scale, long construction periods, and huge investment. The early design and planning stage is the foundation of the later engineering work, and later construction can be guaranteed only by complete early design planning. However, in the design and planning stage of rail transit engineering, the referenced design standards are numerous and varied, each standard entry carries a large amount of information, and the degree of informatization of the whole construction project is low, which makes it difficult to query the content of a given standard during design and planning. The design stage also places extremely high demands on the professional ability of designers, making the design work highly challenging. Therefore, a knowledge graph is needed to represent rail transit design specification knowledge and to promote the informatization of rail transit construction engineering.
At present, most knowledge graphs are entity knowledge graphs centered on entities, but entity information is separated from its specific context, so the semantic information it conveys is one-sided. Compared with an entity, an event can express semantic information more clearly, and most specification entries in rail transit design standards are expressed as events. The design specification is therefore expressed in the form of an event knowledge graph. Moreover, most traditional knowledge graph construction methods have a low degree of automation and are time-consuming and labor-intensive. The present method for constructing a rail transit event knowledge graph based on deep learning is therefore proposed to improve automation and reduce workload.
Disclosure of Invention
The invention aims to provide a method for constructing a rail transit event knowledge graph based on deep learning. Expressing the specification through an event knowledge graph makes the expressed content semantically richer and more accurate, while deep learning addresses the low degree of automation and the time and labor consumption of traditional graph construction techniques.
The technical scheme adopted by the invention is as follows: training data for a rail transit event recognition model are constructed by event-trigger dictionary matching combined with manual labeling; a standard event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; events output by the event recognition model are unified using a word2vec model, cosine similarity clustering, and a logistic regression binary classification model; training data for an event relation model are constructed with the Snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract relations between events, forming the rail transit event knowledge graph. The event knowledge graph construction process comprises the following steps:
Step 1, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling.
Step 2, extract a training set from the rail transit design standard events for preprocessing: divide the texts in the training set by standard entry and label them with parts of speech.
Step 3, train the rail transit design specification event recognition model on the text processed in step 2 using the BERT-BiLSTM-CRF algorithm.
Step 4, construct event relation training data from the original text using the Snowball algorithm.
Step 5, extract a training set from the event relations generated in step 4 for preprocessing, dividing the texts in the training set into event pairs.
Step 6, train the relation recognition model on the text processed in step 5 using the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Step 7, preprocess the rail transit design specification, dividing it into entries according to the specification.
Step 8, input the rail transit standard text preprocessed in step 7 into the event recognition model generated in step 3 and extract the events in the standard; each event comprises an event trigger word and event elements.
Step 9, unify the events identified in step 8.
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples.
Step 12, take events from the event database generated in step 10, form event pairs, input the event pairs into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs from step 10 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples.
In step 1, an event consists of an event trigger word and event elements. Because most event trigger words are fixed words, manual labeling is accelerated by dictionary matching when constructing the model training data; the dictionary can be expanded by means of a synonym forest.
In step 3, the BERT-BiLSTM-CRF algorithm is used to train the event recognition model. The whole model consists of three parts: a BERT layer, a BiLSTM layer, and a CRF layer. The BERT pre-training model produces word vectors containing the contextual feature information of the specification, the BiLSTM layer performs feature extraction using the sequence information of the whole text, and the CRF layer learns sentence-level constraints to filter out invalid predicted tag sequences.
In step 4, the semi-supervised Snowball algorithm is used to construct the training set for the event relation recognition model. The Snowball algorithm comprises the following specific steps:
Step 4.1, manually mark a small number of event relations and add each marked relation to an event relation table.
Step 4.2, using the existing event relation table, match in the original text the original sentences containing the events in the table, and generate a template from each match. The template takes the form of a five-tuple: <left>, event 1 type, <middle>, event 2 type, <right>, where len is a freely configurable length, <left> is the vector representation of the len words to the left of event 1, <middle> is the vector representation of the words between event 1 and event 2, and <right> is the vector representation of the len words to the right of event 2; here both the event 1 type and the event 2 type are numerical-definition events.
Step 4.3, cluster the generated templates: templates whose mutual similarity exceeds a threshold of 0.7 are grouped into one class, a new template is generated from each class by averaging, and the new template is added to a rule base storing templates. From step 4.2, a template can be written as P = <left, E1, middle, E2, right>, where E1 and E2 denote the event 1 type and event 2 type of template P, <left> is the vector representation of the len words (for example len = 3) to the left of E1, <middle> is the vector representation of the words between E1 and E2, and <right> is the vector representation of the len words to the right of E2. Similarity between templates is computed as follows. Given template P1 = <left1, E1, middle1, E2, right1> and template P2 = <left2, E1', middle2, E2', right2>: if the condition E1 = E1' && E2 = E2' is satisfied, i.e. the event 1 type E1 of template P1 is identical to the event 1 type E1' of template P2 and the event 2 type E2 of template P1 is identical to the event 2 type E2' of template P2, then the similarity of P1 and P2 can be calculated as

Sim(P1, P2) = μ1·sim(left1, left2) + μ2·sim(middle1, middle2) + μ3·sim(right1, right2),

where μ1, μ2, μ3 are weights; because <middle> has the greatest influence on the similarity result, the weights can be set so that μ2 > μ1 > μ3. If the condition E1 = E1' && E2 = E2' is not satisfied, the similarity of P1 and P2 is recorded as 0.
Step 4.4, first scan the original text with the event recognition model trained in step 3 to recognize the event types contained in the text; then match the original text against the templates in the rule base generated in step 4.3 and convert each matched text span into the five-tuple template form;
Step 4.5, compute the similarity between each new template generated in step 4.4 and the templates in the rule base; discard templates whose similarity is below the threshold of 0.7, and add the events from templates whose similarity exceeds 0.7 to the event relation table;
Step 4.6, repeat steps 4.2-4.5 until the original text has been fully processed.
In step 6, the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm is used to train the relation recognition model. The whole model consists of four parts: a BERT layer, a BiLSTM layer, an ATTENTION layer, and a SOFTMAX layer. The BERT pre-training model produces word vectors containing the contextual feature information of the specification, the BiLSTM layer performs feature extraction using the sequence information of the whole text, the ATTENTION layer computes attention probabilities to highlight the importance of key words in the text, and the SOFTMAX layer produces a probability for each relation class, the class with the maximum probability being taken as the model's prediction.
In step 9, the standard texts contain passages that refer to the same event. To avoid large amounts of redundant information in the event database, an event unification algorithm is adopted, with the following specific steps:
Step 9.1, train a word2vec model on the original rail transit text;
Step 9.2, input each rail transit event into the word2vec model generated in step 9.1 to generate an event vector;
Step 9.3, compute the similarity between events as the cosine of the angle between their event vectors, and cluster into one class the events whose similarity exceeds 0.8; for event vectors A and B the cosine similarity is:

cos(A, B) = (A · B) / (|A| |B|) = Σᵢ AᵢBᵢ / (√(Σᵢ Aᵢ²) √(Σᵢ Bᵢ²))
Step 9.4, for the new events generated in step 9.3, form all pairwise combinations of events and compute the similarity of each event pair;
Step 9.5, input each event pair and its similarity into a trained logistic regression binary classification model to judge whether the two events are the same; the logistic regression model is

h_θ(x) = 1 / (1 + e^(−θᵀx)),

where x is the feature vector of the event pair (here its similarity) and θ is the learned weight vector;
Step 9.6, according to the classification result of step 9.5, if the two events are the same, discard one of them; otherwise keep both.
The invention has the beneficial effects that:
the invention provides a method for constructing a knowledge graph of a rail transit incident based on deep learning, aiming at the problems of complicated engineering information, defects of a traditional knowledge graph and large workload of construction of the knowledge graph in a rail transit construction design stage. Adopting an event trigger dictionary matching mode and a manual labeling mode to construct training data of a rail transit event recognition model; training a standard event recognition model by adopting a BERT-BilSTM-CRF algorithm, and automatically extracting standard entry events from a rail transit design standard text; event unification is carried out on the events output by the event recognition model by adopting a word2vec model, cosine similarity clustering and logistic regression two-classification model; adopting a snowball algorithm to construct training data of an event relation model; and training a relationship recognition model by adopting a BERT-BilSTM-ATTENTION-SOFTMAX algorithm, and automatically extracting the relationship between events to form a rail transit event knowledge map. The informatization of the rail transit construction design engineering is improved, and the workload of map construction is reduced.
Drawings
FIG. 1 is a general flowchart of the method for constructing a rail transit event knowledge graph based on deep learning according to the present invention;
FIG. 2 shows the process of constructing the event training data set by dictionary matching and manual labeling;
FIG. 3 shows the process of building the specification event recognition model based on the BERT-BiLSTM-CRF algorithm;
FIG. 4 shows the process of unifying the events output by the event recognition model using the word2vec model, cosine similarity clustering, and the logistic regression binary classification model;
FIG. 5 shows the process of constructing the training data of the event relation model with the Snowball algorithm;
FIG. 6 shows the process of building the relation recognition model based on the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, the method for constructing a rail transit event knowledge graph based on deep learning specifically comprises the following steps:
Step 1, as shown in FIG. 2, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling. The pseudo code of the training-set labeling algorithm is as follows:
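The pseudo code itself is not reproduced in this text. As a minimal sketch of the dictionary-matching labeling idea only (the trigger dictionary, tag names, and sample sentence are illustrative assumptions, not the patent's actual data):

```python
# Sketch of dictionary-matching labeling: tokens found in the event-trigger
# dictionary receive a B-TRG tag; everything else stays O and is left for the
# manual-labeling pass. TRIGGER_DICT and the sample sentence are assumptions.
TRIGGER_DICT = {"设置", "采用"}  # example trigger words ("set up", "adopt")

def bio_label(tokens, trigger_dict):
    """Return one BIO tag per token: B-TRG for a dictionary trigger, else O."""
    return ["B-TRG" if tok in trigger_dict else "O" for tok in tokens]

tokens = ["车站", "应", "设置", "疏散", "通道"]  # "stations should set up evacuation walkways"
print(list(zip(tokens, bio_label(tokens, TRIGGER_DICT))))
# → [('车站', 'O'), ('应', 'O'), ('设置', 'B-TRG'), ('疏散', 'O'), ('通道', 'O')]
```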
and 2, extracting a training set from the rail transit design standard events for preprocessing, dividing texts in the training set by standard entries, and labeling the texts by parts of speech.
Step 3, as shown in FIG. 3, train the rail transit design specification event recognition model on the text processed in step 2 using the BERT-BiLSTM-CRF algorithm. The pseudo code for constructing the event recognition model is as follows:
and 4, as shown in fig. 5, constructing an event relation recognition model training set by adopting a semi-supervised snowball algorithm on the original text. The snowball algorithm comprises the following specific steps:
Step 4.1, manually mark a small number of event relations and add each marked relation to an event relation table.
Step 4.2, using the existing event relation table, match in the original text the original sentences containing the events in the table, and generate a template from each match. The template takes the form of a five-tuple: <left>, event 1 type, <middle>, event 2 type, <right>, where len is a freely configurable length, <left> is the vector representation of the len words to the left of event 1, <middle> is the vector representation of the words between event 1 and event 2, and <right> is the vector representation of the len words to the right of event 2; here both the event 1 type and the event 2 type are numerical-definition events.
Step 4.3, cluster the generated templates: templates whose mutual similarity exceeds a threshold of 0.7 are grouped into one class, a new template is generated from each class by averaging, and the new template is added to a rule base storing templates. From step 4.2, a template can be written as P = <left, E1, middle, E2, right>, where E1 and E2 denote the event 1 type and event 2 type of template P, <left> is the vector representation of the len words (for example len = 3) to the left of E1, <middle> is the vector representation of the words between E1 and E2, and <right> is the vector representation of the len words to the right of E2. Similarity between templates is computed as follows. Given template P1 = <left1, E1, middle1, E2, right1> and template P2 = <left2, E1', middle2, E2', right2>: if the condition E1 = E1' && E2 = E2' is satisfied, i.e. the event 1 type E1 of template P1 is identical to the event 1 type E1' of template P2 and the event 2 type E2 of template P1 is identical to the event 2 type E2' of template P2, then the similarity of P1 and P2 can be calculated as

Sim(P1, P2) = μ1·sim(left1, left2) + μ2·sim(middle1, middle2) + μ3·sim(right1, right2),

where μ1, μ2, μ3 are weights; because <middle> has the greatest influence on the similarity result, the weights can be set so that μ2 > μ1 > μ3. If the condition E1 = E1' && E2 = E2' is not satisfied, the similarity of P1 and P2 is recorded as 0.
Step 4.4, first scan the original text with the event recognition model trained in step 3 to recognize the event types contained in the text; then match the original text against the templates in the rule base generated in step 4.3 and convert each matched text span into the five-tuple template form;
Step 4.5, compute the similarity between each new template generated in step 4.4 and the templates in the rule base; discard templates whose similarity is below the threshold of 0.7, and add the events from templates whose similarity exceeds 0.7 to the event relation table;
Step 4.6, repeat steps 4.2-4.5 until the original text has been fully processed.
Step 5, extract a training set from the event relations generated in step 4 for preprocessing, dividing the texts into event pairs.
Step 6, as shown in FIG. 6, train the relation recognition model on the text processed in step 5 using the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm. The pseudo code for constructing the event relation recognition model is as follows:
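The pseudo code is not reproduced here. As a minimal numpy sketch of the ATTENTION and SOFTMAX head only (the BERT and BiLSTM layers are omitted, and all dimensions and parameters are random illustrative assumptions):

```python
# Sketch of the attention + softmax head: attention weights over per-token
# hidden states highlight key words; the softmax layer scores relation classes
# and the argmax is taken as the predicted class.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def relation_head(hidden, w_att, w_cls):
    """hidden: (seq_len, d) token states -> probability over relation classes."""
    alpha = softmax(hidden @ w_att)   # attention weight per token
    context = alpha @ hidden          # weighted sentence representation, shape (d,)
    return softmax(context @ w_cls)   # relation-class probabilities

hidden = rng.normal(size=(6, 8))      # 6 tokens, hidden size 8 (assumed)
w_att = rng.normal(size=8)
w_cls = rng.normal(size=(8, 4))       # 4 relation classes, e.g. conditional (assumed)
probs = relation_head(hidden, w_att, w_cls)
print(probs.round(3), int(probs.argmax()))
```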
and 7, preprocessing the rail transit design specification to divide the items according to the specification.
Step 8, input the rail transit standard text preprocessed in step 7 into the event recognition model generated in step 3 and extract the events in the standard; each event comprises an event trigger word and event elements.
Step 9, as shown in FIG. 4, unify the events identified in step 8. The specification texts contain passages that refer to the same event; to avoid large amounts of redundant information in the event database, an event unification algorithm is adopted, with the following specific steps:
Step 9.1, train a word2vec model on the original rail transit text;
Step 9.2, input each rail transit event into the word2vec model generated in step 9.1 to generate an event vector;
Step 9.3, compute the similarity between events as the cosine of the angle between their event vectors, and cluster into one class the events whose similarity exceeds 0.8; for event vectors A and B the cosine similarity is:

cos(A, B) = (A · B) / (|A| |B|) = Σᵢ AᵢBᵢ / (√(Σᵢ Aᵢ²) √(Σᵢ Bᵢ²))
Step 9.4, for the new events generated in step 9.3, form all pairwise combinations of events and compute the similarity of each event pair;
Step 9.5, input each event pair and its similarity into a trained logistic regression binary classification model to judge whether the two events are the same; the logistic regression model is

h_θ(x) = 1 / (1 + e^(−θᵀx)),

where x is the feature vector of the event pair (here its similarity) and θ is the learned weight vector;
Step 9.6, according to the classification result of step 9.5, if the two events are the same, discard one of them; otherwise keep both.
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples. For example, "the track bed surface at the track centre serves as an emergency evacuation walkway" is stored in the graph database as <track centre bed surface, subject, serve as> and <emergency evacuation walkway, subject, serve as>.
Step 12, take events from the event database generated in step 10, form event pairs, input the event pairs into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs from step 10 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples. For example, the relation between the events "the track bed surface at the track centre serves as an emergency evacuation walkway" and "the end cars of a train should be provided with dedicated end doors and detraining facilities" is stored in the graph database as <serve as, conditional relation, set up>.
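The triple storage of steps 11 and 13 can be sketched with a plain list standing in for the graph database; the role labels and the English renderings of the trigger words are illustrative:

```python
# Sketch of triple storage: intra-event triples (element, role, trigger) from
# step 11 and the inter-event triple (trigger, relation, trigger) from step 13
# go into one store; a list stands in for the graph database.
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

graph = []
# step 11: "event element - relation - event trigger" triples within one event
graph.append(Triple("track centre bed surface", "subject", "serve as"))
graph.append(Triple("emergency evacuation walkway", "subject", "serve as"))
# step 13: "event trigger - relation - event trigger" triple between two events
graph.append(Triple("serve as", "conditional relation", "set up"))

conditionals = [t for t in graph if t.relation == "conditional relation"]
print(len(graph), conditionals[0].relation)  # → 3 conditional relation
```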
In summary, the method constructs training data for the rail transit event recognition model by event-trigger dictionary matching combined with manual labeling; trains a standard event recognition model with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; unifies the events output by the event recognition model using a word2vec model, cosine similarity clustering, and a logistic regression binary classification model; constructs training data for an event relation model with the Snowball algorithm; and trains a relation recognition model with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract relations between events, forming the rail transit event knowledge graph. The informatization of rail transit construction design engineering is improved and the workload of graph construction is reduced.
Claims (7)
1. A method for constructing a rail transit event knowledge graph based on deep learning, characterized in that training data for a rail transit event recognition model are constructed by event-trigger dictionary matching combined with manual labeling; a standard event recognition model is trained with a BERT-BiLSTM-CRF algorithm to automatically extract standard-entry events from rail transit design standard texts; events output by the event recognition model are unified using a word2vec model, cosine similarity clustering, and a logistic regression binary classification model; training data for an event relation model are constructed with the Snowball algorithm; and a relation recognition model is trained with a BERT-BiLSTM-ATTENTION-SOFTMAX algorithm to automatically extract relations between events, forming the rail transit event knowledge graph.
2. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 1, characterized by specifically comprising the following steps:
Step 1, construct training data for the event recognition model from the original text by event-trigger dictionary matching and manual labeling.
Step 2, extract a training set from the rail transit design standard events for preprocessing: divide the texts in the training set by standard entry and label them with parts of speech.
Step 3, train the rail transit design specification event recognition model on the text processed in step 2 using the BERT-BiLSTM-CRF algorithm.
Step 4, construct event relation training data from the original text using the Snowball algorithm.
Step 5, extract a training set from the event relations generated in step 4 for preprocessing, dividing the texts in the training set into event pairs.
Step 6, train the relation recognition model on the text processed in step 5 using the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm.
Step 7, preprocess the rail transit design specification, dividing it into entries according to the specification.
Step 8, input the rail transit standard text preprocessed in step 7 into the event recognition model generated in step 3 and extract the events in the standard; each event comprises an event trigger word and event elements.
Step 9, unify the events identified in step 8.
Step 10, store the events unified in step 9 in an event database.
Step 11, store the events unified in step 9 in a database as "event element-relation-event trigger" triples.
Step 12, take events from the event database generated in step 10, form event pairs, input the event pairs into the event relation recognition model generated in step 6, and extract the relations between events in the specification.
Step 13, store the event pairs from step 10 and the event relations extracted in step 12 in a database as "event trigger-relation-event trigger" triples.
3. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 2, characterized in that in step 1, an event consists of an event trigger word and event elements; because most event trigger words are fixed words, manual labeling is accelerated by dictionary matching when constructing the model training data; the dictionary can be expanded by means of a synonym forest.
4. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 2, characterized in that in step 3, the event recognition model is trained using the BERT-BiLSTM-CRF algorithm, the whole model consisting of three parts: a BERT layer, a BiLSTM layer, and a CRF layer; the BERT pre-training model produces word vectors containing the contextual feature information of the specification, the BiLSTM layer performs feature extraction using the sequence information of the whole text, and the CRF layer learns sentence-level constraints to filter out invalid predicted tag sequences.
5. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 2, characterized in that in step 4, the semi-supervised Snowball algorithm is used to construct the training set for the event relation recognition model, the Snowball algorithm comprising the following specific steps:
Step 4.1, manually mark a small number of event relations and add each marked relation to an event relation table.
Step 4.2, using the existing event relation table, match in the original text the original sentences containing the events in the table, and generate a template from each match. The template takes the form of a five-tuple: <left>, event 1 type, <middle>, event 2 type, <right>, where len is a freely configurable length, <left> is the vector representation of the len words to the left of event 1, <middle> is the vector representation of the words between event 1 and event 2, and <right> is the vector representation of the len words to the right of event 2; here both the event 1 type and the event 2 type are numerical-definition events.
Step 4.3: cluster the generated templates; templates whose similarity exceeds the threshold 0.7 are clustered into one class, a new template is generated from each class by averaging, and the new template is added to a rule base that stores the templates. From step 4.2, a template can be written as P = <L, E1, M, E2, R>, where E1 and E2 denote the event 1 type and event 2 type of template P, L denotes the vector representation of the len (here 3) words to the left of E1, M the vector representation of the words between E1 and E2, and R the vector representation of the len words to the right of E2. The similarity between two templates P1 = <L1, E1, M1, E2, R1> and P2 = <L2, E1', M2, E2', R2> is computed as follows: if the condition E1 = E1' && E2 = E2' is satisfied, i.e. the event 1 type of template P1 equals the event 1 type of template P2 and the event 2 type of P1 equals the event 2 type of P2, then the similarity of P1 and P2 is sim(P1, P2) = μ1·(L1·L2) + μ2·(M1·M2) + μ3·(R1·R2), where μ1, μ2, μ3 are weights; because the middle context has the greatest influence on the similarity result, one can set μ2 > μ1 > μ3. If the condition E1 = E1' && E2 = E2' is not satisfied, the similarity of P1 and P2 is recorded as 0.
Step 4.4: first scan the original text with the event recognition model trained in step 3 to identify the event types it contains; then match the original text against the templates in the rule base generated in step 4.3, and convert each matched text into the five-tuple template form.
Step 4.5: compute the similarity between each new template generated in step 4.4 and the templates in the rule base; discard templates whose similarity is below the threshold 0.7, and add the events of templates whose similarity exceeds the threshold 0.7 to the event relation table.
Step 4.6: repeat steps 4.2 to 4.5 until the original text has been fully processed.
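The weighted template similarity of step 4.3 can be sketched as follows, under the assumption that <left>, <middle>, and <right> are already dense vectors (toy numpy vectors here). The concrete weight values are illustrative; only the ordering μ2 > μ1 > μ3 follows the claim.

```python
# Sketch of the template similarity from step 4.3. Vectors and weights
# are toy values; only the structure follows the claim.
import numpy as np
from dataclasses import dataclass

MU1, MU2, MU3 = 0.3, 0.5, 0.2  # mu2 > mu1 > mu3: middle context weighs most

@dataclass
class Template:
    left: np.ndarray    # vectors of the len words left of event 1
    e1: int             # numeric event-1 type
    middle: np.ndarray  # vectors of the words between the two events
    e2: int             # numeric event-2 type
    right: np.ndarray   # vectors of the len words right of event 2

def similarity(p1: Template, p2: Template) -> float:
    """Weighted dot-product similarity; 0 if the event types differ."""
    if not (p1.e1 == p2.e1 and p1.e2 == p2.e2):
        return 0.0
    return (MU1 * float(p1.left @ p2.left)
            + MU2 * float(p1.middle @ p2.middle)
            + MU3 * float(p1.right @ p2.right))

a = Template(np.array([1.0, 0.0]), 1, np.array([0.0, 1.0]), 2, np.array([1.0, 1.0]))
b = Template(np.array([1.0, 0.0]), 1, np.array([0.0, 1.0]), 2, np.array([1.0, 1.0]))
c = Template(np.array([1.0, 0.0]), 1, np.array([0.0, 1.0]), 3, np.array([1.0, 1.0]))

s_same = similarity(a, b)   # matching event types: weighted score
s_diff = similarity(a, c)   # event-2 types differ, so similarity is 0
```

Averaging the vectors of templates in one cluster then yields the new template added to the rule base.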
6. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 6 a relation recognition model is trained with the BERT-BiLSTM-ATTENTION-SOFTMAX algorithm; the model consists of four parts: a BERT layer, a BiLSTM layer, an ATTENTION layer, and a SOFTMAX layer. The pre-trained BERT model produces word vectors that encode the contextual features of the specification text, the BiLSTM layer performs feature extraction using the sequence information of the whole text, the ATTENTION layer computes attention probabilities to highlight the importance of key words in the text, and the SOFTMAX layer produces a probability for each relation class, the class with the maximum probability being taken as the model's prediction.
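The ATTENTION and SOFTMAX layers can be sketched in a few lines of numpy. This is a minimal illustration, not the trained model: random matrices stand in for the BiLSTM hidden states and learned parameters.

```python
# Minimal sketch of attention pooling plus softmax classification
# (claim 6). All parameters here are random stand-ins for learned ones.
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """H: (seq_len, hidden) token states; w: (hidden,) attention query."""
    alpha = softmax(H @ w)        # attention probability per token
    return alpha @ H, alpha       # weighted sentence representation

seq_len, hidden, n_classes = 5, 8, 4
H = rng.normal(size=(seq_len, hidden))        # stand-in for BiLSTM output
w = rng.normal(size=hidden)                   # learned attention query
W_out = rng.normal(size=(hidden, n_classes))  # classification weights

sent, alpha = attention_pool(H, w)
probs = softmax(sent @ W_out)     # probability per relation class
pred = int(probs.argmax())        # maximum-probability class is the prediction
```

The attention weights sum to 1 over the tokens, so high-weight tokens dominate the pooled representation, which is how key words are highlighted.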
7. The method for constructing a rail transit event knowledge graph based on deep learning according to claim 2, wherein in step 9, because different passages of the specification texts may refer to the same event, an event unification algorithm is adopted to avoid storing large amounts of redundant information in the event database. The algorithm comprises the following steps:
Step 9.1: train a word2vec model on the original rail transit text.
Step 9.2: feed each rail transit event into the word2vec model generated in step 9.1 to produce an event vector.
Step 9.3: compute the similarity between events using the cosine of their event vectors, and cluster events whose similarity exceeds 0.8 into one class; the cosine similarity of two event vectors A and B is cos(A, B) = (A · B) / (‖A‖ ‖B‖).
Step 9.4: for the new events generated in step 9.3, form all pairwise combinations of events and compute the similarity of each event pair.
Step 9.5: input each event pair and its similarity into a trained logistic regression binary classifier to judge whether the two events are the same. The logistic regression model is h_θ(x) = 1 / (1 + e^(−θᵀx)).
Step 9.6: according to the classification result of step 9.5, if the two events are judged similar, discard one of them; otherwise keep both.
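Steps 9.3 to 9.6 can be sketched end to end, under the assumption that each event is already a word2vec-style vector. The logistic regression parameters THETA0 and THETA1 are illustrative toy values standing in for the trained ones.

```python
# Sketch of event unification (claim 7): cosine similarity plus a
# logistic-regression decision. Vectors and parameters are toy values.
import math

def cosine(a, b):
    """Cosine similarity of two event vectors: (A . B) / (|A| |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy logistic-regression weights on the single feature "cosine similarity";
# the patent would use parameters learned from labeled event pairs.
THETA0, THETA1 = -4.0, 6.0

def same_event(a, b, threshold=0.5):
    """True if the model judges the two event vectors to describe one event."""
    return sigmoid(THETA0 + THETA1 * cosine(a, b)) > threshold

e1 = [0.9, 0.1, 0.0]
e2 = [0.88, 0.12, 0.01]   # near-duplicate description of the same event
e3 = [0.0, 0.2, 0.95]     # unrelated event

dup = same_event(e1, e2)       # duplicate pair: keep only one event
distinct = same_event(e1, e3)  # distinct pair: keep both events
```

When `same_event` returns True, one event of the pair is discarded before insertion into the event database, as step 9.6 describes.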
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010365826.3A CN111597350B (en) | 2020-04-30 | 2020-04-30 | Rail transit event knowledge graph construction method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010365826.3A CN111597350B (en) | 2020-04-30 | 2020-04-30 | Rail transit event knowledge graph construction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111597350A true CN111597350A (en) | 2020-08-28 |
CN111597350B CN111597350B (en) | 2023-06-02 |
Family
ID=72186939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010365826.3A Active CN111597350B (en) | 2020-04-30 | 2020-04-30 | Rail transit event knowledge graph construction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597350B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131401A (en) * | 2020-09-14 | 2020-12-25 | 腾讯科技(深圳)有限公司 | Method and device for constructing concept knowledge graph |
CN112418696A (en) * | 2020-11-27 | 2021-02-26 | 北京工业大学 | Method and device for constructing urban traffic dynamic knowledge map |
CN112463989A (en) * | 2020-12-11 | 2021-03-09 | 交控科技股份有限公司 | Knowledge graph-based information acquisition method and system |
CN112733874A (en) * | 2020-10-23 | 2021-04-30 | 招商局重庆交通科研设计院有限公司 | Suspicious vehicle discrimination method based on knowledge graph reasoning |
CN112800762A (en) * | 2021-01-25 | 2021-05-14 | 上海犀语科技有限公司 | Element content extraction method for processing text with format style |
CN113268591A (en) * | 2021-04-17 | 2021-08-17 | 中国人民解放军战略支援部队信息工程大学 | Air target intention evidence judging method and system based on affair atlas |
CN113535979A (en) * | 2021-07-14 | 2021-10-22 | 中国地质大学(北京) | Method and system for constructing knowledge graph in mineral field |
CN113546426A (en) * | 2021-07-21 | 2021-10-26 | 西安理工大学 | Security policy generation method for data access event in game service |
CN113987164A (en) * | 2021-10-09 | 2022-01-28 | 国网江苏省电力有限公司电力科学研究院 | Project studying and judging method and device based on domain event knowledge graph |
CN115269931A (en) * | 2022-09-28 | 2022-11-01 | 深圳技术大学 | Rail transit station data map system based on service drive and construction method thereof |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018028077A1 (en) * | 2016-08-11 | 2018-02-15 | 中兴通讯股份有限公司 | Deep learning based method and device for chinese semantics analysis |
CN107908671A (en) * | 2017-10-25 | 2018-04-13 | 南京擎盾信息科技有限公司 | Knowledge mapping construction method and system based on law data |
CN110633409A (en) * | 2018-06-20 | 2019-12-31 | 上海财经大学 | Rule and deep learning fused automobile news event extraction method |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018028077A1 (en) * | 2016-08-11 | 2018-02-15 | 中兴通讯股份有限公司 | Deep learning based method and device for chinese semantics analysis |
CN107908671A (en) * | 2017-10-25 | 2018-04-13 | 南京擎盾信息科技有限公司 | Knowledge mapping construction method and system based on law data |
CN110633409A (en) * | 2018-06-20 | 2019-12-31 | 上海财经大学 | Rule and deep learning fused automobile news event extraction method |
Non-Patent Citations (2)
Title |
---|
Hong Wenxing et al.: "Automatic Construction of Case Knowledge Graphs for Judicial Cases", Journal of Chinese Information Processing * |
Xiang Wei: "A Survey of Event Knowledge Graph Construction Techniques and Applications", Computer and Modernization * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112131401B (en) * | 2020-09-14 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Concept knowledge graph construction method and device |
CN112131401A (en) * | 2020-09-14 | 2020-12-25 | 腾讯科技(深圳)有限公司 | Method and device for constructing concept knowledge graph |
CN112733874A (en) * | 2020-10-23 | 2021-04-30 | 招商局重庆交通科研设计院有限公司 | Suspicious vehicle discrimination method based on knowledge graph reasoning |
CN112418696A (en) * | 2020-11-27 | 2021-02-26 | 北京工业大学 | Method and device for constructing urban traffic dynamic knowledge map |
CN112418696B (en) * | 2020-11-27 | 2024-06-18 | 北京工业大学 | Construction method and device of urban traffic dynamic knowledge graph |
CN112463989A (en) * | 2020-12-11 | 2021-03-09 | 交控科技股份有限公司 | Knowledge graph-based information acquisition method and system |
CN112800762A (en) * | 2021-01-25 | 2021-05-14 | 上海犀语科技有限公司 | Element content extraction method for processing text with format style |
CN113268591A (en) * | 2021-04-17 | 2021-08-17 | 中国人民解放军战略支援部队信息工程大学 | Air target intention evidence judging method and system based on affair atlas |
CN113535979A (en) * | 2021-07-14 | 2021-10-22 | 中国地质大学(北京) | Method and system for constructing knowledge graph in mineral field |
CN113546426B (en) * | 2021-07-21 | 2023-08-22 | 西安理工大学 | Security policy generation method for data access event in game service |
CN113546426A (en) * | 2021-07-21 | 2021-10-26 | 西安理工大学 | Security policy generation method for data access event in game service |
CN113987164A (en) * | 2021-10-09 | 2022-01-28 | 国网江苏省电力有限公司电力科学研究院 | Project studying and judging method and device based on domain event knowledge graph |
CN115269931A (en) * | 2022-09-28 | 2022-11-01 | 深圳技术大学 | Rail transit station data map system based on service drive and construction method thereof |
CN115269931B (en) * | 2022-09-28 | 2022-11-29 | 深圳技术大学 | Rail transit station data map system based on service drive and construction method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN111597350B (en) | 2023-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111597350A (en) | Rail transit event knowledge map construction method based on deep learning | |
CN109271631B (en) | Word segmentation method, device, equipment and storage medium | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN111209401A (en) | System and method for classifying and processing sentiment polarity of online public opinion text information | |
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer | |
CN114036933B (en) | Information extraction method based on legal documents | |
CN111783399A (en) | Legal referee document information extraction method | |
CN112906397B (en) | Short text entity disambiguation method | |
CN111832293B (en) | Entity and relation joint extraction method based on head entity prediction | |
CN112084336A (en) | Entity extraction and event classification method and device for expressway emergency | |
CN110717045A (en) | Letter element automatic extraction method based on letter overview | |
CN113204967B (en) | Resume named entity identification method and system | |
CN111897917B (en) | Rail transit industry term extraction method based on multi-modal natural language features | |
CN113239663B (en) | Multi-meaning word Chinese entity relation identification method based on Hopkinson | |
CN113934909A (en) | Financial event extraction method based on pre-training language and deep learning model | |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
CN111597349B (en) | Rail transit standard entity relation automatic completion method based on artificial intelligence | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN116432645A (en) | Traffic accident named entity recognition method based on pre-training model | |
CN116010553A (en) | Viewpoint retrieval system based on two-way coding and accurate matching signals | |
CN116522165B (en) | Public opinion text matching system and method based on twin structure | |
CN116910272B (en) | Academic knowledge graph completion method based on pre-training model T5 | |
CN112651241A (en) | Chinese parallel structure automatic identification method based on semi-supervised learning | |
Wu et al. | One improved model of named entity recognition by combining BERT and BiLSTM-CNN for domain of Chinese railway construction | |
CN111522913A (en) | Emotion classification method suitable for long text and short text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |