CN111597349B - Rail transit standard entity relation automatic completion method based on artificial intelligence

Info

Publication number: CN111597349B
Application number: CN202010363261.5A
Authority: CN
Inventors: 朱磊; 冯林林; 黑新宏; 刘尧林; 吕泓瑾; 张晋源; 林泓; 刘瑞; 刘旭华
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2022-10-11
Anticipated expiration: 2040-04-30
Also published as: CN111597349A

Abstract

The invention discloses an artificial intelligence-based method for automatically completing entity relationships in rail transit specifications. The method comprises: constructing an entity relationship completion model; inputting the rail transit specifications and the noun-tagged word segments into the entity relationship completion model; and judging whether each input specification is a simple sentence. If it is a simple sentence, the entity-related attributes in the rail transit specification are searched and entity relationship triples are generated. If it is not a simple sentence, the attribute words and entities of the rear sentence are extracted, and either the front-sentence entities and the rear-sentence attribute words are matched n:n, or it is judged whether the grammar of the front sentence is subject-predicate-object and the grammar of the rear sentence is an object complement. If so, the front-sentence entity is directly matched with the object and the rear-sentence keyword is directly matched with the object entity, generating an entity relationship triple; if not, the entities and entity relationships whose vocabulary relevancy exceeds a threshold are output, generating an entity relationship triple. A specification with a complete semantic structure is thereby obtained, completing the automatic completion of the rail transit specification entity relationships.

Description

Automatic completion method for rail transit specification entity relationships based on artificial intelligence

Technical Field

The invention belongs to the technical field of artificial intelligence and natural language processing, and relates to an artificial intelligence-based method for automatically completing rail transit specification entity relationships.

Background

A knowledge graph is a semantic knowledge base that stores knowledge as triples. Knowledge graphs help computers better understand natural language, provide better services to people, and enable natural human-machine interaction. Entity recognition and relation extraction are important steps in constructing a knowledge graph.

Chinese grammar is complex and sentence structure is flexible, with no strict constraints. Entity components are missing from some specification documents, because a descriptive text can be written as long as the basic semantics are satisfied. As a result, the way specifications are written differs greatly among professional fields. Therefore, when a knowledge graph is constructed for a professional field, the entities can be extracted from the specification entries, but the relationships between them cannot be determined because entity components are missing.

In recent years, rail transit in China has developed rapidly, deeply influencing how people work and live and greatly promoting the development of social productivity. The rail transit design specification is the main basis for the design, construction, inspection and maintenance of rail transit, and it gives clear requirements and explanations for each part of the design. Therefore, the entities and relations in the specification are extracted to construct a knowledge graph, which is then applied to subway design, inspection and construction by combining the retrieval, reasoning and query technologies of the knowledge graph. The constraints on each individual object can then be queried by keyword search on a website or application built on the knowledge graph. Construction personnel can design and build each individual object in rail transit according to the query results, and inspectors can check whether a project meets the specification by comparing it directly with the displayed constraints. This makes it more convenient to design subways and to check whether a subway design meets the specification.

Completing the missing entity relationship components is a very important foundation for knowledge graph construction. Missing entity relationships directly affect information extraction, and in turn the structure of nodes and edges in the knowledge graph, which greatly reduces its reasoning performance. Once the entity relationships are completed, the graph of the rail transit specification can be constructed automatically, and later query, reasoning and intelligent question answering can be carried out.

Because graph construction and natural language processing of specifications are still in their infancy in every field, existing probability models and dictionaries lack large amounts of accurately labeled documents. These problems therefore require professional practitioners to perform extensive analysis and processing and to complete entity relationships manually, which is time-consuming and labor-intensive. Moreover, design specifications involve more than 30 industries, so practitioners can only complete relationships based on their own prior experience, which leads to low accuracy of the completed entity relationships.

Disclosure of Invention

The invention aims to provide an artificial intelligence-based method for automatically completing rail transit specification entity relationships, which solves the problems that the existing completion of rail transit specification entity relationships can only be performed manually, is time-consuming and labor-intensive, and has low accuracy.

The technical scheme adopted by the invention is an artificial intelligence-based method for automatically completing rail transit specification entity relationships, comprising the following steps:

Step 1: constructing an entity relationship completion model according to the rail transit specification;

Step 2: performing part-of-speech tagging on the rail transit specification, and extracting the noun-tagged word segments in the rail transit specification;

Step 3: inputting all rail transit specifications and the extracted noun-tagged word segments into the entity relationship completion model, where the extracted noun-tagged word segments serve as the entities to be completed; judging whether each input rail transit specification is a simple sentence by using a symbol detection method; if so, performing step 4, and if not, performing step 5;

Step 4: searching the entity-related attributes in the rail transit specification, judging the relationships between entities, generating entity relationship triples and storing them;

Step 5: performing deep learning-based dependency syntax analysis on the rail transit specification; if the front sentence is a noun phrase (NP) with a parallel structure, performing step 6, and if not, performing step 7;

Step 6: extracting the attribute words and entities of the rear sentence, matching the n attribute words of the front sentence with the n attribute words of the rear sentence (n:n matching), generating entity relationship triples and storing them;

Step 7: judging whether the grammar of the front sentence is subject-predicate-object and the grammar of the rear sentence is an object complement; if so, performing step 8, and if not, performing step 9;

Step 8: directly matching the front-sentence entity with the object, and directly matching the rear-sentence keyword with the object entity, to generate an entity relationship triple and store it;

Step 9: calculating the vocabulary relevancy, outputting the entities and entity relationships whose relevancy exceeds a threshold, generating entity relationship triples and storing them;

Step 10: outputting the entity relationship triples generated in steps 4, 8 and 9 to obtain a specification with a complete semantic structure, thereby completing the automatic completion of the rail transit specification entity relationships.
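The branching among steps 3 to 9 can be sketched as a small dispatch function. This is only an illustrative sketch: the names `Route` and `choose_route` and the four boolean inputs are assumptions introduced here, standing in for the results of the symbol detection and dependency analysis described in the steps.

```python
from enum import Enum


class Route(Enum):
    """Which triple-generating step handles a specification entry."""
    SIMPLE = 4           # step 4: simple sentence, search attributes directly
    PARALLEL_NP = 6      # step 6: parallel noun phrase, n:n matching
    SVO_COMPLEMENT = 8   # step 8: direct matching with the object
    RELEVANCY = 9        # step 9: vocabulary relevancy above a threshold


def choose_route(is_simple: bool,
                 front_is_parallel_np: bool,
                 front_is_svo: bool,
                 back_is_object_complement: bool) -> Route:
    """Mirror the decision order of steps 3, 5 and 7."""
    if is_simple:                                   # step 3
        return Route.SIMPLE
    if front_is_parallel_np:                        # step 5
        return Route.PARALLEL_NP
    if front_is_svo and back_is_object_complement:  # step 7
        return Route.SVO_COMPLEMENT
    return Route.RELEVANCY                          # fallback: step 9


# A complex sentence whose front clause is neither a parallel NP nor
# subject-predicate-object falls through to the relevancy step.
fallback = choose_route(False, False, False, False)
```

The order of the checks matters: the later tests are only reached when the earlier ones fail, which matches the "if so ... if not ..." wording of the steps.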

The present invention is also technically characterized in that,

an entity-relationship triple is "entity-verb-entity" or "entity-degree-attribute".
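One possible in-memory representation of the two triple forms is sketched below. The field names (`head`, `link`, `tail`, `kind`) and the helper `make_triple` are hypothetical and chosen only for illustration; the patent does not prescribe a storage format.

```python
from typing import NamedTuple

# The two triple forms named in the text.
KINDS = ("entity-verb-entity", "entity-degree-attribute")


class Triple(NamedTuple):
    head: str   # the entity the triple starts from
    link: str   # the verb, or the degree word
    tail: str   # the second entity, or the attribute
    kind: str   # one of KINDS


def make_triple(head: str, link: str, tail: str, kind: str) -> Triple:
    """Build a triple, rejecting any form other than the two listed."""
    if kind not in KINDS:
        raise ValueError(f"unknown triple form: {kind!r}")
    return Triple(head, link, tail, kind)


# Illustrative values only, not taken from a real specification entry.
example = make_triple("train", "runs on", "line", "entity-verb-entity")
```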

The specific operation steps of step 1 are as follows:

Step 1.1: acquiring the original text data of the rail transit specification, preprocessing the acquired data and training on it to generate a dictionary;

Step 1.2: processing the dictionary, mining the features of the missing components, and extracting entity completion rules and methods;

Step 1.3: constructing the entity relationship completion model using the extracted entity completion rules and methods.

In step 2, a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specification; adjective-tagged word segments are classified as attribute words, and verb-tagged word segments are used to judge the relationships between entities.

In step 3, a simple sentence is a sentence containing only pause marks (、) and a period.

The specific operation steps of step 4 are as follows:

Step 4.1: searching the attributes and action relations related to the entities in the rail transit specification;

Step 4.2: extracting verbs, judging the relationships between entities, and analyzing parts of speech to extract attributes;

Step 4.4: generating entity relationship triples according to the relationships between entities and the extracted attributes, and storing the triples.

The specific operation steps of step 9 are as follows:

Step 9.1: judging, for the rail transit specification, whether the front sentence has a non-parallel structure and the rear sentence contains a pronoun (substitute word); if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front sentence; otherwise, performing named entity recognition on the phrases and calculating the vocabulary relevancy between all word segments;

Step 9.2: outputting the entities and entity relationships whose vocabulary relevancy exceeds the threshold, generating entity relationship triples and storing them.

In step 9, the vocabulary relevancy is calculated using a vocabulary relevancy calculation algorithm based on the HowNet lexicon.

The HowNet-based vocabulary relevancy calculation algorithm is as follows:

$$\mathrm{rel}(w_1, w_2) = \max\{\alpha_1 \cdot \mathrm{sim}(s_1, s_2) + (1 - \alpha_1) \cdot \mathrm{asso}(s_1, s_2)\}$$

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \cdot \mathrm{asso}(p_1, p_2)$$

In the above formulas, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic relevance of $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly blends the similarity and the semantic relevance, with value range $[0, 1]$; $s_{1i}$ denotes a sense item of word $w_1$, with $i = 1, \ldots, n$ indicating that $w_1$ has $n$ sense items; $s_{2j}$ denotes a sense item of word $w_2$, with $j = 1, \ldots, m$ indicating that $w_2$ has $m$ sense items; $\gamma_i$ is the semantic relevance coefficient of the different parts of the entity concept, a fitted weight for each part of the two concepts, which must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.

The beneficial effects of the invention are as follows. The relationships in the rail transit specification are obtained through deep learning, and the relationships among entity-class nouns are then supplemented according to the HowNet semantic method, which completes the relationship completion step in the construction of the knowledge graph. This greatly reduces the workload of manually constructing the graph and the relational database, improves the accuracy of rail transit specification entity relationship completion, improves the structural accuracy of the rail transit specification knowledge graph, and lays a foundation for knowledge graph-based intelligent query, reasoning and question answering for rail transit.

Deep learning improves the recognition of entities and their relationships, and the semantic similarity of entity words in HowNet is used for automatic judgment, so that entity relationships are supplemented and a solid foundation is provided for constructing the knowledge graph.

Drawings

FIG. 1 is a flow chart of the artificial intelligence-based method for automatically completing rail transit specification entity relationships according to the invention;

FIG. 2 is a schematic diagram of the process of part-of-speech tagging of the rail transit specification with the Bi-LSTM + CRF part-of-speech tagging model in the method of the invention;

FIG. 3 is a schematic diagram of the process of calculating the vocabulary relevancy with the HowNet lexicon-based vocabulary relevancy calculation algorithm in the method of the invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

The invention discloses an artificial intelligence-based method for automatically completing rail transit specification entity relationships, which comprises the following steps:

Step 1: constructing an entity relationship completion model according to the rail transit specification.

Step 1.1: acquiring the original text data of the rail transit specification from the urban rail transit technical specifications, checking the format of the acquired original text data, deleting unnecessary information such as blank spaces to obtain preprocessed data, and then training on the preprocessed data to generate a dictionary;

Step 1.2: processing the data in the dictionary, mining the features of the missing components, and extracting entity completion rules and methods;

Step 1.3: constructing the entity relationship completion model using the extracted entity completion rules and methods.
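The blank-removal part of the preprocessing in step 1.1 could look like the sketch below. The function name `preprocess` and the decision to drop empty lines are assumptions made for illustration; the format check and dictionary training are not shown because the text does not detail them.

```python
import re
from typing import Iterable, List

# For str patterns, \s covers Unicode whitespace, including the
# ideographic space U+3000 that is common in Chinese documents.
_WHITESPACE = re.compile(r"\s+")


def preprocess(lines: Iterable[str]) -> List[str]:
    """Delete blank characters from each entry and drop empty entries."""
    cleaned = []
    for line in lines:
        text = _WHITESPACE.sub("", line)
        if text:  # skip entries that contained only whitespace
            cleaned.append(text)
    return cleaned


sample = preprocess(["  列车 运行\u3000速度 ", "", "\t"])
```

Removing every whitespace character is reasonable for Chinese text, which does not use spaces to separate words; a corpus mixing in Latin-script terms would need a gentler rule.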

Step 2: referring to FIG. 2, a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specification. The data in the dictionary are converted into hdf5 format and input into the Bi-LSTM + CRF part-of-speech tagging model, which outputs the noun-tagged word segments extracted from the rail transit specification; adjective-tagged word segments are classified as attribute words, and verb-tagged word segments are used to judge the relationships between entities.
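How the tagger output is divided into entities, attribute words and relation verbs can be sketched as follows. The sketch assumes the tagger returns `(word, tag)` pairs and that noun, adjective and verb tags begin with `n`, `a` and `v` respectively; that tag scheme and the name `split_by_pos` are assumptions, and the Bi-LSTM + CRF model itself is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from the first letter of a tag to its role.
ROLE_BY_TAG_PREFIX = {"n": "entities", "a": "attributes", "v": "relations"}


def split_by_pos(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group tagged word segments by the role they play in step 2."""
    roles: Dict[str, List[str]] = {"entities": [], "attributes": [], "relations": []}
    for word, tag in tagged:
        role = ROLE_BY_TAG_PREFIX.get(tag[:1])
        if role is not None:  # other parts of speech are ignored
            roles[role].append(word)
    return roles


tagged_example = [("train", "n"), ("run", "v"), ("maximum", "a"),
                  ("speed", "n"), ("at", "p")]
roles_example = split_by_pos(tagged_example)
```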

The specific process of performing part-of-speech tagging on the rail transit specification by adopting the Bi-LSTM + CRF part-of-speech tagging model is shown as the following table:

Step 3: inputting all rail transit specification texts and the extracted noun-tagged word segments into the entity relationship completion model, where the extracted noun-tagged word segments serve as the entities to be completed; then judging whether each input rail transit specification is a simple sentence by using a symbol detection method, where a simple sentence is a sentence that contains only pause marks and a period and consists of a subject, a predicate or an object;

if it is a simple sentence, performing step 4; if it is not a simple sentence, i.e. it is a complex sentence, which usually contains multiple subjects, multiple predicates and multiple objects, performing step 5;
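The symbol detection of step 3 can be sketched as a punctuation check. The sketch assumes that the pause mark is the Chinese enumeration comma `、` and the period is `。`, and it treats every character in a Unicode punctuation category as a symbol; the function name `is_simple_sentence` is hypothetical.

```python
import unicodedata

# Punctuation permitted in a simple sentence (assumed marks).
ALLOWED_MARKS = {"、", "。"}


def is_simple_sentence(text: str) -> bool:
    """Return True when the only punctuation is pause marks and periods."""
    marks = {ch for ch in text
             if unicodedata.category(ch).startswith("P")}
    # Caveat: ASCII symbols inside numbers or units, such as the dot in
    # "0.4", also count as punctuation here and would need special handling.
    return marks <= ALLOWED_MARKS


simple = is_simple_sentence("转向架性能、主要尺寸应与车体匹配。")
complex_ = is_simple_sentence("列车运行速度应按曲线半径计算，且加速度不宜超过限值。")
```

The first sample contains only `、` and `。`, so it is routed to step 4; the second contains a comma `，` and is routed to step 5.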

Step 4: searching the entity-related attributes in the rail transit specification, judging the relationships between entities, generating entity relationship triples and storing them; the specific operation steps are as follows:

Step 4.2: extracting verbs, judging the relationships between entities, and analyzing parts of speech to extract attributes;

Step 4.4: generating entity relationship triples according to the relationships between entities and the extracted attributes, and storing the triples.

Step 5: performing deep learning-based dependency syntax analysis on the rail transit specification; if the front sentence is an NP phrase with a parallel structure, performing step 6, and if not, performing step 7;

Step 6: extracting the attribute words and entities of the rear sentence, matching the n attribute words of the front sentence with the n attribute words of the rear sentence (n:n matching), generating entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them;
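One reading of the n:n matching in step 6 is a many-to-many pairing, in which every item of the front-sentence list is paired with every item of the rear-sentence list. The sketch below follows that reading, which is an assumption; the function name `match_n_to_n` and the sample words are illustrative only.

```python
from itertools import product
from typing import List, Sequence, Tuple


def match_n_to_n(front_items: Sequence[str],
                 back_items: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each front-sentence item with each rear-sentence item."""
    return list(product(front_items, back_items))


pairs = match_n_to_n(["performance", "size"], ["train body", "line"])
```

Each resulting pair is a candidate from which an "entity-verb-entity" or "entity-degree-attribute" triple can be formed once the linking verb or degree word is known.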

Step 7: judging whether the grammar of the front sentence is subject-predicate-object and the grammar of the rear sentence is an object complement; if so, performing step 8, and if not, performing step 9;

Step 8: directly matching the front-sentence entity with the object, and directly matching the rear-sentence keyword with the object entity, to generate an entity relationship triple, namely "entity-verb-entity" or "entity-degree-attribute", and storing it;

Step 9: referring to FIG. 3, calculating the vocabulary relevancy using the HowNet lexicon-based vocabulary relevancy calculation algorithm, outputting the entities and entity relationships whose relevancy exceeds a threshold, and generating and storing entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute";

the specific operation steps of step 9 are as follows:

Step 9.1: judging whether the front sentence has a non-parallel structure and the rear sentence contains a pronoun (substitute word); if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front sentence; if not, performing named entity recognition on the phrases and then calculating the vocabulary relevancy between all word segments;

HowNet uses sememes, the most basic units of meaning that cannot readily be divided further. It uses 1618 sememes in total to describe 62174 concept entities. In HowNet, given two entity-class words $w_1$ and $w_2$, if $w_1$ has the different concepts (sense items) $s_{11}, s_{12}, \ldots, s_{1n}$ and $w_2$ has the different concepts (sense items) $s_{21}, s_{22}, \ldots, s_{2m}$, the HowNet lexicon-based vocabulary relevancy calculation algorithm is as follows:

$$\mathrm{rel}(w_1, w_2) = \max\{\alpha_1 \cdot \mathrm{sim}(s_1, s_2) + (1 - \alpha_1) \cdot \mathrm{asso}(s_1, s_2)\}$$

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \cdot \mathrm{asso}(p_1, p_2)$$

In the above formulas, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic relevance of $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly blends the similarity and the semantic relevance, with value range $[0, 1]$; $s_{1i}$ denotes a sense item of word $w_1$, with $i = 1, \ldots, n$ indicating that $w_1$ has $n$ sense items; $s_{2j}$ denotes a sense item of word $w_2$, with $j = 1, \ldots, m$ indicating that $w_2$ has $m$ sense items; $\gamma_i$ is the semantic relevance coefficient of the different parts of the entity concept, a fitted weight for each part of the two concepts, which must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.

According to the above formulas, the semantic similarity between the entity senses of the two words is calculated. The higher the similarity of the entities, the higher the relevancy between them; and the greater the association between the entity senses of two words, the greater their similarity. The similarity and the relevance are then linearly combined with the adjustable weight to obtain the final vocabulary relevancy.
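The relevancy calculation can be sketched in a few lines, assuming that the maximum is taken over every pair of sense items of the two words and that the similarity and association scores are supplied by caller-provided functions. The names `relevancy` and `weighted_asso` and the toy score tables are hypothetical; no HowNet data or API is used here.

```python
import math
from typing import Callable, Sequence


def weighted_asso(gammas: Sequence[float], part_assos: Sequence[float]) -> float:
    """asso(s1, s2) = sum_i gamma_i * asso(p1, p2), with sum_i gamma_i = 1."""
    if len(gammas) != len(part_assos):
        raise ValueError("one coefficient is needed per concept part")
    if not math.isclose(sum(gammas), 1.0):
        raise ValueError("the coefficients gamma_i must sum to 1")
    return sum(g * a for g, a in zip(gammas, part_assos))


def relevancy(senses1: Sequence[str], senses2: Sequence[str],
              sim: Callable[[str, str], float],
              asso: Callable[[str, str], float],
              alpha: float = 0.5) -> float:
    """rel(w1, w2): best blended score over all sense-item pairs."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not senses1 or not senses2:
        raise ValueError("each word needs at least one sense item")
    return max(alpha * sim(s1, s2) + (1.0 - alpha) * asso(s1, s2)
               for s1 in senses1 for s2 in senses2)


# Toy scores for illustration only; real values would come from the lexicon.
SIM = {("a1", "b1"): 0.8, ("a2", "b1"): 0.2}
ASSO = {("a1", "b1"): 0.4, ("a2", "b1"): 0.6}
score = relevancy(["a1", "a2"], ["b1"],
                  lambda x, y: SIM[(x, y)],
                  lambda x, y: ASSO[(x, y)],
                  alpha=0.5)
```

With `alpha = 0.5`, the pair `("a1", "b1")` scores about 0.6 and the pair `("a2", "b1")` scores about 0.4, so the first pair determines the relevancy of the two words.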

Step 9.2: outputting the entities and entity relationships whose vocabulary relevancy exceeds the threshold, generating entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them. The threshold is determined by preliminary experiments on rail transit specification entries.
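The thresholding in step 9.2 amounts to a filter over scored candidate pairs, as in the sketch below. The threshold value 0.5 and the scores are placeholders, since the text only says the threshold is found by preliminary experiments; the name `select_pairs` is hypothetical.

```python
from typing import Iterable, List, Tuple

ScoredPair = Tuple[str, str, float]  # (first word, second word, relevancy)


def select_pairs(scored: Iterable[ScoredPair],
                 threshold: float) -> List[Tuple[str, str]]:
    """Keep the pairs whose relevancy strictly exceeds the threshold."""
    return [(a, b) for a, b, score in scored if score > threshold]


# Placeholder scores; a pair exactly at the threshold is not kept.
candidates = [("bogie", "related components", 0.72),
              ("bogie", "speed", 0.31),
              ("train", "speed", 0.50)]
kept = select_pairs(candidates, threshold=0.5)
```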

For example: "The running speed of a train on a plane curve shall be calculated according to the curve radius, and the unbalanced lateral acceleration of the train should not exceed 0.4 m/s²." First, the sentence structure is analyzed; according to the rules, "radius" is an attribute of "plane curve", and both "running speed" and "unbalanced lateral acceleration" are attributes of "train". As another example: "The performance and main dimensions of the bogie shall match the train body and the line, and the related components shall be kept within the allowable wear limits, so that the train can run safely and smoothly at the maximum allowable speed." First, the complex sentence is split. In its first clause, "performance" and "dimensions" are parallel attributes and "bogie" is the entity; the performance and dimensions are matched with the train body and the line. "Related components" can be identified as an attribute word of "bogie" according to the vocabulary relevancy; "train" and "speed" are in an attribute relationship, and "train" and "running" are in an action relationship.

The method of the invention reasonably and effectively solves the problem of incorrect entity relationship completion that arises because entities, attributes and relationships have no definite semantics when entity relationships are missing. After processing with the model of the invention, the missing semantic relationships of entities are completed, and the relevance among entities and the accuracy of the entity relationships are improved.

Claims

1. An artificial intelligence-based method for automatically completing rail transit specification entity relationships, characterized by comprising the following steps:

Step 1: constructing an entity relationship completion model according to the rail transit specification;

Step 2: performing part-of-speech tagging on the rail transit specification, and extracting the noun-tagged word segments in the rail transit specification;

Step 3: inputting all rail transit specifications and the extracted noun-tagged word segments into the entity relationship completion model, where the extracted noun-tagged word segments serve as the entities to be completed; judging whether each input rail transit specification is a simple sentence by using a symbol detection method; if so, performing step 4, and if not, performing step 5;

Step 5: performing deep learning-based dependency syntax analysis on the rail transit specification; if the front sentence is an NP phrase with a parallel structure, performing step 6, and if not, performing step 7;

Step 6: extracting the attribute words and entities of the rear sentence, matching the n attribute words of the front sentence with the n attribute words of the rear sentence (n:n matching), generating entity relationship triples and storing them;

2. The artificial intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the entity relationship triple is "entity-verb-entity" or "entity-degree-attribute".

3. The artificial intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the specific operation steps of step 1 are as follows:

4. The method according to claim 1, wherein in step 2, a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specification, adjective-tagged word segments are classified as attribute words, and verb-tagged word segments are used to judge the relationships between entities.

5. The method according to claim 4, wherein in step 3, the simple sentence is a sentence containing only pause marks and a period.

6. The artificial intelligence-based method for automatically completing rail transit specification entity relationships according to claim 5, wherein the specific operation steps of step 4 are as follows:

7. The artificial intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the specific operation steps of step 9 are as follows:

Step 9.1: judging whether the front sentence has a non-parallel structure and the rear sentence contains a pronoun (substitute word); if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front sentence; otherwise, performing named entity recognition on the phrases and calculating the vocabulary relevancy between all word segments;

8. The method according to claim 7, wherein in step 9, the vocabulary relevancy is calculated using a vocabulary relevancy calculation algorithm based on the HowNet lexicon.

9. The artificial intelligence-based method for automatically completing rail transit specification entity relationships according to claim 8, wherein the HowNet lexicon-based vocabulary relevancy calculation algorithm is as follows:

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \cdot \mathrm{asso}(p_1, p_2)$$

In the above formula, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic relevance of $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly blends the similarity and the semantic relevance, with value range $[0, 1]$; $s_{1i}$ denotes a sense item of word $w_1$, with $i = 1, \ldots, n$ indicating that $w_1$ has $n$ sense items; $s_{2j}$ denotes a sense item of word $w_2$, with $j = 1, \ldots, m$ indicating that $w_2$ has $m$ sense items; $\gamma_i$ is the semantic relevance coefficient of the different parts of the entity concept, a fitted weight for each part of the two concepts, which must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.