CN111597349A - Rail transit standard entity relation automatic completion method based on artificial intelligence

Info

Publication number: CN111597349A
Application number: CN202010363261.5A
Authority: CN
Inventors: 朱磊; 冯林林; 黑新宏; 刘尧林; 吕泓瑾; 张晋源; 林泓; 刘瑞; 刘旭华
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2020-04-30
Filing date: 2020-04-30
Publication date: 2020-08-28
Anticipated expiration: 2040-04-30
Also published as: CN111597349B

Abstract

The invention discloses an artificial-intelligence-based method for automatically completing entity relationships in rail transit specifications. The method comprises: constructing an entity relationship completion model; inputting the rail transit specifications and the noun-tagged words into the entity relationship completion model; and judging whether each input specification is a simple sentence. If it is a simple sentence, the entity-related attributes in the specification are searched and entity relationship triples are generated. If it is not a simple sentence, either the attribute words and entities of the back clause are extracted and the front-clause entities are matched n:n with the back-clause attribute words, or it is judged whether the front clause has a subject-predicate-object structure and the back clause is an object complement. If so, the front-clause entity is matched directly with the object and the back-clause keyword is matched directly with the object entity to generate entity relationship triples; if not, the entities and entity relationships whose vocabulary relevancy exceeds a threshold are output and entity relationship triples are generated. A specification with a complete semantic entity structure is thereby obtained, which completes the automatic completion of the rail transit specification entity relationships.

Description

Rail transit standard entity relation automatic completion method based on artificial intelligence

Technical Field

The invention belongs to the technical field of artificial intelligence natural language processing, and relates to an automatic track traffic standard entity relationship completion method based on artificial intelligence.

Background

A knowledge graph is a semantic knowledge base that stores knowledge as triples. It helps computers understand natural language better, provide better services to people, and achieve natural human-machine interaction. Entity recognition and relation extraction are key steps in constructing a knowledge graph.

Chinese grammar is complex and its sentence structure is flexible, with no strict constraints; entity components are omitted in parts of specification documents, and a descriptive text can be written as long as it conveys the basic semantics. For these reasons, specifications in different professional fields are written very differently. Consequently, when a knowledge graph for a professional field is constructed, the entities can be extracted from the specification entries, but the relationships between them cannot be determined because entity components are missing.

In recent years, rail transit in China has developed rapidly, which has deeply influenced how people work and live and has greatly promoted social productivity. The rail transit design specification is the main basis for rail transit design, construction, inspection and maintenance, and it sets out clear requirements and explanations for each part of rail transit design. The entities and relationships in the specification are therefore extracted to construct a knowledge graph, which can then be applied to subway design, inspection and construction by combining its retrieval, reasoning and query capabilities. On a website or application built on the knowledge graph, the constraints on each individual object can be queried by keyword search. Construction personnel can design and build each individual object in the rail transit system according to the query results, and inspectors can check whether a project meets the specification by comparing it directly against the displayed constraints. This makes it more convenient to design a subway and to verify whether the design meets the specification.

Completing the missing entity relationship components is a very important foundation for knowledge graph construction. Missing entity relationships directly affect information extraction and, in turn, the structure of the nodes and edges in the knowledge graph, which greatly reduces its reasoning performance. Once the entity relationships are completed, the knowledge graph of the rail transit specifications can be constructed automatically, and subsequent query, reasoning and intelligent question answering become possible.

Because natural language processing for graph construction over domain specifications is still in its infancy, the existing probabilistic-model and dictionary-based methods lack large amounts of accurately labeled documents. As a result, professional practitioners must perform a large amount of analysis and complete the entity relationships manually, which is time-consuming and laborious. Moreover, the design specifications involve more than 30 industries, so practitioners can only complete the relationships from their own prior experience, which leads to low accuracy of the completed entity relationships.

Disclosure of Invention

The invention aims to provide an artificial-intelligence-based method for automatically completing rail transit specification entity relationships, which solves the problems that the existing completion of such entity relationships can only be performed manually, is time-consuming and laborious, and has low accuracy.

The technical solution adopted by the invention is an artificial-intelligence-based method for automatically completing rail transit specification entity relationships, comprising the following steps:

Step 1: constructing an entity relationship completion model according to the rail transit specifications;

Step 2: performing part-of-speech tagging on the rail transit specifications, and extracting the noun-tagged words in the rail transit specifications;

Step 3: inputting all rail transit specifications and the extracted noun-tagged words into the entity relationship completion model, wherein the extracted noun-tagged words serve as the entities to be completed; judging whether each input rail transit specification is a simple sentence by using a symbol detection method; if so, performing step 4, and if not, performing step 5;

Step 4: searching for the entity-related attributes in the rail transit specification, judging the relationships between entities, generating entity relationship triples and storing them;

Step 5: performing deep-learning-based dependency syntax analysis on the rail transit specification; if the front clause is a noun phrase (NP) with a parallel structure, performing step 6, and if not, performing step 7;

Step 6: extracting the attribute words and entities of the back clause, matching the n attribute words of the front clause with the n attribute words of the back clause n:n, generating entity relationship triples and storing them;

Step 7: judging whether the front clause has a subject-predicate-object structure and the back clause is an object complement; if so, performing step 8, and if not, performing step 9;

Step 8: directly matching the front-clause entity with the object, and directly matching the back-clause keyword with the object entity, to generate entity relationship triples and store them;

Step 9: calculating the vocabulary relevancy, outputting the entities and entity relationships whose relevancy exceeds a threshold, generating entity relationship triples and storing them;

Step 10: outputting the entity relationship triples generated in steps 4, 8 and 9 to obtain a specification with a complete semantic entity structure, which completes the automatic completion of the rail transit specification entity relationships.
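The branching among steps 3 to 9 can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the patented implementation: the `ClauseAnalysis` record and its three boolean fields are hypothetical names standing in for the results of symbol detection (step 3) and dependency syntax analysis (steps 5 and 7), which are not computed here.

```python
from dataclasses import dataclass


@dataclass
class ClauseAnalysis:
    """Hypothetical summary of what steps 3, 5 and 7 detect for one entry."""
    is_simple_sentence: bool          # step 3: outcome of symbol detection
    front_is_parallel_np: bool        # step 5: front clause is a parallel NP
    front_svo_back_complement: bool   # step 7: SVO front, complement back


def select_step(analysis: ClauseAnalysis) -> int:
    """Return the triple-generating step (4, 6, 8 or 9) for one entry."""
    if analysis.is_simple_sentence:
        return 4
    if analysis.front_is_parallel_np:
        return 6
    if analysis.front_svo_back_complement:
        return 8
    # Fallback: relevancy-based completion.
    return 9
```

The order of the checks mirrors the order of the steps, so an entry only reaches the relevancy calculation of step 9 when none of the structural rules applies.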

The present invention is also technically characterized in that,

the entity-relationship triplets are "entity-verb-entity" or "entity-degree-attribute".

The specific operation steps of step 1 are as follows:

Step 1.1: acquiring the original text data of the rail transit specifications, preprocessing the acquired data and training on it to generate a dictionary;

Step 1.2: processing the dictionary, mining the features of missing components, and extracting entity completion rules and methods;

Step 1.3: constructing the entity relationship completion model by using the extracted entity completion rules and methods.
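The preprocessing part of step 1.1 might look like the following sketch. It assumes that preprocessing means removing blank spaces and empty entries, and it uses a simple character-frequency count as a stand-in for the dictionary; the text does not specify how the dictionary is actually trained, so `build_dictionary` is purely illustrative.

```python
from collections import Counter


def preprocess(lines):
    """Remove all whitespace inside each entry and drop empty entries."""
    cleaned = []
    for line in lines:
        entry = "".join(line.split())  # split() also drops full-width spaces
        if entry:
            cleaned.append(entry)
    return cleaned


def build_dictionary(entries):
    """Count character frequencies (an assumed stand-in for the dictionary)."""
    counts = Counter()
    for entry in entries:
        counts.update(entry)
    return counts
```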

In step 2, a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specifications; adjective-tagged words are classified as attribute words, and verb-tagged words are used to judge the relationships between entities.
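Once the tagger has produced its output, the role assignment described for step 2 reduces to grouping words by tag. The sketch below assumes the tagger yields `(word, tag)` pairs whose tags begin with `n`, `a` or `v` for nouns, adjectives and verbs; that tag convention and the function name `split_roles` are assumptions, and the Bi-LSTM + CRF model itself is not reproduced.

```python
def split_roles(tagged_tokens):
    """Group (word, tag) pairs into entities, attribute words and relations."""
    roles = {"entities": [], "attributes": [], "relations": []}
    for word, tag in tagged_tokens:
        if tag.startswith("n"):      # nouns become candidate entities
            roles["entities"].append(word)
        elif tag.startswith("a"):    # adjectives become attribute words
            roles["attributes"].append(word)
        elif tag.startswith("v"):    # verbs indicate entity relationships
            roles["relations"].append(word)
    return roles
```

Words with any other tag, such as particles, are ignored in this sketch.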

In step 3, a simple sentence is a sentence whose only punctuation marks are pause marks (the enumeration comma, 、) and a period.
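The symbol detection of step 3 can be sketched as an allowlist check on punctuation. This is one possible reading, assuming that "pause mark" means the Chinese enumeration comma `、` and "period" means `。`; the function name and the sample sentences are hypothetical.

```python
import unicodedata

# Assumed allowlist: the enumeration comma and the Chinese full stop.
ALLOWED_MARKS = {"、", "。"}


def is_simple_sentence(sentence: str) -> bool:
    """True if every punctuation character is a pause mark or a period."""
    for ch in sentence:
        # Unicode general categories starting with "P" are punctuation.
        if unicodedata.category(ch).startswith("P") and ch not in ALLOWED_MARKS:
            return False
    return True
```

A strict allowlist also rejects sentences containing characters such as a decimal point or a slash (for example in "0.4m/s²"), so a practical version would probably need to exempt numeric expressions.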

The specific operation steps of step 4 are as follows:

Step 4.1: searching for the attributes and action relationships related to the entities in the rail transit specification;

Step 4.2: extracting the verbs, judging the relationships between entities, and analyzing the parts of speech to extract attributes;

Step 4.4: generating entity relationship triples according to the entity-entity relationships and the extracted attributes, and storing them.
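For a simple sentence, the generation of "entity-verb-entity" triples could be approximated as follows. The nearest-noun heuristic used here is an assumption made for illustration (the text does not state how the verb is linked to its entities), as are the `n`/`v` tag prefixes and the function name `verb_triples`.

```python
def verb_triples(tagged_tokens):
    """Link each verb to the nearest noun on its left and on its right."""
    triples = []
    for i, (word, tag) in enumerate(tagged_tokens):
        if not tag.startswith("v"):
            continue
        # Nearest preceding noun, scanning backwards from the verb.
        left = next(
            (w for w, t in reversed(tagged_tokens[:i]) if t.startswith("n")),
            None,
        )
        # Nearest following noun, scanning forwards from the verb.
        right = next(
            (w for w, t in tagged_tokens[i + 1:] if t.startswith("n")),
            None,
        )
        if left is not None and right is not None:
            triples.append((left, word, right))
    return triples
```

A verb without a noun on both sides yields no triple, which is exactly the situation the later completion steps are meant to handle.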

The specific operation steps of step 9 are as follows:

Step 9.1: judging, for the rail transit specification, whether the front clause has a non-parallel structure and the back clause contains a pronoun; if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front clause; otherwise, performing named entity recognition on the phrases and calculating the vocabulary relevancy between all the words;

Step 9.2: outputting the entities and entity relationships whose vocabulary relevancy exceeds the threshold, generating entity relationship triples and storing them.
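The thresholding in step 9.2 can be sketched as a filter over candidate pairs. The relevancy function is passed in as a parameter because its computation is defined separately; the function name `related_pairs` and the example scores and threshold in the usage below are hypothetical.

```python
def related_pairs(candidates, entities, relevancy, threshold):
    """Return (entity, candidate, score) for pairs scoring above threshold."""
    results = []
    for cand in candidates:
        for ent in entities:
            score = relevancy(cand, ent)
            if score > threshold:  # "exceeds" is read as strictly greater
                results.append((ent, cand, score))
    return results


# Usage with made-up scores and a made-up threshold of 0.5:
scores = {("performance", "bogie"): 0.8, ("performance", "line"): 0.2}
pairs = related_pairs(
    ["performance"],
    ["bogie", "line"],
    lambda a, b: scores.get((a, b), 0.0),
    0.5,
)
```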

In step 9, the vocabulary relevancy is calculated by using a vocabulary relevancy algorithm based on the HowNet lexicon.

The HowNet-based vocabulary relevancy algorithm is as follows:

$$\mathrm{rel}(w_1, w_2) = \max_{s_1 \in \{s_{1i}\},\; s_2 \in \{s_{2j}\}} \left\{ \alpha_1 \, \mathrm{sim}(s_1, s_2) + (1 - \alpha_1) \, \mathrm{asso}(s_1, s_2) \right\}$$

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \, \mathrm{asso}(p_1, p_2)$$

In the above formulas, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$, evaluated on their sense items $s_1$ and $s_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic association degree of the entity sense items $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly balances the similarity and the semantic association degree, with value range $[0, 1]$; $s_{1i}$ ($i = 1, \dots, n$) denotes a sense item of word $w_1$, which has $n$ sense items; $s_{2j}$ ($j = 1, \dots, m$) denotes a sense item of word $w_2$, which has $m$ sense items; $\gamma_i$ is the semantic association coefficient of the different parts of the entity concept, fitted to each part of the two concepts, and must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.
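The two formulas translate into code fairly directly. The sketch below assumes that the maximum ranges over all pairs of sense items of the two words and that the sum runs over aligned sememe pairs, one per concept part; the sense-level `sim` and `asso` functions and the sememe-level `asso_sememes` function are supplied by the caller, since their HowNet-based definitions are not given in the text. All function and parameter names are hypothetical.

```python
def asso_senses(sememe_pairs, weights, asso_sememes):
    """asso(s1, s2): weighted sum of sememe associations, weights sum to 1."""
    if len(sememe_pairs) != len(weights):
        raise ValueError("one weight is needed per sememe pair")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the coefficients gamma_i must sum to 1")
    return sum(
        g * asso_sememes(p1, p2) for g, (p1, p2) in zip(weights, sememe_pairs)
    )


def rel(senses1, senses2, sim, asso, alpha=0.5):
    """rel(w1, w2): best linear blend of sim and asso over all sense pairs."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return max(
        alpha * sim(s1, s2) + (1 - alpha) * asso(s1, s2)
        for s1 in senses1
        for s2 in senses2
    )
```

Setting `alpha` to 1 reduces the score to pure similarity and setting it to 0 reduces it to pure association, which is the linear balancing role the text assigns to the adjustable parameter.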

The method has the following advantages: the relationships in the rail transit specifications are obtained through deep learning, and the relationships among entity-class nouns are then supplemented according to the semantic method of HowNet, which accomplishes relationship completion during knowledge graph construction. This greatly reduces the workload of manually constructing graphs and relational databases, improves the accuracy of rail transit specification entity relationship completion and the structural accuracy of the rail transit specification knowledge graph, and lays a foundation for intelligent rail transit query, reasoning and question answering based on the knowledge graph;

deep learning improves the recognition of entities and their relationships, and the semantic similarity of entity words in HowNet is used for automatic judgment, which supplements the entity relationships and provides a solid foundation for constructing the knowledge graph.

Drawings

FIG. 1 is a flow chart of the artificial-intelligence-based method of the invention for automatically completing rail transit specification entity relationships;

FIG. 2 is a schematic diagram of the process of performing part-of-speech tagging on the rail transit specifications with the Bi-LSTM + CRF part-of-speech tagging model in the method of the invention;

FIG. 3 is a schematic diagram of the process of calculating the vocabulary relevancy with the HowNet-based vocabulary relevancy algorithm in the method of the invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

The artificial-intelligence-based method of the invention for automatically completing rail transit specification entity relationships comprises the following steps:

Step 1: constructing an entity relationship completion model according to the rail transit specifications;

Step 1.1: acquiring the original text data of the rail transit specifications from the urban rail transit technical specifications, checking the format of the acquired data and deleting unnecessary information such as blank spaces to obtain preprocessed data, and then training on the preprocessed data to generate a dictionary;

Step 1.2: processing the data in the dictionary, mining the features of missing components, and extracting entity completion rules and methods;

Step 2: referring to FIG. 2, a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specifications. The data in the dictionary are converted into HDF5 format and input into the Bi-LSTM + CRF part-of-speech tagging model, which outputs the noun-tagged words extracted from the rail transit specifications; adjective-tagged words are classified as attribute words, and verb-tagged words are used to judge the relationships between entities.

The specific process of performing part-of-speech tagging on the rail transit specification by adopting the Bi-LSTM + CRF part-of-speech tagging model is shown as the following table:

Step 3: inputting all rail transit specification texts and the extracted noun-tagged words into the entity relationship completion model, wherein the extracted noun-tagged words serve as the entities to be completed; then judging whether each input rail transit specification is a simple sentence by using a symbol detection method, wherein a simple sentence is a sentence whose only punctuation marks are pause marks (the enumeration comma, 、) and a period, and which comprises a subject, a predicate or an object;

if it is a simple sentence, performing step 4, and if it is not, performing step 5, wherein a complex sentence usually comprises a plurality of subjects, predicates and objects;

Step 4: searching for the entity-related attributes in the rail transit specification, judging the relationships between entities, generating entity relationship triples and storing them; the specific operation steps are as follows:

Step 5: performing deep-learning-based dependency syntax analysis on the rail transit specification; if the front clause is an NP phrase with a parallel structure, performing step 6, and if not, performing step 7;

Step 6: extracting the attribute words and entities of the back clause, matching the n attribute words of the front clause with the n attribute words of the back clause n:n, generating entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them;

Step 8: directly matching the front-clause entity with the object, and directly matching the back-clause keyword with the object entity, to generate entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them;

Step 9: referring to FIG. 3, calculating the vocabulary relevancy by using the HowNet-based vocabulary relevancy algorithm, outputting the entities and entity relationships whose relevancy exceeds a threshold, generating entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them;

The specific operation steps of step 9 are as follows:

Step 9.1: judging whether the front clause has a non-parallel structure and the back clause contains a pronoun; if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front clause; if not, performing named entity recognition on the phrases and then calculating the vocabulary relevancy between all the words;

HowNet takes the sememe as the most basic unit of meaning, one that is not suitable for further division; it uses 1618 sememes in total and describes 62174 conceptual entities. In HowNet, let $w_1$ and $w_2$ be two entity-class words. If $w_1$ has the different concepts (sense items) $s_{11}, s_{12}, \dots, s_{1n}$ and $w_2$ has the different concepts (sense items) $s_{21}, s_{22}, \dots, s_{2m}$, the HowNet-based vocabulary relevancy algorithm is as follows:

$$\mathrm{rel}(w_1, w_2) = \max_{s_1 \in \{s_{1i}\},\; s_2 \in \{s_{2j}\}} \left\{ \alpha_1 \, \mathrm{sim}(s_1, s_2) + (1 - \alpha_1) \, \mathrm{asso}(s_1, s_2) \right\}$$

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \, \mathrm{asso}(p_1, p_2)$$

In the above formulas, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$, evaluated on their sense items $s_1$ and $s_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic association degree of the entity sense items $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly balances the similarity and the semantic association degree, with value range $[0, 1]$; $s_{1i}$ ($i = 1, \dots, n$) denotes a sense item of word $w_1$, which has $n$ sense items; $s_{2j}$ ($j = 1, \dots, m$) denotes a sense item of word $w_2$, which has $m$ sense items; $\gamma_i$ is the semantic association coefficient of the different parts of the entity concept, fitted to each part of the two concepts, and must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.

According to the above formulas, the semantic relevancy between the entity senses of the two words is calculated. The higher the similarity of the entities, the higher the relevancy between them; likewise, the greater the association degree between the entity senses of the two words, the greater their relevancy. The similarity and the association degree are then linearly combined, with the adjustable parameter balancing the two, to obtain the final vocabulary relevancy.

Step 9.2: outputting the entities and entity relationships whose vocabulary relevancy exceeds the threshold, generating entity relationship triples, namely "entity-verb-entity" or "entity-degree-attribute", and storing them. The threshold is determined by preliminary experiments on the rail transit specification entries.
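As a purely illustrative calculation of the relevancy and the threshold test, all numbers below are assumed rather than taken from the specification data. Suppose the adjustable parameter is 0.6, the threshold is 0.6, and two sense-item pairs give a similarity of 0.8 with an association degree of 0.5, and a similarity of 0.3 with an association degree of 0.9:

```latex
\begin{aligned}
0.6 \times 0.8 + (1 - 0.6) \times 0.5 &= 0.48 + 0.20 = 0.68 \\
0.6 \times 0.3 + (1 - 0.6) \times 0.9 &= 0.18 + 0.36 = 0.54 \\
\mathrm{rel}(w_1, w_2) &= \max\{0.68,\ 0.54\} = 0.68
\end{aligned}
```

Because 0.68 is greater than the assumed threshold of 0.6, this word pair would be output as an entity relationship triple under these assumptions.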

For example: "The running speed of a train on a plane curve shall be calculated according to the curve radius, and the unbalanced transverse acceleration of the train should not exceed 0.4 m/s²." The sentence structure is analyzed first; according to the rules, "radius" is obtained as an attribute of the plane curve, and "running speed" and "unbalanced transverse acceleration" are both attributes of the train.

As another example: "The performance and main dimensions of the bogie shall match those of the vehicle body and the line, and the related parts shall be kept within the allowable wear limits, so that the train can run safely and smoothly at the highest allowable speed." The complex sentence is divided first. In its first clause, "performance" and "size" are parallel attributes and "bogie" is the entity; through matching, "performance" and "size" are identified as attribute words of "bogie" according to the vocabulary relevancy, "train" and "speed" are in an attribute relationship, and "running" is an action relationship.
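The pairing in the bogie example can be sketched as an n:n match between entities and attribute words. The sketch assumes that n:n matching means forming every entity-attribute combination, which a relevancy threshold could then prune; the function name `match_n_to_n` and the English word lists are hypothetical.

```python
from itertools import product


def match_n_to_n(entities, attribute_words):
    """Pair every entity with every attribute word (n:n matching)."""
    return list(product(entities, attribute_words))


# Usage on the bogie sentence: one entity, two parallel attribute words.
pairs = match_n_to_n(["bogie"], ["performance", "size"])
```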

The method of the invention reasonably and effectively solves the problem of erroneous entity relationship completion that arises when entity relationships are missing and there are no clear semantics among the entities, attributes and relationships. After processing with the model of the invention, the missing semantic relationships of the entities are completed, and the relevance among the entities and the accuracy of the entity relationships are improved.

Claims

1. An artificial-intelligence-based method for automatically completing rail transit specification entity relationships, characterized by comprising the following steps:

Step 6: extracting the attribute words and entities of the back clause, matching the n attribute words of the front clause with the n attribute words of the back clause n:n, generating entity relationship triples and storing them;

2. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the entity relationship triple is "entity-verb-entity" or "entity-degree-attribute".

3. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the specific operation steps of step 1 are as follows:

4. The method according to claim 1, wherein in step 2 a Bi-LSTM + CRF part-of-speech tagging model is used to tag the parts of speech of the rail transit specifications, adjective-tagged words are classified as attribute words, and verb-tagged words are used to judge the relationships between entities.

5. The method according to claim 4, wherein in step 3 a simple sentence is a sentence with pause marks and a period.

6. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 5, wherein the specific operation steps of step 4 are as follows:

7. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 1, wherein the specific operation steps of step 9 are as follows:

Step 9.1: judging whether the front clause has a non-parallel structure and the back clause contains a pronoun; if so, calculating the vocabulary relevancy between the attribute word following the pronoun and all entities of the front clause; otherwise, performing named entity recognition on the phrases and calculating the vocabulary relevancy between all the words;

8. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 7, wherein in step 9 the vocabulary relevancy is calculated by using a vocabulary relevancy algorithm based on the HowNet lexicon.

9. The artificial-intelligence-based method for automatically completing rail transit specification entity relationships according to claim 8, wherein the HowNet-based vocabulary relevancy algorithm is as follows:

$$\mathrm{rel}(w_1, w_2) = \max_{s_1 \in \{s_{1i}\},\; s_2 \in \{s_{2j}\}} \left\{ \alpha_1 \, \mathrm{sim}(s_1, s_2) + (1 - \alpha_1) \, \mathrm{asso}(s_1, s_2) \right\}$$

$$\mathrm{asso}(s_1, s_2) = \sum_i \gamma_i \, \mathrm{asso}(p_1, p_2)$$

In the above formulas, $\mathrm{rel}(w_1, w_2)$ denotes the relevancy of word $w_1$ and word $w_2$; $\mathrm{sim}(s_1, s_2)$ denotes the similarity of word $w_1$ and word $w_2$, evaluated on their sense items $s_1$ and $s_2$; $\mathrm{asso}(s_1, s_2)$ denotes the semantic association degree of the entity sense items $s_1$ and $s_2$; $\alpha_1$ is an adjustable parameter that linearly balances the similarity and the semantic association degree, with value range $[0, 1]$; $s_{1i}$ ($i = 1, \dots, n$) denotes a sense item of word $w_1$, which has $n$ sense items; $s_{2j}$ ($j = 1, \dots, m$) denotes a sense item of word $w_2$, which has $m$ sense items; $\gamma_i$ is the semantic association coefficient of the different parts of the entity concept, fitted to each part of the two concepts, and must satisfy $\sum_i \gamma_i = 1$; $p_1$ is a sememe of sense item $s_1$, and $p_2$ is a sememe of sense item $s_2$.