CN114996476A - Knowledge fusion method, device and program product for high-speed train product structure tree - Google Patents
- Publication number: CN114996476A
- Application number: CN202210654049.3A
- Authority: CN (China)
- Prior art keywords: structure tree, knowledge, ontology, data, stage
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/367: Ontology
- G06F16/322: Trees
- G06F16/35: Clustering; Classification
- G06F40/14: Tree-structured documents
- G06F40/194: Calculation of difference between files
- G06F40/295: Named entity recognition
- G06F40/30: Semantic analysis
- Y02P90/30: Computing systems specially adapted for manufacturing
Abstract
The invention provides a method, a device and a program product for knowledge fusion of a high-speed train product structure tree. The method comprises: acquiring organization analysis data of high-speed train multi-source data, constructing an ontology schema, and constructing domain knowledge graphs according to the ontology schema; fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using a multi-information fusion similarity; and, based on the stage domain structure trees in the organization analysis data, performing mapping fusion of the stage domain structure tree ontologies so as to fuse the knowledge graphs of all stages. The invention fuses the knowledge graphs of each stage, makes the knowledge interconnections within each stage domain clearer, and addresses the redundancy and heterogeneity of high-speed train data.
Description
Technical Field
The invention relates to the technical field of heterogeneous train knowledge fusion, and in particular to a method, a device and a program product for knowledge fusion of a high-speed train product structure tree.
Background
The life cycle of a high-speed train comprises several stages: design, manufacturing, and operation and maintenance. As multi-stage historical data accumulates, a large body of empirical knowledge is formed that can support every stage of product design, manufacturing, and operation and maintenance. However, this empirical knowledge is stored on different platforms, its sources differ at each life-cycle stage, and its forms and structures are complex. Knowledge graphs are generally used to organize and represent such knowledge so that it can be acquired more effectively.
Knowledge graphs, which have been described as the foundation of "knowledgeable" AI, are often used to fuse multi-source data into large-scale knowledge bases. A knowledge graph is essentially a semantic knowledge base, that is, a semantic network representing the semantic relations between entities, which makes it well suited to describing the complex structural knowledge of a high-speed train.
In the prior art, because high-speed train knowledge sources are complex and exist at different levels in different stages, the knowledge graphs constructed for different stages are heterogeneous. As a result, the associations between the knowledge of different stages are unclear, knowledge is not interoperable, and knowledge is redundant and heterogeneous.
Disclosure of Invention
The invention provides a knowledge fusion method, a knowledge fusion device and a program product for a high-speed train product structure tree, which overcome the prior-art defects of multi-stage knowledge redundancy and of heterogeneity among the knowledge graphs constructed in different stages, fuse the knowledge graphs of all stages, and make the interoperation and interconnection of knowledge across stages clearer.
The invention provides a knowledge fusion method for a high-speed train product structure tree, which comprises the following steps:
acquiring organization analysis data of high-speed train multi-source data, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema;
fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using multi-information fusion similarity;
and, based on the stage domain structure trees of the organization analysis data, mapping and fusing the stage domain structure tree ontologies so as to fuse the knowledge graphs of all stages.
According to the knowledge fusion method for a high-speed train product structure tree provided by the invention, acquiring the organization analysis data of the high-speed train multi-source data, constructing the ontology schema, and constructing the domain knowledge graph according to the ontology schema comprises:
acquiring organization analysis data based on the high-speed train multi-source data, wherein the organization analysis data comprises a data source, a domain stage, a domain structure tree and a stage domain structure tree, and the stage domain structure tree comprises a product family main structure tree, a design instance structure tree and an assembly instance structure tree;
based on the organization analysis data, acquiring the context information and data types of the data service and constructing an ontology schema;
and acquiring a global ontology from the stage domain structure tree, deriving a local ontology from the global ontology, and acquiring data in combination with the ontology schema to construct the domain knowledge graph.
According to the knowledge fusion method for a high-speed train product structure tree provided by the invention, the data sources comprise structured data, semi-structured data and unstructured data.
According to the knowledge fusion method for a high-speed train product structure tree provided by the invention, achieving ontology fusion of the stage knowledge graph by mapping and aligning the ontology concepts output by named entity recognition on instance data comprises:
training a named entity recognition model on pre-acquired corpus data, wherein the corpus data is entity-labeled and sequence-labeled before being input into the named entity recognition model;
obtaining, with the named entity recognition model, the ontology concept corresponding to each entity in the domain knowledge graph;
and obtaining the ontology concept mapping relation of the stage knowledge graph by comparing the ontology concepts corresponding to the entities in the domain knowledge graph with the ontology concepts in the domain structure tree.
According to the knowledge fusion method for a high-speed train product structure tree provided by the invention, achieving entity fusion of the stage knowledge graph by aligning clustered entities using multi-information fusion similarity comprises:
normalizing the attributes of the entities in the domain knowledge graph, wherein the attribute types comprise structured attributes and unstructured attributes;
based on the ontology concept mapping relation of the stage knowledge graph, acquiring the clustered entities under the ontology concepts that have a mapping relation;
determining the structured attribute similarity of the clustered entities by minimum edit distance, based on unit and constraint matching of the structured attributes;
determining the unstructured attribute similarity of the clustered entities from the cosine similarity of the semantic feature vectors of the unstructured attributes;
and determining the comprehensive entity similarity from the structured and unstructured attribute similarities, so as to achieve entity alignment of the stage knowledge graph.
According to the knowledge fusion method for a high-speed train product structure tree provided by the invention, mapping and fusing the stage domain structure tree ontologies based on the stage domain structure trees of the organization analysis data, so as to fuse the knowledge graphs of all stages, comprises:
constructing a product coding structure tree from the coding attributes of the stage domain structure trees to achieve ontology alignment;
normalizing the entity attributes of the stage domain structure trees;
acquiring the clustered entities under the ontology concepts of the stage domain structure trees;
and determining the comprehensive entity similarity of the clustered entities to obtain the mapping relation between the entities of different stages under the ontology concepts of the stage domain structure trees, wherein the comprehensive similarity comprises structured attribute similarity and unstructured attribute similarity.
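The coding-based ontology alignment above can be sketched in plain Python. The text does not specify the product code format, so this sketch assumes hypothetical dot-separated hierarchical codes, where "1.2.1" is a child of "1.2". Nodes from different stages that carry the same code are attached to the same node of the coding tree, and that shared attachment is what aligns them. The function name `build_code_tree` and the sample nodes are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

StageNodes = Dict[str, List[Tuple[str, str]]]  # stage -> [(node name, product code)]


def build_code_tree(stage_nodes: StageNodes):
    """Build a product coding tree and attach each stage's nodes to their code.

    Assumes hypothetical dot-separated hierarchical codes ("1.2.1" is a child of "1.2").
    Returns (children, members):
      children[code] -> sorted list of child codes
      members[code]  -> {stage: [names of nodes carrying this code]}
    """
    members: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for stage, nodes in stage_nodes.items():
        for node_name, code in nodes:
            members[code][stage].append(node_name)

    children: Dict[str, Set[str]] = defaultdict(set)
    for code in members:
        parts = code.split(".")
        # Add every ancestor edge, so intermediate codes appear even if no node carries them.
        for depth in range(1, len(parts)):
            children[".".join(parts[:depth])].add(".".join(parts[: depth + 1]))

    return (
        {parent: sorted(kids) for parent, kids in children.items()},
        {code: dict(stages) for code, stages in members.items()},
    )


stage_nodes = {
    "design": [("Bogie (design)", "1.2"), ("Axle (design)", "1.2.1")],
    "assembly": [("Bogie #B1", "1.2"), ("Axle #A7", "1.2.1")],
}
children, members = build_code_tree(stage_nodes)
print(children)          # {'1': ['1.2'], '1.2': ['1.2.1']}
print(members["1.2.1"])  # {'design': ['Axle (design)'], 'assembly': ['Axle #A7']}
```

Grouping by code only yields candidate correspondences. The comprehensive similarity described above would still decide which entities of different stages map to each other.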
The invention also provides a knowledge fusion device for a high-speed train product structure tree, comprising:
a building module, configured to acquire organization analysis data of high-speed train multi-source data, construct an ontology schema, and construct a domain knowledge graph according to the ontology schema;
a first fusion module, configured to fuse the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using multi-information fusion similarity;
and a second fusion module, configured to map and fuse the stage domain structure tree ontologies based on the stage domain structure trees of the organization analysis data, so as to fuse the knowledge graphs of all stages.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the knowledge fusion method for a high-speed train product structure tree described above.
The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the knowledge fusion method for a high-speed train product structure tree as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the knowledge fusion method for a high-speed train product structure tree described above.
With the method, device and program product for knowledge fusion of a high-speed train product structure tree provided by the invention, the multi-source data of the high-speed train is organized and analyzed, and domain knowledge graphs are constructed through the ontology schema based on the stage domain structure trees and domain structure trees. On the basis of the domain knowledge graphs, the ontology concepts output by named entity recognition on instance data are mapped and aligned, and entities are aligned using the multi-information fusion similarity of clustered entities. The heterogeneous domain knowledge graphs are thereby fused into a stage knowledge graph, achieving data fusion within each stage. In addition, a product coding structure tree is constructed from the stage domain structure trees to align their ontologies, and mapping relations between entities of different stages are established to fuse the instance data layers of the stage domain structure trees. This fuses the knowledge graphs of all stages of the high-speed train, makes the interconnections among high-speed train domain data clearer, and further addresses the heterogeneity and redundancy of multi-stage data in the high-speed train domain.
Drawings
To describe the technical solutions of the present invention or the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the knowledge fusion method for a high-speed train product structure tree provided by the invention;
FIG. 2 is a schematic diagram of the construction process of the ontology schema provided by the invention;
FIG. 3 is a first schematic diagram of the ontology alignment process of the stage knowledge graph provided by the invention;
FIG. 4 is a second schematic diagram of the ontology alignment process of the stage knowledge graph provided by the invention;
FIG. 5 is a schematic flow chart of the fusion of the knowledge graphs of all stages provided by the invention;
FIG. 6 is a schematic structural diagram of the knowledge fusion device for a high-speed train product structure tree provided by the invention;
FIG. 7 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The knowledge fusion method for a high-speed train product structure tree according to the invention is described below with reference to FIGS. 1 to 5.
FIG. 1 is a schematic flow chart of the knowledge fusion method for a high-speed train product structure tree provided by the invention. As shown in FIG. 1, the method includes:
Step 110: acquiring organization analysis data of the high-speed train multi-source data, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema.
Optionally, FIG. 2 is a schematic diagram of the construction process of the ontology schema provided by the invention. As shown in FIG. 2, this step includes:
acquiring organization analysis data based on the high-speed train multi-source data, wherein the organization analysis data comprises a data source, a domain stage, a domain structure tree and a stage domain structure tree, and the stage domain structure tree comprises a product family main structure tree, a design instance structure tree and an assembly instance structure tree;
based on the organization analysis data, acquiring the context information and data types of the data service and constructing an ontology schema;
and acquiring a global ontology from the stage domain structure tree, deriving a local ontology from the global ontology, and acquiring data in combination with the ontology schema to construct the domain knowledge graph.
Optionally, high-speed train data is staged. The full product life cycle can be divided into three stages, namely design, manufacturing, and operation and maintenance, and the data in each stage has different characteristics. In the design stage, the data includes but is not limited to requirement data, geometric data and attribute parameter data. These data correspond to a specific design instance, which is obtained by instantiating a meta-model, so there is a one-to-many relation between the meta-model and design instances. From a design instance, production can yield multiple corresponding manufacturing instances for specific assemblies. Operation and maintenance data is generated on a specific assembly instance, so there is a many-to-one relation between assembly instances and design instances. The stage domain structure trees and domain structure trees are therefore constructed by analyzing the characteristics of the data sources. For example, the fault domain structure tree and the maintenance domain structure tree of the operation and maintenance stage are both domain structure trees. The division of the stage domain structure trees is shown in Table 1:
TABLE 1 Stage domain structure tree division and corresponding data characteristics
Optionally, as shown in FIG. 2, the ontology of the domain knowledge graph is constructed top-down, and a seven-step method is used to construct the ontology schema in combination with the data source, the domain stage and the corresponding domain structure tree. The seven steps are: acquiring the context information of the data service; acquiring the data types; designing an ontology outline model; instantiation, verification and evaluation; formal representation of the ontology; defining classes and their hierarchy; and defining attributes and constraints. The three steps of designing the ontology outline model, instantiation, verification and evaluation, and formal representation of the ontology are iterated against instance data from the data layer to obtain the optimal ontology schema.
Optionally, the stage domain structure tree of each stage (design, manufacturing, and operation and maintenance) is taken as the global ontology. The global ontology is locally extended to obtain a local ontology, and knowledge is acquired in combination with the ontology schema to form the domain knowledge graph.
Optionally, the data sources include structured data, semi-structured data and unstructured data. Structured data is converted with the D2RQ tool: table names of the relational database are mapped directly to RDF classes, fields are mapped to class attributes, and relations between classes are obtained from the tables that represent relations. For semi-structured data, entity relations are acquired by template matching. For unstructured data, named entity recognition and relation extraction can be performed with natural language processing (NLP) techniques based on deep learning models.
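The table-to-class and field-to-attribute conversion described above can be illustrated with a minimal plain-Python sketch. D2RQ is driven by its own mapping files, and this sketch neither calls D2RQ nor reproduces its output format. It only shows the idea: each row becomes an instance of the class named after its table, each non-key column becomes a property, and each row of a relationship table becomes a triple linking two instances. The namespace `http://example.org/`, the table and column names, and the predicate `hasPart` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
BASE = "http://example.org/"  # hypothetical namespace


def table_to_triples(table: str, key: str, rows: List[Dict[str, Optional[object]]]) -> List[Triple]:
    """Table name -> class, row -> instance of that class, non-key column -> property."""
    triples: List[Triple] = []
    cls = f"{BASE}{table}"
    for row in rows:
        subject = f"{BASE}{table}/{row[key]}"
        triples.append((subject, "rdf:type", cls))
        for column, value in row.items():
            if column != key and value is not None:  # skip the key and NULL values
                triples.append((subject, f"{BASE}{table}#{column}", str(value)))
    return triples


def link_table_to_triples(rows: List[Dict[str, object]], src_table: str, src_col: str,
                          dst_table: str, dst_col: str, predicate: str) -> List[Triple]:
    """Relationship table (two foreign keys per row) -> one triple linking two instances."""
    return [
        (f"{BASE}{src_table}/{r[src_col]}", f"{BASE}{predicate}", f"{BASE}{dst_table}/{r[dst_col]}")
        for r in rows
    ]


bogies = table_to_triples("Bogie", "id", [{"id": "B1", "model": "M-01", "mass_kg": 6500}])
links = link_table_to_triples([{"bogie_id": "B1", "axle_id": "A7"}],
                              "Bogie", "bogie_id", "Axle", "axle_id", "hasPart")
print(links[0])
# ('http://example.org/Bogie/B1', 'http://example.org/hasPart', 'http://example.org/Axle/A7')
```

In a real pipeline, the triples would normally be produced by D2RQ or built with an RDF library instead of bare string tuples.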
Step 120: fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using multi-information fusion similarity.
FIG. 3 is a first schematic diagram of the ontology alignment process of the stage knowledge graph provided by the invention, and FIG. 4 is a second schematic diagram of that process. As shown in FIGS. 3 and 4, the ontology fusion method of the stage knowledge graph comprises:
training a named entity recognition model on pre-acquired corpus data, wherein the corpus data is entity-labeled and sequence-labeled before being input into the named entity recognition model;
obtaining, with the named entity recognition model, the ontology concept corresponding to each entity in the domain knowledge graph;
and obtaining the ontology concept mapping relation of the stage knowledge graph by comparing the ontology concept corresponding to each entity in the domain knowledge graph with the ontology concepts in the domain structure tree.
Optionally, as shown in FIGS. 3 and 4, the stage domain structure tree corresponding to the domain knowledge graph serves as the alignment ontology of the stage knowledge graph, and the entities of that stage domain structure tree serve as the alignment entities of the stage knowledge graph, so as to fuse the stage knowledge graph.
Optionally, as shown in FIG. 3, the ontology $O_1 = (A_1, A_2, \ldots, A_m)$ and the entities $G_1 = (E_1, E_2, \ldots, E_n)$ of the domain knowledge graph are obtained, together with the ontology $O_2 = (B_1, B_2, \ldots, B_q)$ of the stage domain structure tree of the corresponding stage. Here $A_i$ denotes a concept in ontology $O_1$, $E_i$ denotes the entity set under concept $A_i$ in $O_1$, and $B_j$ denotes a concept in the ontology $O_2$ of the stage domain structure tree. The result of ontology alignment is a set of mappings $f: A_i \rightarrow B_j$, where $A_i \rightarrow B_j$ means that concept $A_i$ in $O_1$ has an equivalence or containment relation with concept $B_j$ in $O_2$. This relation is determined as follows, for the entity set $E_i$ under concept $A_i$ and a candidate mapping $f: A_i \rightarrow B_j$:
Optionally, as shown in FIG. 4, the entity set $E_i$ of concept $A_i$ in ontology $O_1$ is input into the named entity recognition model, and the concept output by the model is compared with concept $B_j$ in ontology $O_2$. If the two concepts are the same, the mapping $f: A_i \rightarrow B_j$ is output and the search for further identical mappings continues. Once no further mapping is found, the method checks whether a mapping set already exists for concept $A_i$ in $O_1$. If one exists, the new mapping is added to it; otherwise, a new mapping set is created.
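The alignment loop above can be sketched as follows, assuming the trained named entity recognition model is available as a function that returns the concept predicted for one entity (or `None`). In this toy example, a dictionary lookup stands in for the BERT-BiLSTM-CRF model. The function name `align_ontologies` and all concept and entity names are hypothetical. Because the text does not specify how disagreeing predictions within one entity set are resolved, any entity whose predicted concept matches $B_j$ adds the mapping.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def align_ontologies(
    o1_entities: Dict[str, Iterable[str]],
    o2_concepts: Set[str],
    ner_concept: Callable[[str], Optional[str]],
) -> Dict[str, Set[str]]:
    """Build mapping sets {A_i: {B_j, ...}} from NER predictions on instance data.

    o1_entities: concept A_i of ontology O1 -> its entity set E_i
    o2_concepts: concepts B_j of the stage domain structure tree ontology O2
    ner_concept: concept predicted for one entity, or None if nothing is recognized
    """
    mapping_sets: Dict[str, Set[str]] = {}
    for concept_a, entities in o1_entities.items():
        for entity in entities:
            predicted = ner_concept(entity)
            if predicted in o2_concepts:
                # Add f: A_i -> B_j to the existing mapping set of A_i, or create the set.
                mapping_sets.setdefault(concept_a, set()).add(predicted)
    return mapping_sets


# Toy stand-in for the trained NER model: a dictionary lookup (returns None if unknown).
toy_ner = {"bogie B1": "System", "axle A7": "Part", "brake disc D3": "Part"}.get
o1 = {"RunningGear": ["bogie B1"], "Component": ["axle A7", "brake disc D3"], "Misc": ["note X"]}
result = align_ontologies(o1, {"System", "Part"}, toy_ner)
print(result)  # {'RunningGear': {'System'}, 'Component': {'Part'}}
```

Concepts whose entities are never recognized as any $B_j$ (here "Misc") get no mapping set.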
Optionally, the named entity recognition model adopts a BERT-BiLSTM-CRF architecture, and it is constructed and trained as follows:
1) Corpus data acquisition
The corpus data includes but is not limited to web page text and a domain dictionary. Web page text is crawled from websites such as Baidu Baike and CNKI (China National Knowledge Infrastructure), with domain-related terms set as the keywords. The crawled text is manually preprocessed and cleaned of dirty data to obtain the corpus finally used for training.
2) Entity tagging
Domain entities in the obtained corpus are annotated with the brat tool. For example, "bogie" is labeled as a system and "axle" as a part. The annotated corpus is then converted programmatically into BIO sequence labels; for example, the three characters of the Chinese term for "bogie" (转向架) are labeled B, I, I. The labeled data is split into a training set, a test set and a validation set in the ratio 8:1:1.
3) Named entity recognition model construction and training
The named entity recognition model is built on BERT-BiLSTM-CRF. The training set is first segmented and stop words are removed, yielding initial word vectors. These are input into the BERT model and fine-tuned to produce output word vectors, which are then fed into the BiLSTM-CRF layers for sequence labeling. The CRF layer improves the accuracy of the labeling result and outputs the label sequence with the maximum probability. The training set is fed into the constructed model for training, the results are checked against the validation set to select the best training result, and the model finally outputs the maximum-probability sequence labeling.
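Three plain-Python pieces of the pipeline in steps 2) and 3) can be sketched as follows. Nothing here uses BERT or trains a model; a real BERT-BiLSTM-CRF is normally built with a deep-learning framework. `to_bio` converts character-level entity spans, such as those annotated in brat, into BIO tags. `split_8_1_1` performs the 8:1:1 split. `viterbi` implements the standard linear-chain CRF decoding that returns the maximum-score label sequence from per-position emission scores (the BiLSTM output) and tag-to-tag transition scores (learned by the CRF layer). The entity type names `SYS`/`PART`, the random seed and the toy scores are hypothetical, and the decoder omits start and end transition scores for brevity.

```python
import random
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type), character offsets


def to_bio(text: str, spans: Sequence[Span]) -> List[str]:
    """Convert character-level entity spans into BIO tags."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"
    return tags


def split_8_1_1(samples, seed: int = 0):
    """Shuffle, then split into training, test and validation sets in the ratio 8:1:1."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train, n_test = len(items) * 8 // 10, len(items) // 10
    return items[:n_train], items[n_train:n_train + n_test], items[n_train + n_test:]


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag path of a linear-chain CRF.

    emissions[t][k]: score of tag k at position t
    transitions[j][k]: score of moving from tag j to tag k
    """
    n = len(tags)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n):
            best_j = max(range(n), key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_j] + transitions[best_j][k] + emissions[t][k])
            pointers.append(best_j)
        score, backpointers = new_score, backpointers + [pointers]
    path = [max(range(n), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])  # follow back-pointers to the first position
    return [tags[k] for k in reversed(path)]


# "转向架" (bogie) labeled as a system and "车轴" (axle) as a part.
print(to_bio("转向架和车轴", [(0, 3, "SYS"), (4, 6, "PART")]))
# ['B-SYS', 'I-SYS', 'I-SYS', 'O', 'B-PART', 'I-PART']

# Emissions alone favour O -> I, but a large negative O -> I transition rules it out.
TAGS = ["O", "B", "I"]
trans = [[0.0, 0.0, -1e9], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi([[2.0, 0.0, 0.0], [0.0, 0.5, 1.0]], trans, TAGS))  # ['O', 'B']
```

The transition matrix is how the CRF layer "improves the accuracy" of the labels: it can penalize invalid tag sequences, such as I directly after O, that per-position emission scores alone would allow.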
Optionally, the entity fusion method of the stage knowledge graph comprises:
normalizing the attributes of the entities in the domain knowledge graph, wherein the attribute types comprise structured attributes and unstructured attributes;
based on the ontology concept mapping relation of the stage knowledge graph, acquiring the clustered entities under ontology concepts that have a mapping relation;
determining the structured attribute similarity of the clustered entities by minimum edit distance, based on unit and constraint matching of the structured attributes;
determining the unstructured attribute similarity of the clustered entities from the cosine similarity of the semantic feature vectors of the unstructured attributes;
and determining the comprehensive entity similarity from the structured and unstructured attribute similarities, so as to achieve entity alignment of the stage knowledge graph.
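The last two steps can be sketched as follows. The text does not say how the two similarities are combined into the comprehensive similarity or what threshold counts as a match. This sketch therefore assumes a weighted sum with a hypothetical weight `alpha` and a hypothetical decision threshold. The semantic feature vectors would come from a text encoder (for example, the BERT model used for recognition), which is not shown here; the short vectors below are made up.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two semantic feature vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def comprehensive_similarity(structured: float, unstructured: float, alpha: float = 0.5) -> float:
    """Weighted combination of the two similarities; alpha is a hypothetical weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * structured + (1.0 - alpha) * unstructured


def is_aligned(structured: float, unstructured: float,
               threshold: float = 0.8, alpha: float = 0.5) -> bool:
    """Treat two clustered entities as the same if the combined score reaches the threshold."""
    return comprehensive_similarity(structured, unstructured, alpha) >= threshold


text_sim = cosine_similarity([0.2, 0.7, 0.1], [0.25, 0.65, 0.05])
print(is_aligned(0.9, text_sim))  # True
```

In practice, `alpha` and the threshold would be tuned on a set of entity pairs whose alignment is already known.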
Illustratively, entity attributes may be normalized manually. Normalization covers, but is not limited to, attribute names, attribute values and the units of attribute values. For example, different notations for the gram unit in numeric structured attributes (such as the Chinese character for gram and "g") are uniformly normalized to "g". Likewise, synonymous names for the maximum ambient temperature in textual unstructured attributes are unified under the single name "highest environmental temperature".
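Manual normalization can be captured as alias tables that map each variant unit or attribute name to one canonical form. The following is a minimal sketch that assumes such hand-curated dictionaries exist. The entries in `UNIT_ALIASES` and `NAME_ALIASES` are hypothetical examples, not taken from the source.

```python
from typing import Dict, Optional, Tuple

# Hypothetical hand-curated alias tables: variant form -> canonical form.
UNIT_ALIASES: Dict[str, str] = {"克": "g", "gram": "g", "毫米": "mm"}
NAME_ALIASES: Dict[str, str] = {
    "max ambient temperature": "highest ambient temperature",
    "maximum environmental temperature": "highest ambient temperature",
}

Attr = Tuple[str, Optional[str]]  # (value, unit); unit is None for text attributes


def normalize_attributes(attrs: Dict[str, Attr]) -> Dict[str, Attr]:
    """Rename attributes and units to their canonical forms; unknown ones pass through."""
    normalized: Dict[str, Attr] = {}
    for name, (value, unit) in attrs.items():
        key = name.strip()
        canonical_name = NAME_ALIASES.get(key.lower(), key)
        canonical_unit = UNIT_ALIASES.get(unit, unit) if unit is not None else None
        normalized[canonical_name] = (value, canonical_unit)
    return normalized
```

Keeping the tables as data rather than code lets domain experts extend them without touching the normalization logic.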
Optionally, after ontology mapping of the stage knowledge graph, entities are classified by the concept labels in the ontology, and entities under the same concept label are clustered. The entities under two ontology concepts that have a mapping relation are taken as the entities to be aligned, which yields the clustered entities under each concept label. Specifically, given a mapping relation (Ai, Bj), the entity set Ei under Ai and the entity set Ej under Bj, entity clustering outputs the clustered entity sets Ei and Ej.
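The clustering step amounts to grouping entities by concept label and pairing the groups through the concept mappings (Ai, Bj). The following is a minimal sketch that assumes entities arrive as (entity, concept label) pairs. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_by_concept(entities: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (entity, concept_label) pairs by concept label."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for entity, concept in entities:
        clusters[concept].append(entity)
    return dict(clusters)


def clustered_pairs(mappings: Iterable[Tuple[str, str]],
                    source_entities: Iterable[Tuple[str, str]],
                    target_entities: Iterable[Tuple[str, str]]):
    """For every concept mapping (Ai, Bj), return the entity sets (Ei, Ej) to be aligned."""
    src = cluster_by_concept(source_entities)
    tgt = cluster_by_concept(target_entities)
    return {(a, b): (src.get(a, []), tgt.get(b, [])) for a, b in mappings}
```

Only entities inside a mapped pair of clusters are compared later. This keeps the number of similarity computations far below an all-pairs comparison.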
Optionally, the structured attribute similarity of entities is calculated as follows.
For numeric structured attributes, units and constraints are matched against the templates shown in Table 2.
Table 2 Unit and constraint matching templates
Constraint | Unit | Flag |
Must not be greater than | mm | 1 |
Not more than | % | 1 |
≤ | mm | 1 |
Must not exceed | Lifting of water | 1 |
Not more than | mm | 1 |
Not less than | year | 2 |
Not less than | km | 2 |
Must not be less than | mm | 2 |
± | g | 6 |
± | kN | 6 |
± | mm | 6 |
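Template matching against Table 2 can be sketched as follows: each value is parsed into a (flag, number, unit) triple, and two values count as comparable only when flag and unit agree. This is a minimal sketch under several assumptions. The English phrases stand in for the original Chinese constraint words. Grouping by flag (1 for upper bounds, 2 for lower bounds, 6 for tolerances, as the table suggests) is an interpretation. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Constraint phrases and flags transcribed from Table 2 (English renderings).
CONSTRAINT_FLAGS = {
    "must not be greater than": 1,
    "not more than": 1,
    "must not exceed": 1,
    "≤": 1,
    "must not be less than": 2,
    "not less than": 2,
    "±": 6,
}
NUMBER_AND_UNIT = re.compile(r"([-+]?\d+(?:\.\d+)?)\s*(\S+)")


def parse_constraint(text: str) -> Optional[Tuple[int, float, str]]:
    """Return (flag, number, unit) if the text matches a template, else None."""
    stripped = text.strip()
    lowered = stripped.lower()
    # Try longer phrases first so the most specific template wins.
    for phrase in sorted(CONSTRAINT_FLAGS, key=len, reverse=True):
        if lowered.startswith(phrase):
            match = NUMBER_AND_UNIT.fullmatch(stripped[len(phrase):].strip())
            if match:
                return CONSTRAINT_FLAGS[phrase], float(match.group(1)), match.group(2)
    return None


def comparable(a: str, b: str) -> bool:
    """Two numeric attribute values are compared only if flag and unit both match."""
    pa, pb = parse_constraint(a), parse_constraint(b)
    return pa is not None and pb is not None and pa[0] == pb[0] and pa[2] == pb[2]
```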
The edit distance similarity of each common attribute is calculated with the minimum edit distance, as shown in formula (1):
where $P_{ai}$ denotes the i-th attribute of entity a and $P_{bi}$ the i-th attribute of entity b. $\mathrm{edit}(V_{ai}, V_{bi})$ denotes the minimum edit distance between the values of the i-th common attribute of entities a and b. $\mathrm{len}(V_{ai})$ and $\mathrm{len}(V_{bi})$ denote the lengths of those values for entity a and entity b, respectively.
The calculation formula of the structured attribute similarity is shown as formula (2):
where $w_i$ denotes the weight of the i-th common attribute and $t$ the number of common attributes.
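Formulas (1) and (2) themselves are not reproduced in this text. The sketch below assumes a common normalization consistent with the variables defined above: Sim(P_ai, P_bi) = 1 − edit(V_ai, V_bi) / max(len(V_ai), len(V_bi)). It also assumes that the structured similarity is the weighted sum Σ w_i · Sim(P_ai, P_bi) over the t common attributes. Both forms are assumptions, and the patent's exact formulas may differ. The function names are hypothetical.

```python
from typing import Dict


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance (insert, delete and substitute each cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_similarity(va: str, vb: str) -> float:
    """Assumed form of formula (1): 1 - edit / max(len(va), len(vb))."""
    longest = max(len(va), len(vb))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(va, vb) / longest


def structured_similarity(attrs_a: Dict[str, str], attrs_b: Dict[str, str],
                          weights: Dict[str, float]) -> float:
    """Assumed form of formula (2): weighted sum over the t common attributes."""
    common = [name for name in attrs_a if name in attrs_b]
    return sum(weights.get(name, 0.0) * edit_similarity(attrs_a[name], attrs_b[name])
               for name in common)
```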
Optionally, the method for calculating the similarity of the unstructured attributes includes:
For the unstructured text attributes of an entity, word embedding is performed with the description text of each attribute and a domain dictionary to construct semantic feature vectors. The cosine similarity of these vectors then gives the unstructured text attribute similarity.
1) Obtaining word vectors
The description texts are taken as the training corpus and preprocessed with jieba word segmentation and the LTP stop-word list. Word vectors are randomly initialized and then trained with a CBOW model and a BiLSTM model to obtain composite word vectors.
2) Unstructured attribute similarity calculation
The feature vectors of the unstructured text attributes of each entity pair to be aligned are obtained. The cosine similarity between each unstructured text attribute feature vector $V_a$ of entity a and each unstructured text attribute feature vector $V_b$ of entity b is then calculated as shown in formula (3):

$$\mathrm{Sim}(F_{ai}, F_{bi}) = \frac{V_{ai} \cdot V_{bi}}{\lvert V_{ai} \rvert \, \lvert V_{bi} \rvert} \qquad (3)$$
where $F_{ai}$ and $F_{bi}$ denote the i-th unstructured text attributes of entities a and b, and $V_{ai}$ and $V_{bi}$ the corresponding feature vectors. $\lvert V_{ai} \rvert$ and $\lvert V_{bi} \rvert$ denote the moduli of these vectors, and $\mathrm{Sim}(F_{ai}, F_{bi})$ is the cosine similarity of each unstructured attribute between the entity pair. The cosine similarity threshold is set to 0.6. For each attribute, the attribute pair with the highest cosine similarity is selected; if its cosine similarity exceeds 0.6, the pair is classified as similar and the similarity is recorded. Repeating this comparison yields t similar attribute pairs, from which the unstructured text similarity of entities a and b is calculated as shown in formula (4).
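The cosine similarity of formula (3) and the best-match-above-0.6 selection can be sketched in plain Python as follows. This is a minimal sketch, not the patent's code. The feature vectors are assumed to be given (for example, from the word vectors described above), and the function names are hypothetical. Formula (4), which aggregates the t recorded similarities, is not recoverable from this text, so the sketch stops at returning the matched pairs.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Formula (3): dot product divided by the product of the vector moduli."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for a zero vector; treated as dissimilar
    return dot / (norm_u * norm_v)


def similar_attribute_pairs(vecs_a: Dict[str, Sequence[float]],
                            vecs_b: Dict[str, Sequence[float]],
                            threshold: float = 0.6) -> List[Tuple[str, str, float]]:
    """For each text attribute of a, keep its best match in b if the cosine exceeds threshold."""
    pairs = []
    for name_a, va in vecs_a.items():
        scored = [(cosine(va, vb), name_b) for name_b, vb in vecs_b.items()]
        if not scored:
            continue
        best_sim, best_name = max(scored, key=lambda p: p[0])
        if best_sim > threshold:
            pairs.append((name_a, best_name, best_sim))
    return pairs
```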
Optionally, the structured and unstructured attribute similarities are combined by weight normalization into the entity comprehensive similarity, as shown in formula (5):
$$\mathrm{SimE}(a,b) = w_1 \cdot \mathrm{SimZ}(a,b) + w_2 \cdot \mathrm{SimF}(a,b) \qquad (5)$$
where $w_1$ is the structured attribute similarity weight and $w_2$ is the unstructured attribute similarity weight.
For each entity a under a concept Ai of the domain knowledge graph, the entities b under the mapped concept Bi in the stage domain structure tree are sorted in descending order of the comprehensive similarity SimE(a, b) to generate the optimal candidate sequence. The entity b with the highest similarity, provided that it exceeds a set threshold, is selected as the alignable entity.
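Formula (5) and the ranking step can be sketched in Python, using the worked values later given in formulas (10) and (11) as a check. In those formulas the structured weight is 0.2 and the unstructured weight is 0.8. This is a minimal sketch, and `comprehensive_similarity` and `best_alignment` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def comprehensive_similarity(sim_z: float, sim_f: float, w1: float, w2: float) -> float:
    """Formula (5): SimE = w1 * SimZ (structured) + w2 * SimF (unstructured)."""
    return w1 * sim_z + w2 * sim_f


def best_alignment(candidates: Dict[str, float],
                   threshold: float) -> Tuple[List[Tuple[str, float]], Optional[str]]:
    """Rank candidate entities b by SimE(a, b), descending; pick the top one if above threshold."""
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    if ranked and ranked[0][1] > threshold:
        return ranked, ranked[0][0]
    return ranked, None


# Worked values from formulas (10) and (11): structured weight 0.2, unstructured weight 0.8.
scores = {
    "Xiamen 01 bogie": comprehensive_similarity(0.94, 2.3, w1=0.2, w2=0.8),
    "Fujian 01 bogie": comprehensive_similarity(0.86, 0.15, w1=0.2, w2=0.8),
}
ranked, best = best_alignment(scores, threshold=0.8)
print(best)  # Xiamen 01 bogie
```

Returning the full ranked list alongside the chosen entity keeps the optimal candidate sequence available for inspection.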
Step 130: based on the stage domain structure trees in the organization and analysis data, the ontologies of the stage domain structure trees are mapped and fused, so that the knowledge graphs of all stages are fused.
Optionally, Fig. 5 is a schematic diagram of the fusion process of the stage knowledge graphs provided by the present invention. As shown in Fig. 5, the process includes:
constructing a product coding structure tree based on the coding attributes of the stage domain structure trees to achieve ontology alignment;
normalizing the entity attributes of the stage domain structure trees;
acquiring the clustered entities under the ontology concepts of the stage domain structure trees;
and determining the entity comprehensive similarity based on the clustered entities to obtain the mapping relations between entities of different stages under the ontology concepts of the stage domain structure trees, wherein the comprehensive similarity comprises the structured attribute similarity and the unstructured attribute similarity.
Optionally, the stage domain structure trees use the following identifiers:
- the product main family structure tree uses the meta-node code as its unique identifier;
- the design example structure tree carries both a meta-node code and a module code, with the module code as its unique identifier;
- the assembly example structure tree carries a meta-node code, a module code and a manufacturing code, with the manufacturing code as its unique identifier.

The mapping ratio among the three codes is meta-node code : module code : manufacturing code = 1 : N : N². When the ontology of the product coding structure tree is constructed, concepts are described in a unified way so that the ontology concepts are aligned.
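One way to picture the product coding structure tree is as a chain of lookups from manufacturing code to module code to meta-node code. Any node in any of the three stage trees can then be resolved to the shared meta-node code. This is a minimal sketch of that idea, not the patent's data model. The class name, its methods and the code strings in the example are hypothetical.

```python
from typing import Dict, Optional, Set


class ProductCodeTree:
    """Links manufacturing code -> module code -> meta-node code."""

    def __init__(self) -> None:
        self.meta_codes: Set[str] = set()
        self.module_to_meta: Dict[str, str] = {}
        self.mfg_to_module: Dict[str, str] = {}

    def add_module(self, meta_code: str, module_code: str) -> None:
        self.meta_codes.add(meta_code)
        self.module_to_meta[module_code] = meta_code

    def add_manufacturing(self, module_code: str, mfg_code: str) -> None:
        if module_code not in self.module_to_meta:
            raise KeyError(f"unknown module code {module_code!r}")
        self.mfg_to_module[mfg_code] = module_code

    def meta_of(self, code: str) -> Optional[str]:
        """Resolve a code from any of the three trees to its meta-node code (None if unknown)."""
        if code in self.meta_codes:
            return code
        module = self.mfg_to_module.get(code, code)
        return self.module_to_meta.get(module)

    def same_concept(self, code_a: str, code_b: str) -> bool:
        """True if both codes resolve to the same meta-node code."""
        meta = self.meta_of(code_a)
        return meta is not None and meta == self.meta_of(code_b)


tree = ProductCodeTree()
tree.add_module("MN-100", "MN-100-D1")               # hypothetical codes
tree.add_manufacturing("MN-100-D1", "MN-100-D1-S001")
print(tree.meta_of("MN-100-D1-S001"))  # MN-100
```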
Optionally, mapping fusion at the instance data layer is expressed by establishing mapping relations among the entities of the stage domain structure trees: the product main family structure tree M1(A1, A2, …, Am), the design example structure tree M2(B1, B2, …, Bn) and the assembly example structure tree M3(C1, C2, …, Cq). Here Ai, Bi and Ci denote the entity sets under the corresponding concept labels in M1, M2 and M3, respectively. The entities of the three stage domain structure trees are aligned pairwise, and each alignment result is a mapping relation f: Ai → Bi, f: Ai → Ci, or f: Bi → Ci.
Optionally, units and constraints of the numerical structured attributes can be unified manually, and attribute names of the same attributes in the text unstructured attributes are unified, so that attribute normalization is realized.
Optionally, in different stage domain structure trees, entities under the same concept label are obtained for clustering, so as to obtain a clustered entity.
Optionally, the entity comprehensive similarity in step 130 is calculated in the same way as in step 120 and is not repeated here. Once it has been calculated, a mapping relation is established between the two aligned entities, giving a triple (a, mapping, b), which is then stored.
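The pairwise alignment of M1, M2 and M3 and the storage of (a, mapping, b) triples can be sketched as follows. This is a minimal sketch under several assumptions. Concept labels are taken to be already unified across the trees (the ontology alignment step), and the `similarity` callable stands in for SimE of formula (5). `align_stage_trees` is a hypothetical name.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, List[str]]  # concept label -> entities under that label


def align_stage_trees(trees: Dict[str, Tree],
                      similarity: Callable[[str, str], float],
                      threshold: float) -> List[Tuple[str, str, str]]:
    """Align every pair of stage trees and return (a, "mapping", b) triples."""
    triples = []
    for (_, src), (_, dst) in combinations(trees.items(), 2):  # M1-M2, M1-M3, M2-M3
        for concept, entities_a in src.items():
            entities_b = dst.get(concept, [])
            for a in entities_a:
                scored = [(similarity(a, b), b) for b in entities_b]
                if not scored:
                    continue
                best_score, best_b = max(scored, key=lambda p: p[0])
                if best_score > threshold:
                    triples.append((a, "mapping", best_b))
    return triples
```

The triples can be written directly to a graph store as edges labeled "mapping".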
Illustratively, two examples follow. The fusion of domain knowledge graphs into a stage knowledge graph is illustrated with the fault knowledge graph and the maintenance knowledge graph of the operation and maintenance stage. The fusion of knowledge graphs across stages is illustrated with the design example structure tree and the assembly example structure tree.
In step 110, the fault ontology and the maintenance ontology both belong to the operation and maintenance stage. When the fault ontology schema and the maintenance ontology schema are constructed, the assembly example structure tree is used as the global ontology, and each schema is built separately with the seven-step method. The fault data and the maintenance data come from fault record texts and maintenance manuals, respectively. Because these data are unstructured, they are extracted with the BERT-BiLSTM-CRF-based model to construct the fault knowledge graph and the maintenance knowledge graph.
Because the fault knowledge graph and the maintenance knowledge graph both belong to the operation and maintenance stage, the assembly example structure tree ontology is used as the alignment ontology to fuse the domain knowledge graphs of this stage.
1) Mapping alignment of ontology concept layer
The fault knowledge graph ontology O1(A1, A2, …, Am), the fault knowledge graph entities G1(E1, E2, …, En) and the assembly example structure tree ontology O2(B1, B2, …, Bq) are acquired. Entities under the concepts of the fault knowledge graph are input into the named entity recognition model; for example, entities such as "bogie" and "frame" under concepts such as system, subsystem and component. The output is compared with the concepts of ontology O2 to obtain entity recognition results such as (bogie, system) and (frame, subsystem). The mapping relations f: system → system and subsystem → subsystem are then established. Because the entities under the concept "system" in the fault knowledge graph all belong to the same concept in the assembly example structure tree ontology, the relation is one of equivalence.
Table 3 Partial mapping set
Similarly, the maintenance knowledge graph ontology, the maintenance knowledge graph entities and the assembly example structure tree ontology are acquired. Entities under concepts such as structure, maintenance method and maintenance strategy are input into the named entity recognition model. The recognition results include (bogie, system) and (axle, part), from which the mapping relations f: structure → system and structure → part are established. The entities under the concept "structure" in the maintenance knowledge graph include entities under the concepts system, subsystem, component and part in the assembly example structure tree ontology. The concept "structure" therefore stands in an inclusion relation to these concepts. The resulting partial mapping set is shown in Table 3.
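The two cases above (system → system as equivalence, structure → {system, part, …} as inclusion) suggest a simple rule for deriving concept mappings from NER results. A source concept whose entities are all recognized as one target concept is treated as equivalent to it. A source concept whose entities spread over several target concepts is treated as including them. This is a simplified sketch of that rule, not the patent's procedure, and `derive_concept_mapping` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def derive_concept_mapping(recognized: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, List[str]]]:
    """recognized: (source_concept, recognized_target_concept) pairs, one per entity.

    Returns {source_concept: (relation, sorted target concepts)}.
    """
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, target in recognized:
        targets[source].add(target)
    mapping = {}
    for source, found in targets.items():
        relation = "equivalent" if len(found) == 1 else "includes"
        mapping[source] = (relation, sorted(found))
    return mapping
```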
2) Instance layer entity alignment based on multi-information fusion similarity calculation
(1) Attribute normalization
In the failure knowledge graph, entity attributes such as manufacturing codes, names, manufacturers, manufacturing dates, and the like are normalized manually so that the same attributes have the same name.
(2) Entity clustering
After ontology mapping fusion, the mapped concept pairs (system, system), (subsystem, subsystem), (component, component) and (part, part) are obtained. The entities under each concept are retrieved by concept label to form the clustered entity sets. Taking the concept label "system" as an example, the similarity between the entity set under "system" in the fault knowledge graph and the entities under "system" in the assembly example structure tree is calculated to obtain aligned entity pairs. The entity set under "system" in the fault knowledge graph is (bogie (CA358505_ZXJ_0001), bogie (CA360305_ZXJ_0004), …). The entity set under "system" in the assembly example structure tree is (Xiamen 01 bogie, Fujian 01 bogie, …).
(3) Structured attribute similarity calculation
Table 4 shows part of the common attribute similarity results between the bogie (CA358505_ZXJ_0001) in the fault knowledge graph entities and the Fujian 01 bogie and Xiamen 01 bogie in the assembly example structure tree entities.
The total structured attribute similarities for the Xiamen 01 bogie and the Fujian 01 bogie are given by formulas (6) and (7).
Table 4 Partial common attribute similarity results for the fault entity
Common attribute name | Xiamen 01 bogie | Fujian 01 bogie |
Manufacturing code | 1 | 0.2 |
Manufacturing date | 1 | 0.2 |
Axle weight | 1 | 0.8 |
Design speed | 1 | 0.6 |
(4) Unstructured attribute similarity calculation
Table 5 shows part of the unstructured attribute similarity results between the bogie (CA358505_ZXJ_0001) in the fault knowledge graph entities and the Fujian 01 bogie and Xiamen 01 bogie in the assembly example structure tree entities.
Table 5 Partial unstructured attribute similarity results for the fault entity
Attribute name | Xiamen 01 bogie | Fujian 01 bogie |
Bogie type | 1 | 1 |
Wheel type | 1 | 1 |
Wheel tread pattern | 1 | 1 |
Axle material | 0.9 | 0.6 |
Brake disc material | 0.8 | 0.7 |
With the similarity threshold set to 0.6, the attribute pairs with similarity greater than 0.6 are extracted and the unstructured attribute similarity is calculated. The total unstructured attribute similarities for the Xiamen 01 bogie and the Fujian 01 bogie are given by formulas (8) and (9).
(5) Entity comprehensive similarity calculation
The structured and unstructured attribute similarities are combined by weight normalization into the entity comprehensive similarity. The unstructured attribute similarity weight is set to 0.8 and the structured attribute similarity weight to 0.2. The comprehensive similarities between the bogie (CA358505_ZXJ_0001) and the Xiamen 01 bogie and Fujian 01 bogie in the assembly example structure tree entities are given by formulas (10) and (11):
Xiamen 01 bogie: SimE(a, b) = 0.8 × 2.3 + 0.2 × 0.94 = 2.028   (10),
Fujian 01 bogie: SimE(a, b) = 0.8 × 0.15 + 0.2 × 0.86 = 0.292   (11).
With the threshold set to 0.8, the comprehensive similarities are sorted in descending order, giving the optimal candidate sequence (Xiamen 01 bogie). The final aligned entity pair is (bogie (CA358505_ZXJ_0001), Xiamen 01 bogie).
1) Entity clustering
Taking the concept label "system" in the design example structure tree and the assembly example structure tree as an example, the clustered entities under the corresponding concept labels are Ai (CRH380A, CRH3A, …) and Bi (Xiamen 01 bogie, Fujian 01 bogie, …).
2) Similarity calculation
(1) Table 6 shows part of the common attribute similarity results between the design example structure tree entity CRH380A and the Xiamen 01 bogie and Fujian 01 bogie in the assembly example structure tree entities.
Table 6 Partial common attribute similarity results
Common attribute name | Xiamen 01 bogie | Fujian 01 bogie |
Meta-node code | 1 | 1 |
Module code | 1 | 0.7 |
Axle load | 1 | 1 |
Applicable ambient temperature | 1 | 0.8 |
The total structured attribute similarities are given by formulas (12) and (13).
(2) Unstructured attribute similarity calculation
With the similarity threshold set to 0.6, the attribute pair with the highest similarity is selected and classified as similar if its similarity exceeds 0.6. Table 7 shows the unstructured attribute similarity results between the design example structure tree entity CRH380A and the Xiamen 01 bogie and Fujian 01 bogie in the assembly example structure tree entities.
Table 7 Unstructured attribute similarity results
Attribute name | Xiamen 01 bogie | Fujian 01 bogie |
Bogie type | 1 | 1 |
Wheel type | 1 | 1 |
Wheel tread pattern | 1 | 1 |
Axle material | 0.9 | 0.6 |
Brake disc material | 0.8 | 0.7 |
After the attribute pairs with similarity greater than 0.6 are extracted, the unstructured attribute similarities are given by formulas (14) and (15).
(3) Entity comprehensive similarity calculation
The unstructured attribute similarity weight is set to 0.8 and the structured attribute similarity weight to 0.2. The comprehensive similarities between the entity CRH380A in the design example structure tree and the Xiamen 01 bogie and Fujian 01 bogie in the assembly example structure tree are given by formulas (16) and (17):
Xiamen 01 bogie: SimE(a, b) = 0.8 × 2.3 + 0.2 × 0.94 = 2.028   (16),
Fujian 01 bogie: SimE(a, b) = 0.8 × 2.06 + 0.2 × 0.86 = 1.82   (17).
With the threshold set to 0.8, the entity comprehensive similarities are sorted in descending order, giving the optimal candidate sequence (Xiamen 01 bogie, Fujian 01 bogie). The final aligned entity pair is therefore (CRH380A, Xiamen 01 bogie), and establishing the mapping relation yields the triple (CRH380A, mapping, Xiamen 01 bogie).
The invention provides a knowledge fusion method for the high-speed train product structure tree, which works in two parts.

Within each stage, the multivariate data of the high-speed train are organized and analyzed, and domain knowledge graphs are constructed from ontology schemas based on the domain structure trees. Ontology concepts output by named entity recognition on instance data are mapped and aligned against the domain structure tree. Entities are then aligned with the multi-information fusion similarity over clustered entities. Heterogeneous domain knowledge graphs are thereby fused into a stage knowledge graph, achieving data fusion within each stage.

Across stages, a product coding structure tree is constructed from the stage domain structure trees to align their ontologies. Mapping fusion of the instance data layers is achieved by establishing mapping relations among entities of different stages.

Together, these steps fuse the knowledge graphs of all stages of the high-speed train and make the interconnections among high-speed train domain data clearer. They also further address the heterogeneity and redundancy of multi-stage data in the high-speed train domain.
The knowledge fusion device for the high-speed train product structure tree provided by the invention is described below. The device described below and the method described above may be referred to correspondingly.
The invention also provides a knowledge fusion device for the high-speed train product structure tree. Fig. 6 is a schematic structural diagram of this device. As shown in Fig. 6, the knowledge fusion device 200 comprises a building module 201, a first fusion module 202 and a second fusion module 203, wherein:
the building module 201 is configured to acquire organization analysis data of the multivariate data of the high-speed train, construct an ontology schema, and construct a domain knowledge graph according to the ontology schema;
the first fusion module 202 is configured to fuse the domain knowledge graph into a stage knowledge graph, where ontology fusion of the stage knowledge graph is implemented by mapping and aligning ontology concepts output by named entity identification based on instance data, and entity fusion of the stage knowledge graph is implemented by performing entity alignment based on a clustering entity by using multi-information fusion similarity;
and the second fusion module 203 is configured to map and fuse the ontologies of the stage domain structure trees based on the stage domain structure trees in the organization analysis data, so as to fuse the knowledge graphs of all stages.
The knowledge fusion device for the high-speed train product structure tree provided by the invention works in two parts.

Within each stage, the device organizes and analyzes the multivariate data of the high-speed train and constructs domain knowledge graphs from ontology schemas based on the domain structure trees. It maps and aligns the ontology concepts output by named entity recognition on instance data against the domain structure tree. It then aligns entities with the multi-information fusion similarity over clustered entities, fusing heterogeneous domain knowledge graphs into a stage knowledge graph and achieving data fusion within each stage.

Across stages, the device constructs a product coding structure tree from the stage domain structure trees to align their ontologies. It achieves mapping fusion of the instance data layers by establishing mapping relations among entities of different stages.

Together, these steps fuse the knowledge graphs of all stages of the high-speed train and make the interconnections among high-speed train domain data clearer. They also further address the heterogeneity and redundancy of multi-stage data in the high-speed train domain.
Optionally, the building module 201 is specifically configured to:
based on the multivariate data of the high-speed train, obtaining organization analysis data, wherein the organization analysis data comprises: data source, domain stage, domain structure tree and stage domain structure tree, the stage domain structure tree includes: a product family main structure tree, a design example structure tree and an assembly example structure tree; the data sources include structured data, semi-structured data, and unstructured data;
based on the organization and analysis data, acquiring context information, data types and an ontology summary model of the data service, and constructing an ontology mode;
and acquiring a global ontology according to the stage domain structure tree, acquiring a local ontology based on the global ontology, and acquiring data by combining an ontology mode to construct a domain knowledge graph.
Optionally, the first fusion module 202 is specifically configured to:
training and acquiring a named entity recognition model based on pre-acquired corpus data, wherein entity labeling and sequence labeling are carried out on the corpus data and then the corpus data is input into the named entity recognition model;
acquiring an ontology concept corresponding to an entity in the domain knowledge graph based on the named entity recognition model;
and acquiring the ontology concept mapping relation of the stage knowledge graph based on the comparison result of the ontology concept corresponding to the entity in the domain knowledge graph and the ontology concept in the domain structure tree.
Optionally, the first fusion module 202 is specifically configured to:
normalizing attributes of entities of the domain knowledge graph, wherein the types of the attributes comprise structured attributes and unstructured attributes;
acquiring clustering entities under the same ontology concept with mapping relation based on the ontology concept mapping relation of the stage knowledge graph;
based on the unit and constraint matching of the structured attributes, determining the similarity of the structured attributes in the clustering entities by using the minimum editing distance;
based on the semantic feature vector of the unstructured attribute, determining the unstructured attribute similarity of the clustered entity by utilizing the cosine similarity of the semantic feature vector;
and determining the entity comprehensive similarity based on the structured attribute similarity and the unstructured attribute similarity, and realizing the entity alignment of the stage knowledge graph.
Optionally, the second fusion module 203 is specifically configured to:
constructing a product coding structure tree based on the coding attributes of the stage domain structure trees to achieve ontology alignment;
normalizing the entity attributes of the stage domain structure trees;
acquiring the clustered entities under the ontology concepts of the stage domain structure trees;
and determining the entity comprehensive similarity based on the clustered entities to obtain the mapping relations between entities of different stages under the ontology concepts of the stage domain structure trees, wherein the comprehensive similarity comprises the structured attribute similarity and the unstructured attribute similarity.
Fig. 7 illustrates the physical structure of an electronic device. As shown in Fig. 7, the electronic device 300 may include a processor 310, a communications interface 320, a memory 330 and a communication bus 340. The processor 310, the communications interface 320 and the memory 330 communicate with one another via the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform the knowledge fusion method for the high-speed train product structure tree, the method comprising:
acquiring organization analysis data of the multivariate data of the high-speed train, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema;
fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by entity alignment over clustered entities using the multi-information fusion similarity;
and mapping and fusing the ontologies of the stage domain structure trees based on the stage domain structure trees in the organization analysis data, so as to fuse the knowledge graphs of all stages.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention further provides a computer program product, the computer program product includes a computer program, the computer program can be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer can execute the method for fusing knowledge of a structure tree of a high speed train product provided by the above methods, and the method includes:
acquiring organization analysis data of the multivariate data of the high-speed train, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema;
fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by entity alignment over clustered entities using the multi-information fusion similarity;
and mapping and fusing the ontologies of the stage domain structure trees based on the stage domain structure trees in the organization analysis data, so as to fuse the knowledge graphs of all stages.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing a method for high speed train product structure tree knowledge fusion provided by the above methods, the method including:
acquiring organization analysis data of the multivariate data of the high-speed train, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema;
fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by entity alignment over clustered entities using the multi-information fusion similarity;
and mapping and fusing the ontologies of the stage domain structure trees based on the stage domain structure trees in the organization analysis data, so as to fuse the knowledge graphs of all stages.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A knowledge fusion method for a high-speed train product structure tree, characterized by comprising the following steps:
acquiring organization analysis data of multivariate data of a high-speed train, constructing an ontology schema, and constructing a domain knowledge graph according to the ontology schema;
fusing the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using a multi-information fusion similarity;
and performing ontology mapping and fusion of the stage domain structure trees, based on the stage domain structure tree of the organization analysis data, so as to fuse the knowledge graphs of all stages.
2. The knowledge fusion method for a high-speed train product structure tree according to claim 1, wherein acquiring the organization analysis data of the multivariate data of the high-speed train, constructing the ontology schema, and constructing the domain knowledge graph according to the ontology schema comprise:
acquiring the organization analysis data from the multivariate data of the high-speed train, wherein the organization analysis data comprise data sources, domain stages, domain structure trees, and stage domain structure trees, and the stage domain structure trees comprise a product family main structure tree, a design instance structure tree, and an assembly instance structure tree;
acquiring, based on the organization analysis data, the context information and data types of the data services, and constructing the ontology schema;
and obtaining a global ontology from the stage domain structure trees, deriving local ontologies from the global ontology, and acquiring data in combination with the ontology schema to construct the domain knowledge graphs.
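Claim 2 does not say how the global ontology is derived from the stage domain structure trees, or how local ontologies and the domain knowledge graph follow from it. The Python sketch below shows one possible reading. The stage trees are merged into a single concept hierarchy, which serves as the global ontology. The sub-hierarchy below a chosen root serves as a local ontology. Records whose type fits that schema are then emitted as triples. The `TreeNode` type, the `partOf`/`instanceOf` predicates, and the example concepts are illustrative assumptions, not part of the claimed method.

```python
from dataclasses import dataclass, field


# Hypothetical node type for one concept in a stage domain structure tree
# (product family main tree, design instance tree, or assembly instance tree).
@dataclass
class TreeNode:
    concept: str
    children: list["TreeNode"] = field(default_factory=list)


def global_ontology(trees: list[TreeNode]) -> dict[str, set[str]]:
    """Merge all stage trees into one concept -> sub-concepts map."""
    onto: dict[str, set[str]] = {}
    stack = list(trees)
    while stack:
        node = stack.pop()
        subs = onto.setdefault(node.concept, set())
        for child in node.children:
            subs.add(child.concept)
            stack.append(child)
    return onto


def local_ontology(onto: dict[str, set[str]], root: str) -> dict[str, set[str]]:
    """Cut out the sub-hierarchy below `root` as the local ontology of one domain."""
    local: dict[str, set[str]] = {}
    stack = [root]
    while stack:
        concept = stack.pop()
        if concept in local:
            continue
        local[concept] = set(onto.get(concept, set()))
        stack.extend(local[concept])
    return local


def build_triples(local: dict[str, set[str]], records: list[dict]) -> list[tuple[str, str, str]]:
    """Emit (subject, predicate, object) triples; records outside the schema are skipped."""
    triples = []
    for parent, subs in sorted(local.items()):
        for sub in sorted(subs):
            triples.append((sub, "partOf", parent))  # "partOf" is an assumed predicate name
    for rec in records:
        if rec["type"] not in local:
            continue
        triples.append((rec["id"], "instanceOf", rec["type"]))
        for key, value in sorted(rec.get("attrs", {}).items()):
            triples.append((rec["id"], key, str(value)))
    return triples


# Example with made-up concepts and records.
bogie = TreeNode("Bogie", [TreeNode("Wheelset"), TreeNode("Frame")])
train = TreeNode("EMU", [bogie, TreeNode("CarBody")])
onto = global_ontology([train])
local = local_ontology(onto, "Bogie")
triples = build_triples(local, [
    {"id": "WS-001", "type": "Wheelset", "attrs": {"diameter_mm": 920}},
    {"id": "CB-001", "type": "CarBody"},
])
```

In this example the car-body record is dropped because `CarBody` lies outside the bogie's local ontology. This mirrors the idea that each domain knowledge graph is populated only according to its own schema.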
3. The knowledge fusion method for a high-speed train product structure tree according to claim 2, wherein the data sources comprise structured data, semi-structured data, and unstructured data.
4. The knowledge fusion method for a high-speed train product structure tree according to claim 2 or 3, wherein achieving the ontology fusion of the stage knowledge graph by mapping and aligning the ontology concepts output by named entity recognition on instance data comprises the following steps:
training a named entity recognition model on pre-acquired corpus data, wherein the corpus data are subjected to entity labeling and sequence labeling before being input into the named entity recognition model;
obtaining, with the named entity recognition model, the ontology concept corresponding to each entity in the domain knowledge graph;
and obtaining the ontology concept mapping relation of the stage knowledge graph by comparing the ontology concepts corresponding to the entities in the domain knowledge graph with the ontology concepts in the domain structure tree.
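Claim 4 names two ingredients without fixing their form: sequence labeling of the training corpus, and a comparison between NER-output concepts and structure-tree concepts. The Python sketch below gives one minimal interpretation under stated assumptions. Entity annotations are converted to BIO tags, a common sequence-labeling scheme assumed here. Each concept label predicted by an NER model is then matched to a structure-tree concept by normalized exact match. If that fails, a fuzzy fallback uses the standard library's `difflib.get_close_matches`. The NER model itself is out of scope. The concept names, the 0.75 cutoff, and the helper names are hypothetical.

```python
import difflib
from typing import Optional


def bio_tags(tokens: list[str], spans: list[tuple[int, int, str]]) -> list[str]:
    """Sequence labeling: convert annotations (start, end_exclusive, label) into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def _norm(name: str) -> str:
    """Case- and separator-insensitive form of a concept name."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def map_concepts(ner_concepts: list[str], tree_concepts: list[str],
                 cutoff: float = 0.75) -> dict[str, Optional[str]]:
    """Map each NER-output concept to a structure-tree concept: exact match first, then fuzzy."""
    by_norm = {_norm(c): c for c in tree_concepts}
    mapping: dict[str, Optional[str]] = {}
    for concept in ner_concepts:
        key = _norm(concept)
        if key in by_norm:
            mapping[concept] = by_norm[key]
            continue
        # Fallback (an assumption, not from the claim): closest name by difflib's ratio.
        close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
        mapping[concept] = by_norm[close[0]] if close else None
    return mapping


tokens = ["The", "traction", "motor", "drives", "the", "wheelset"]
tags = bio_tags(tokens, [(1, 3, "TractionMotor"), (5, 6, "Wheelset")])
mapping = map_concepts(
    ["TractionMotor", "Wheelset", "Gearbox_Unit", "Pantograph"],
    ["Traction Motor", "Wheelset", "Gearbox", "Bogie Frame"],
)
```

A concept mapped to `None` (here `Pantograph`) has no counterpart in the domain structure tree and would need manual review or a new ontology concept.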
5. The knowledge fusion method for a high-speed train product structure tree according to claim 4, wherein achieving the entity fusion of the stage knowledge graph by aligning clustered entities using a multi-information fusion similarity comprises the following steps:
normalizing the attributes of the entities in the domain knowledge graph, wherein the attribute types comprise structured attributes and unstructured attributes;
obtaining, based on the ontology concept mapping relation of the stage knowledge graph, the clustered entities that map to the same ontology concept;
determining the structured attribute similarity of the clustered entities by the minimum edit distance, on the basis of unit and constraint matching of the structured attributes;
determining the unstructured attribute similarity of the clustered entities as the cosine similarity of the semantic feature vectors of the unstructured attributes;
and determining the comprehensive similarity of the entities from the structured attribute similarity and the unstructured attribute similarity, thereby achieving entity alignment of the stage knowledge graph.
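Claim 5 names its similarity measures (minimum edit distance, cosine similarity) but not how they are combined. The Python sketch below is a minimal illustration under explicit assumptions. Structured attributes are first converted to a base unit via a hypothetical unit table. That conversion stands in for "unit matching"; for "constraint matching", attributes of incompatible dimensions are skipped. The canonical values are then compared by normalized Levenshtein distance. The unstructured similarity is the cosine of precomputed semantic vectors. The comprehensive similarity is a weighted sum with an assumed weight `alpha = 0.5`, and pairs above an assumed threshold of 0.8 are treated as aligned.

```python
import math
from itertools import combinations

# Hypothetical unit table: unit -> (dimension, factor to the dimension's base unit).
UNIT_SCALE = {"mm": ("length", 1.0), "m": ("length", 1000.0),
              "kg": ("mass", 1.0), "t": ("mass", 1000.0)}


def normalize(value: float, unit: str) -> tuple[str, str]:
    """Unit matching: convert to the base unit and return (dimension, canonical text)."""
    dim, scale = UNIT_SCALE[unit]
    return dim, f"{value * scale:g}"


def edit_distance(a: str, b: str) -> int:
    """Minimum edit (Levenshtein) distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def structured_sim(x: dict, y: dict) -> float:
    """Mean edit-distance similarity over shared attributes with compatible dimensions."""
    scores = []
    for key in x.keys() & y.keys():
        dim_x, text_x = normalize(*x[key])
        dim_y, text_y = normalize(*y[key])
        if dim_x != dim_y:  # constraint check: incomparable attributes are skipped
            continue
        scores.append(1 - edit_distance(text_x, text_y) / max(len(text_x), len(text_y), 1))
    return sum(scores) / len(scores) if scores else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two semantic feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def composite_sim(e1: dict, e2: dict, alpha: float = 0.5) -> float:
    """Weighted blend of the two similarities; alpha = 0.5 is an assumed weight."""
    return (alpha * structured_sim(e1["structured"], e2["structured"])
            + (1 - alpha) * cosine(e1["vector"], e2["vector"]))


def aligned_pairs(cluster: dict[str, dict], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Within one cluster (same ontology concept), keep pairs above an assumed threshold."""
    return [(a, b) for a, b in combinations(sorted(cluster), 2)
            if composite_sim(cluster[a], cluster[b]) >= threshold]


# Made-up example: the same wheelset described in two different domain graphs.
design_ws = {"structured": {"diameter": (920, "mm"), "mass": (1.2, "t")},
             "vector": [0.9, 0.1, 0.4]}
assembly_ws = {"structured": {"diameter": (0.92, "m"), "mass": (1180, "kg")},
               "vector": [0.8, 0.2, 0.4]}
pairs = aligned_pairs({"design:WS": design_ws, "assembly:WS": assembly_ws})
```

Edit distance on canonical numeric text follows the claim's wording. For purely numeric attributes, a relative-tolerance comparison may be a reasonable alternative, but the source does not specify one.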
6. The knowledge fusion method for a high-speed train product structure tree according to claim 1 or 2, wherein performing ontology mapping and fusion of the stage domain structure trees, based on the stage domain structure tree of the organization analysis data, so as to fuse the knowledge graphs of all stages comprises:
constructing a product coding structure tree from the coding attributes of the stage domain structure trees to achieve ontology alignment;
normalizing the entity attributes of the stage domain structure trees;
obtaining the clustered entities under each ontology concept of the stage domain structure trees;
and determining the comprehensive similarity of the clustered entities to obtain the mapping relations between the entities of the individual domain stages under each ontology concept of the stage domain structure trees, wherein the comprehensive similarity comprises a structured attribute similarity and an unstructured attribute similarity.
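Claim 6 builds a product coding structure tree from coding attributes and uses it for ontology alignment, but it does not give the coding scheme. Purely for illustration, assume dot-separated hierarchical product codes (e.g. `01` is the parent of `01.02`). Under that assumption, the Python sketch below derives the coding tree from each code's prefix. It also groups nodes from different stage trees that carry the same code, so that each code acts as a shared ontology anchor across stages. The code format, stage names, and node names are hypothetical.

```python
from collections import defaultdict


def build_code_tree(codes: set[str]) -> dict[str, list[str]]:
    """Parent code -> sorted child codes, assuming dot-separated hierarchical codes."""
    tree: dict[str, list[str]] = defaultdict(list)
    for code in sorted(codes):
        parent = code.rpartition(".")[0]  # "" for top-level codes
        tree[parent].append(code)
    return dict(tree)


def align_by_code(stage_trees: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Input: stage -> {node name: product code}. Output: code -> {stage: node name}."""
    aligned: dict[str, dict[str, str]] = defaultdict(dict)
    for stage, nodes in stage_trees.items():
        for name, code in nodes.items():
            aligned[code][stage] = name
    return dict(aligned)


# Made-up stage trees whose nodes carry product codes.
stages = {
    "design": {"Bogie": "01", "Wheelset": "01.01", "Frame": "01.02"},
    "assembly": {"Bogie assembly": "01", "Wheelset unit": "01.01"},
}
aligned = align_by_code(stages)
tree = build_code_tree(set(aligned))
```

The entities grouped under one code (for example, design `Wheelset` and assembly `Wheelset unit`) are the candidates whose comprehensive similarity would then be computed.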
7. A knowledge fusion device for a high-speed train product structure tree, characterized by comprising:
a construction module, configured to acquire organization analysis data of multivariate data of a high-speed train, construct an ontology schema, and construct a domain knowledge graph according to the ontology schema;
a first fusion module, configured to fuse the domain knowledge graphs into a stage knowledge graph, wherein ontology fusion of the stage knowledge graph is achieved by mapping and aligning the ontology concepts output by named entity recognition on instance data, and entity fusion of the stage knowledge graph is achieved by aligning clustered entities using a multi-information fusion similarity;
and a second fusion module, configured to perform ontology mapping and fusion of the stage domain structure trees, based on the stage domain structure tree of the organization analysis data, so as to fuse the knowledge graphs of all stages.
8. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the knowledge fusion method for a high-speed train product structure tree according to any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the knowledge fusion method for a high-speed train product structure tree according to any one of claims 1 to 6.
10. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the knowledge fusion method for a high-speed train product structure tree according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210654049.3A CN114996476A (en) | 2022-06-09 | 2022-06-09 | Knowledge fusion method, device and program product for high-speed train product structure tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114996476A (en) | 2022-09-02 |
Family ID: 83033134
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202210654049.3A CN114996476A (en) | Pending | 2022-06-09 | 2022-06-09 | Knowledge fusion method, device and program product for high-speed train product structure tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114996476A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191044A (en) * | 2019-12-25 | 2020-05-22 | 湖北大学 | Knowledge extraction and fusion method based on big data |
CN112836362A (en) * | 2021-01-22 | 2021-05-25 | 中车工业研究院有限公司 | Design method, system, equipment and storage medium for rail transit vehicle product platform |
CN114064910A (en) * | 2021-09-29 | 2022-02-18 | 清华大学 | Knowledge graph construction method and system |
CN114417015A (en) * | 2022-01-26 | 2022-04-29 | 西南交通大学 | Method for constructing maintainability knowledge graph of high-speed train |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116383345A (en) * | 2023-06-05 | 2023-07-04 | 中国医学科学院医学信息研究所 | Method, device, electronic equipment and storage medium for body fusion |
CN116383345B (en) * | 2023-06-05 | 2023-08-22 | 中国医学科学院医学信息研究所 | Method, device, electronic equipment and storage medium for body fusion |
CN118171726A (en) * | 2024-05-15 | 2024-06-11 | 江西博微新技术有限公司 | Method, system, storage medium and computer for constructing project whole process knowledge graph |
CN118171726B (en) * | 2024-05-15 | 2024-07-19 | 江西博微新技术有限公司 | Method, system, storage medium and computer for constructing project whole process knowledge graph |
Similar Documents
Publication | Title |
---|---|
CN111708773B (en) | Multi-source scientific and creative resource data fusion method |
CN114996476A (en) | Knowledge fusion method, device and program product for high-speed train product structure tree |
CN111325029B (en) | Text similarity calculation method based on deep learning integrated model |
CN113191148B (en) | Rail transit entity identification method based on semi-supervised learning and clustering |
CN113254507B (en) | Intelligent construction and inventory method for data asset directory |
US11360953B2 (en) | Techniques for database entries de-duplication |
CN114077674A (en) | Power grid dispatching knowledge graph data optimization method and system |
CN114661914B (en) | Contract examination method, device, equipment and storage medium based on deep learning and knowledge graph |
CN115982379A (en) | User portrait construction method and system based on knowledge graph |
CN111401065A (en) | Entity identification method, device, equipment and storage medium |
CN114417015A (en) | Method for constructing maintainability knowledge graph of high-speed train |
CN117725222B (en) | Method for extracting document complex knowledge object by integrating knowledge graph and large language model |
CN113360654A (en) | Text classification method and device, electronic equipment and readable storage medium |
CN116127090A (en) | Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction |
CN114579709B (en) | Intelligent question-answering intention identification method based on knowledge graph |
CN115600605A (en) | Method, system, equipment and storage medium for jointly extracting Chinese entity relationship |
CN113312903B (en) | Method and system for constructing word stock of 5G mobile service product |
CN112613318B (en) | Entity name normalization system, method thereof and computer readable medium |
CN110807096A (en) | Information pair matching method and system on small sample set |
CN114741510A (en) | Unsupervised learning clustering algorithm-based method for automatically fusing data space heterogeneous data |
CN114357175A (en) | Data mining system based on semantic network |
CN114722159A (en) | Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources |
CN113869024A (en) | Method and system for generating initial guarantee scheme of airplane |
CN114372148A (en) | Data processing method based on knowledge graph technology and terminal equipment |
Wei et al. | A Data-Driven Human–Machine Collaborative Product Design System Toward Intelligent Manufacturing |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |