CN114911951A - Knowledge graph construction method for man-machine cooperation assembly task - Google Patents

Publication number
CN114911951A
Authority
CN
China
Prior art keywords
entity
word
sequence number
knowledge
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210539195.1A
Other languages
Chinese (zh)
Inventor
傅卫平
何林涛
高志强
杜慧龙
刘波
彭丽霞
陈小虎
杨世强
李睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202210539195.1A
Publication of CN114911951A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/904 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a knowledge graph construction method for human-robot collaborative assembly tasks, comprising the following steps: 1) establishing a knowledge graph construction flow chart for human-robot collaborative assembly; 2) collecting assembly knowledge resources; 3) uniformly preprocessing the collected assembly knowledge into natural language text; 4) performing syntactic analysis on the preprocessed natural language text corpus; 5) extracting information from the syntactic analysis results using rules designed with a rule-template-based method; 6) fusing the triple data obtained by information extraction using an entity matching method; 7) conceptualizing and abstracting the data-layer knowledge to obtain schema-layer knowledge, which is managed with an ontology library; 8) storing the data-layer and schema-layer knowledge in the graph database Neo4j as a property graph model and visualizing it as a graph structure. The method is concise, effective, and logically rigorous, and has broad application value.

Description

Knowledge graph construction method for man-machine cooperation assembly task
Technical Field
The invention belongs to the technical field of human-robot collaborative (man-machine cooperation) assembly and relates to a knowledge graph construction method for human-robot collaborative assembly tasks.
Background
In human-robot collaborative assembly, a collaborative robot that assists a human in assembling products must correctly understand the operator's assembly intentions, provide assistance naturally and in a coordinated and timely manner as a human would, and adapt to the human's working rhythm. To do so, the robot must form common-sense knowledge of the whole assembly environment and assembly process, including the structure and dimensions of parts, the attributes of assembly tools and machines, assembly process information, and assembly environment information. Most of this information, however, appears in unstructured or semi-structured form and is usually stored directly as text in documents or tables. This storage method cannot fully mine the semantic relationships within the information, makes the information difficult to manage, and leads to a low utilization rate. Knowledge graph technology, which has been applied more and more widely in recent years, can manage and organize multi-source heterogeneous assembly knowledge in a structured manner, link the knowledge in a graph structure, and fully express the semantic relationships within it. It therefore provides a feasible way for a robot to correctly understand the operator's assembly intentions.
Regarding knowledge graph research for human-robot collaboration, a researcher at Donghua University proposed in 2021 an assembly semantic information modeling method for assembled products, which intelligently integrates the geometric information of the CAD model with the process information of the assembly process documents. The method first designs the ontology layer of the knowledge graph top-down, including classes and relations, and then, according to the ontology layer, extracts and maps the relevant semantic data from the CAD model and the assembly process documents to construct the data layer. The schema in this approach is constructed manually and places high demands on professional expertise, accuracy, and completeness, so construction is costly and restrictive, and mid- and long-tail knowledge is difficult to acquire. Moreover, the approach only stores assembly semantic information in the form of a knowledge graph and is not oriented to the actual human-robot collaborative assembly process.
In 2020, Liu Jiayi et al. of Wuhan University of Technology proposed a knowledge graph construction method for human-robot collaborative disassembly tasks. The method builds a human-robot collaborative disassembly model, divides the knowledge into information on the product to be disassembled, disassembly scheme information, and process data, and obtains structured triple information through knowledge graph construction techniques. However, the method does not explain the processing details of its information extraction part, namely how the entity, relation, and attribute triples are obtained and which method is used. In addition, its information fusion part uses a conventional bidirectional LSTM network to obtain the vector representations of entities, which cannot solve the long-distance matching problem.
Disclosure of Invention
The invention aims to provide a knowledge graph construction method for human-robot collaborative assembly tasks that solves two problems of existing neural network methods, which extract structured triple knowledge from multi-source heterogeneous information by supervised learning. First, in the specialized field of human-robot collaborative assembly the available data are limited, which significantly restricts the training of network models. Second, a neural network is a black box whose information mining process cannot be explained, so the cause of erroneous knowledge cannot be traced and the erroneous knowledge cannot be corrected.
The technical scheme adopted by the invention is a knowledge graph construction method for human-robot collaborative assembly tasks, implemented according to the following steps:
S1: analyzing an actual human-robot collaborative assembly task to obtain a specific knowledge graph construction flow chart;
S2: acquiring original assembly data of different modalities and different structures from different channels;
S3: uniformly preprocessing the collected multi-source heterogeneous assembly data into natural language text;
S4: performing syntactic analysis on the preprocessed natural language text corpus using the LTP language technology platform, the analysis comprising word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling;
S5: designing new rules with a rule-template-based method and extracting information from the syntactic analysis results, the information extraction comprising entity extraction, relation extraction, and attribute extraction;
S6: fusing the triple data obtained by information extraction using an entity matching method to eliminate the diversity and ambiguity of entity mentions, the fusion comprising entity disambiguation and coreference resolution;
S7: conceptualizing and abstracting the data-layer knowledge to obtain schema-layer knowledge, and managing it with an ontology library;
S8: storing the data-layer and schema-layer knowledge in the graph database Neo4j as a property graph model and visualizing it as a graph structure to obtain a knowledge graph for human-robot collaborative assembly.
The beneficial effects of the invention include the following:
1) The multi-source heterogeneous assembly knowledge resources are uniformly processed into natural language text, which is convenient to store, manage, and analyze and can provide corpus support for subsequent related research.
2) New information extraction rules are designed with a rule-based method, so that entities and relations can be extracted jointly and entity attributes and relation attributes are extracted with the help of linguistic rules, which effectively reduces error propagation. The designed rules generalize well: they complete information extraction accurately and efficiently on general natural language text to obtain structured triple knowledge, and they can also automatically generate labels for the assembly corpus training data required by subsequent data-driven deep learning methods, thereby completing the construction of a training set.
3) The multi-source heterogeneous original assembly data are converted into structured knowledge triples that form a large, semantically rich knowledge base, on which create, delete, update, and query operations can be performed to handle complex relationships among different pieces of knowledge. The resulting knowledge graph is interpretable and supports knowledge reasoning.
4) A new human-robot interaction mode is provided that gives the collaborative robot common-sense knowledge, laying a foundation for subsequent human intention understanding so that the robot can cooperate with humans flexibly and efficiently to complete complex assembly tasks.
5) The method obtains information extraction rules of high accuracy from a small corpus, and the whole extraction process is clear and traceable, which addresses the above problems of neural network methods.
Drawings
FIG. 1 is an assembly knowledge graph building flow diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of data acquisition and preprocessing according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the result of data preprocessing according to an embodiment of the present invention;
FIG. 4 is a natural language processing flow diagram of an embodiment of the present invention;
FIG. 5 is a flow chart of entity extraction according to an embodiment of the present invention;
FIG. 6 is a flow diagram of relationship extraction according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of entity attribute extraction according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of relationship attribute extraction according to an embodiment of the present invention;
FIG. 9 is an entity disambiguation flow diagram of an embodiment of the invention;
FIG. 10 is a coreference resolution flow diagram according to an embodiment of the invention;
FIG. 11 is an ontology construction flow diagram according to an embodiment of the present invention;
FIG. 12 is a schematic view of an assembly ontology library according to an embodiment of the present invention;
FIG. 13 is an assembly knowledge graph of an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a knowledge graph construction method for human-robot collaborative assembly tasks. In the information extraction process, entirely new extraction rules are designed with a rule-template-based method; the rules achieve good extraction results with only a small corpus, the extraction process is logically rigorous, and the extraction results are interpretable.
The method of the invention is implemented according to the following steps:
S1: obtaining a knowledge graph construction flow chart for human-robot collaborative assembly by analyzing the knowledge characteristics of the human-robot collaborative assembly field and the related knowledge graph techniques.
Because industrial assembly is becoming increasingly automated and intelligent, and the underlying assembly task information is becoming increasingly hierarchical, large-scale, and industrialized, this step constructs the knowledge graph for human-robot collaborative assembly bottom-up, that is, the data layer is constructed first and the schema layer second.
S2: collecting assembly knowledge resources, namely the original data of human-robot collaborative assembly tasks used to construct the assembly knowledge graph, from various channels such as enterprise production data, internet resources, and professional books in the assembly field.
The collected original assembly data are multi-source and heterogeneous: the data modality is mainly text, supplemented by pictures; the data structure is mainly unstructured, supplemented by structured data; and the data content is the assembly process knowledge of common industrial products such as valves, engines, reducers, and crankshaft connecting rods.
S3: uniformly preprocessing the collected multi-source heterogeneous assembly knowledge into natural language text.
The raw data collected from different channels are multi-source and heterogeneous, and in industrial assembly the assembly information of most industrial products is usually contained in assembly process files in text form. Therefore, during data preprocessing the multi-source heterogeneous original data are uniformly processed into unstructured natural language text, as follows:
S31: extracting the semi-structured natural language text contained in enterprise production data (such as Excel documents), adjusting its syntax so that the text reads smoothly and in order, and then placing it in text form into a document A according to the assembly sequence;
S32: describing in natural language the semantic information contained in the picture data collected from the human-robot collaborative assembly scene, and supplementing it into document A;
S33: obtaining the attribute features and assembly features of the parts from their three-dimensional CAD models; these data are stored separately as structured data and are not supplemented into document A;
S34: supplementing the relevant knowledge resources collected from internet resources and professional books directly into document A.
S4: performing syntactic analysis on the preprocessed natural language text corpus using the LTP language technology platform, the analysis comprising word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, semantic role labeling, and the like.
The natural language processing tool used in this step is the LTP language technology platform developed by Harbin Institute of Technology. First, LTP is open-source software developed specifically for Chinese text processing and meets the research requirements of the method. Second, the underlying layer of each analysis module processes text according to linguistic rules, so the platform is widely applicable. Finally, the analysis modules are combined in a pipeline to form a unified Chinese natural language processing system that is simple and convenient to use.
S5: designing entirely new rules with a rule-template-based method and extracting information from the syntactic analysis results, the information extraction comprising entity extraction, relation extraction, and attribute extraction.
The objective of this step is to extract structured triple knowledge that meets the assembly requirements from the natural language text corpus. The triple is defined as follows: let the knowledge graph be G, the head entity H, the tail entity T, the relation between the head entity and the tail entity R, the entity label L, the entity attribute V, and the relation attribute P. Organizing these elements together, the structured triple is expressed as:
G=[H:L{ATT:V}]-(R{ATT:P})-[T:L{ATT:V}]
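As an illustration of the expression above, the following minimal Python sketch models one structured triple and renders it in the same layout. The class names `Entity` and `Triple`, the `render` method, and the English sample values are hypothetical and chosen only for illustration; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                                    # entity mention (H or T)
    label: str = ""                              # entity label L
    attrs: list = field(default_factory=list)    # entity attribute values V


@dataclass
class Triple:
    head: Entity
    relation: str                                # relation feature word R
    tail: Entity
    rel_attrs: list = field(default_factory=list)  # relation attribute values P

    def render(self) -> str:
        """Render as [H:L{ATT:V}]-(R{ATT:P})-[T:L{ATT:V}]."""
        def ent(e):
            return f"[{e.name}:{e.label}{{ATT:{','.join(e.attrs)}}}]"
        rel = f"({self.relation}{{ATT:{','.join(self.rel_attrs)}}})"
        return f"{ent(self.head)}-{rel}-{ent(self.tail)}"


# Hypothetical sample triple, not taken from the patent's corpus.
sample = Triple(Entity("operator", "Person"), "tighten",
                Entity("bolt", "Part", ["M8"]), ["clockwise"])
print(sample.render())
```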
In step S5, the specific procedure is:
S51: performing relation extraction on the result of dependency parsing to obtain the core predicate and the associated predicates contained in the corpus, which serve as the candidate feature words for entity extraction. The result of dependency parsing is represented in the form (sequence number 1, sequence number 2, keyword), and the step specifically comprises:
S511: if the keyword is HED and sequence number 2 is 0, the word indicated by sequence number 1 is taken as the core predicate; there is exactly one core predicate;
S512: if the keyword is COO and the word indicated by sequence number 2 is the core predicate, the word indicated by sequence number 1 is taken as an associated predicate; there may be several associated predicates;
S52: according to the design rules required by the assembly process, removing the feature words that do not meet the requirements from the candidate feature word set, and using the remaining words as the relation feature words R connecting the head entity and the tail entity;
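Steps S51 and S52 can be sketched as follows. This is a minimal sketch under stated assumptions: sequence numbers are taken to be 1-based positions in the token list with 0 denoting the root, the English tokens stand in for segmented Chinese words, and the exclusion set `excluded` is a hypothetical placeholder for the assembly-specific design rules, which the text does not enumerate.

```python
def candidate_relation_words(words, dep_arcs):
    """Return the core predicate followed by its associated predicates.

    words    : list of tokens
    dep_arcs : (sequence number 1, sequence number 2, keyword) triples,
               assumed 1-based with 0 meaning the root
    """
    core = None
    for child, head, key in dep_arcs:
        if key == "HED" and head == 0:          # S511: the single core predicate
            core = child
            break
    if core is None:
        return []
    # S512: every word coordinated (COO) with the core predicate
    associated = [c for c, h, k in dep_arcs if k == "COO" and h == core]
    return [words[i - 1] for i in [core] + associated]


def filter_relation_words(candidates, excluded):
    """S52: drop candidates rejected by the (hypothetical) design rules."""
    return [w for w in candidates if w not in excluded]


# Toy parse of "operator picks bolt and tightens nut" (illustrative only).
words = ["operator", "picks", "bolt", "and", "tightens", "nut"]
arcs = [(1, 2, "SBV"), (2, 0, "HED"), (3, 2, "VOB"),
        (4, 5, "LAD"), (5, 2, "COO"), (6, 5, "VOB")]
relation_words = filter_relation_words(candidate_relation_words(words, arcs), set())
```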
S53: according to the result of semantic role labeling, matching the subject and object corresponding to each relation feature word R in the corpus to obtain the head entity H and the tail entity T of the structured triple. The result of semantic role labeling is represented in the form (sequence number 1, [(keyword, sequence number 2, sequence number 3), …]), where the words identified by the keyword span the interval [sequence number 2, sequence number 3]. The step specifically comprises the following sub-steps:
S531: if the word indicated by sequence number 1 is a relation feature word R, and both a word H identified by the keyword A0 or ARGM-MNR and a word T identified by the keyword A1 exist, the word H is extracted as the head entity corresponding to R and the word T as the tail entity corresponding to R;
S532: if the word indicated by sequence number 1 is a relation feature word R, and both a word H identified by the keyword A1 and a word T identified by the keyword A2 exist, the word H is extracted as the head entity corresponding to R and the word T as the tail entity corresponding to R;
S533: if the word indicated by sequence number 1 is a relation feature word R and only a word H identified by the keyword A2 exists, the word H is extracted as the head entity;
S534: the invention is oriented to human-robot collaborative assembly, in which most assembly actions are completed by a human and most assembly elements are directly related to the human. The human operator is therefore the most important entity in the whole assembly process and the key to associating all assembly elements. For this reason, additional triples are added during entity extraction, as follows:
S5341: if the head entity H in the triple [H, R, T] is identified by the keyword ARGM-MNR and the verb modifying H is "use", the triple [operator, use, H] is automatically added after triple extraction is finished;
S5342: if the head entity H in the triple [H, R, T] is identified by the keyword A2, the triple [operator, Na, H] is automatically added after triple extraction is finished;
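The matching rules of S53 might be sketched as below. Several points are assumptions made for illustration rather than details given in the text: sequence numbers are 1-based and intervals inclusive, A0 is preferred when both A0 and ARGM-MNR are present, the "use" verb is taken to be the first token of the ARGM-MNR span and is stripped from the head entity, and the function and parameter names (`extract_entities`, `use_verb`, `a2_rel`) are hypothetical.

```python
OPERATOR = "operator"


def span_tokens(words, start, end):
    """Tokens in the inclusive, 1-based interval [start, end]."""
    return words[start - 1:end]


def extract_entities(words, srl, relation_words, use_verb="use", a2_rel="Na"):
    """Apply S531-S534 to SRL results of the form (pred, [(key, start, end), ...])."""
    triples = []
    for pred, roles in srl:
        rel = words[pred - 1]
        if rel not in relation_words:
            continue
        spans = {}
        for key, start, end in roles:
            spans.setdefault(key, span_tokens(words, start, end))
        extra = None
        if ("A0" in spans or "ARGM-MNR" in spans) and "A1" in spans:   # S531
            if "A0" in spans:
                head = spans["A0"]
            else:
                head = spans["ARGM-MNR"]
                if len(head) > 1 and head[0] == use_verb:              # S5341
                    head = head[1:]
                    extra = (OPERATOR, use_verb)
            tail = spans["A1"]
        elif "A1" in spans and "A2" in spans:                          # S532
            head, tail = spans["A1"], spans["A2"]
        elif "A2" in spans:                                            # S533
            head, tail = spans["A2"], None
            extra = (OPERATOR, a2_rel)                                 # S5342
        else:
            continue
        h = " ".join(head)
        t = " ".join(tail) if tail else None
        triples.append((h, rel, t))
        if extra:
            triples.append((extra[0], extra[1], h))
    return triples
```

For the toy sentence "use wrench tighten bolt", with "tighten" as the relation feature word, the sketch yields one triple linking the wrench to the bolt and one additional triple linking the operator to the wrench.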
S54: in entity attribute extraction, the result of dependency parsing is represented in the form (sequence number 1, sequence number 2, keyword), and the step specifically comprises the following sub-steps:
S541: if the keyword is ATT and the word indicated by sequence number 2 is contained in the head entity or tail entity list, an attributive relation holds between the word indicated by sequence number 1 and the word indicated by sequence number 2, and the word indicated by sequence number 1 is extracted as an attribute value V of the word indicated by sequence number 2;
S542: if the keyword is SBV and the word indicated by sequence number 2 is contained in the head entity or tail entity list, a subject-predicate relation holds between the word indicated by sequence number 1 and the word indicated by sequence number 2, and the word indicated by sequence number 1 is extracted as an attribute value V of the word indicated by sequence number 2;
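The two rules of S54 reduce to one pass over the dependency arcs. The sketch below assumes 1-based sequence numbers and single-token entities; the function name `extract_entity_attributes` is hypothetical.

```python
def extract_entity_attributes(words, dep_arcs, entities):
    """Collect attribute values V for known entities (S541 and S542).

    dep_arcs : (sequence number 1, sequence number 2, keyword) triples, 1-based
    entities : set of head/tail entity words extracted earlier
    """
    attrs = {}
    for child, head, key in dep_arcs:
        if key not in ("ATT", "SBV") or head == 0:
            continue
        entity = words[head - 1]
        if entity in entities:
            # The word at sequence number 1 becomes an attribute of the entity.
            attrs.setdefault(entity, []).append(words[child - 1])
    return attrs


# Toy example: "hex" modifies "bolt" through an ATT arc.
example = extract_entity_attributes(
    ["hex", "bolt"], [(1, 2, "ATT"), (2, 0, "HED")], {"bolt"})
```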
S55: in relation attribute extraction, the result of semantic role labeling is represented in the form (sequence number 1, [(keyword, sequence number 2, sequence number 3), …]), and the step specifically comprises the following sub-steps:
S551: if the word indicated by sequence number 1 is a relation feature word R and the keyword is ARGM-ADV, the part of the corpus indicated by the interval [sequence number 1, sequence number 2] serves as an adverbial of R and is extracted as an attribute value P of R;
S552: if the word indicated by sequence number 1 is a relation feature word R and the keyword is ARGM-TMP, the word indicated by sequence number 1 serves as a temporal adverbial of R in the corpus and is extracted as an attribute value P of R;
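A minimal sketch of S55 follows. It assumes that the adverbial span is the inclusive interval carried by the role tuple itself, following the [sequence number 2, sequence number 3] convention stated for semantic role labeling results in S53, and that sequence numbers are 1-based; the function name is hypothetical.

```python
def extract_relation_attributes(words, srl, relation_words):
    """Collect attribute values P (adverbials) for relation feature words.

    srl : (pred, [(keyword, start, end), ...]) tuples, 1-based and inclusive
    """
    attrs = {}
    for pred, roles in srl:
        rel = words[pred - 1]
        if rel not in relation_words:
            continue
        for key, start, end in roles:
            if key in ("ARGM-ADV", "ARGM-TMP"):    # S551 / S552
                attrs.setdefault(rel, []).append(" ".join(words[start - 1:end]))
    return attrs


# Toy sentence "first slowly tighten bolt" (illustrative tokens only).
tokens = ["first", "slowly", "tighten", "bolt"]
roles = [(3, [("ARGM-TMP", 1, 1), ("ARGM-ADV", 2, 2), ("A1", 4, 4)])]
relation_attrs = extract_relation_attributes(tokens, roles, {"tighten"})
```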
S6: fusing the triple data obtained by information extraction using an entity matching method to eliminate the diversity and ambiguity of entity mentions, the fusion comprising entity disambiguation and coreference resolution.
Entity disambiguation resolves the ambiguity of a single entity mention in the corpus (one mention that can refer to different entities), and coreference resolution resolves the diversity of entity mentions in the corpus (different mentions that refer to the same entity). The specific process is as follows:
S61: establishing a classification set A for each entity with a clustering method, so that entities with similar semantic information fall into one class, and then obtaining a high-dimensional vector representation of each classification set;
S62: obtaining a high-dimensional vector representation of each entity mention E from the word segmentation result; because the representation is based on a BERT pre-trained model, the same entity mention has different vector representations in different contexts;
S63: calculating the cosine similarity cos(A, E) between each entity mention and each classification set, expressed as:
$$\cos(A, E) = \frac{A \cdot E}{\lVert A \rVert \, \lVert E \rVert}$$
S64: performing entity disambiguation according to the cosine similarity values to resolve the ambiguity of entity mentions: if cos(A, E) > cos(B, E), the entity mention is assigned to classification set A; otherwise it is assigned to classification set B;
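The comparison in S63 and S64 can be sketched with plain Python lists. The two-dimensional vectors and the class names "A" and "B" are toy values chosen for illustration; in the method the vectors would come from the clustering step and the BERT pre-trained model, which this sketch does not reproduce.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(mention_vec, class_vecs):
    """Return the name of the classification set most similar to the mention."""
    return max(class_vecs, key=lambda name: cosine(class_vecs[name], mention_vec))


# Toy class vectors; real ones would be high-dimensional.
class_vecs = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
best = disambiguate([0.9, 0.1], class_vecs)
```

With more than two classification sets, the same `max` call generalizes the pairwise comparison of S64 to choosing the set with the highest similarity.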
S65: before the knowledge graph is constructed, an entity dictionary is established in advance from encyclopedia information. The dictionary consists of key-value pairs: the key of each pair is the name of an entity mention A, and the value is a list containing the alternative names (aliases) of that entity mention;
S66: performing coreference resolution according to the result of entity matching to resolve the diversity of entity mentions;
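One way to read S65 and S66 is as a dictionary lookup that maps every alias to its canonical entity name. The sketch below assumes exact string matching between mentions and dictionary entries, which is a simplification; the function names and the sample dictionary are hypothetical.

```python
def build_alias_index(entity_dict):
    """Invert {canonical name: [aliases]} into {any name: canonical name}."""
    index = {}
    for canonical, aliases in entity_dict.items():
        index[canonical] = canonical
        for alias in aliases:
            index.setdefault(alias, canonical)
    return index


def resolve_coreference(triples, entity_dict):
    """Replace head and tail mentions with their canonical names (S66)."""
    index = build_alias_index(entity_dict)

    def canon(mention):
        return index.get(mention, mention)

    return [(canon(h), r, canon(t)) for h, r, t in triples]


# Hypothetical dictionary entry and triple, for illustration only.
entity_dict = {"hex bolt": ["hexagon bolt", "hex-head bolt"]}
resolved = resolve_coreference(
    [("operator", "tighten", "hexagon bolt")], entity_dict)
```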
S7: conceptualizing and abstracting the data-layer knowledge to obtain schema-layer knowledge, and managing it with an ontology library.
An ontology is a semantic knowledge representation used for communication between different parties in the same field. It targets the schema layer of the knowledge graph and is mainly presented as a tree structure, so it can express strict membership relations and standardize the relations among the entities, relations, and attributes of the data layer. The specific process is as follows:
S71: grouping entities that belong to the same concept by clustering, and consulting reference material to abstract the concept of each group, which yields the schema-layer data;
S72: constructing an ontology library with the schema-layer data as hypernyms and the entities in each set as hyponyms, and managing the entity labels according to the ontology library;
S73: evaluating the quality of the obtained structured triple knowledge by manual review, and correcting or removing the erroneous knowledge it contains;
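The hypernym and hyponym structure of S72 can be illustrated with a small mapping. The sketch takes the outcome of S71 (which concept each entity belongs to) as given input, since the clustering and the manual abstraction are not reproduced here; the concept names, the default label `"Thing"`, and the function names are hypothetical.

```python
from collections import defaultdict


def build_ontology(concept_of):
    """Group entities (hyponyms) under their concept (hypernym)."""
    ontology = defaultdict(list)
    for entity, concept in concept_of.items():
        ontology[concept].append(entity)
    return {concept: sorted(members) for concept, members in ontology.items()}


def entity_label(entity, concept_of, default="Thing"):
    """Look up the label L of an entity from the ontology mapping."""
    return concept_of.get(entity, default)


# Hypothetical result of S71: entity -> abstracted concept.
concept_of = {"bolt": "Fastener", "nut": "Fastener", "wrench": "Tool"}
ontology = build_ontology(concept_of)
```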
S8: storing the data-layer and schema-layer knowledge in the graph database Neo4j as a property graph, and visualizing it as a graph structure to obtain a knowledge graph for human-robot collaborative assembly.
A knowledge graph is in essence a semantic network and a typical paradigm for storing knowledge in a graph structure. To store this type of data, this step uses the widely adopted graph database Neo4j, and the data in the database can be created, deleted, updated, and queried with the Cypher query language (CQL).
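As a sketch of how one structured triple could be written to Neo4j, the function below builds a parameterized Cypher `MERGE` statement. It only constructs the query text and parameters and does not connect to a database. The function name, the `name` property, and the restriction of labels and relationship types to ASCII identifiers are assumptions made for illustration; Cypher does not allow labels or relationship types to be passed as parameters, so they are validated before being interpolated.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head, head_label, rel, tail, tail_label, rel_attrs=None):
    """Return (query, params) that upsert one triple as a property graph edge."""
    for ident in (head_label, rel, tail_label):
        if not _IDENT.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[r:{rel}]->(t) "
        "SET r += $rel_attrs"
    )
    params = {"head": head, "tail": tail, "rel_attrs": rel_attrs or {}}
    return query, params


query, params = triple_to_cypher(
    "operator", "Person", "USE", "wrench", "Tool", {"manner": "carefully"})
```

The returned pair could then be passed to a Neo4j client, for example through a session's `run(query, params)` call in the official Python driver, assuming a reachable database.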
Experimental verification
S1: FIG. 1 shows the knowledge graph construction flow chart for human-robot collaborative assembly tasks. The whole graph is constructed bottom-up, and the flow mainly comprises data acquisition, data preprocessing, natural language processing, knowledge extraction, knowledge fusion, knowledge processing, and knowledge storage modules. Experimental verification was performed according to the following procedure.
S2: before the knowledge graph is constructed, assembly knowledge resources are collected. Assembly knowledge resources are the unstructured descriptions of the assembly scene and the assembly process over the whole course of assembly. As shown in FIG. 2, the assembly knowledge resources collected by the invention are characterized in the following four aspects:
Data source: most of the data come from enterprise production data, internet resources, and professional books in the assembly field, and a small amount comes from the assembly environment;
Data modality: a large amount of text data and a small amount of picture data;
Data structure: a large amount of unstructured and semi-structured data and a small amount of structured data;
Data content: the assembly processes of common industrial products such as valves, engines, reducers, and crankshaft connecting rods.
S3: because the collected original assembly data are multi-source and heterogeneous, they must be preprocessed into data of the same format and modality so that they can be stored, managed, and analyzed uniformly. In human-robot collaborative assembly, the assembly information of industrial products is usually contained in assembly process files in text form, so this step uniformly preprocesses the multi-source heterogeneous assembly knowledge resources into natural language text, as shown in FIG. 2. The step specifically comprises the following sub-steps, and part of the resulting text corpus is shown in FIG. 3.
S31: extracting the unstructured assembly text data contained in enterprise production data (such as Excel documents) and adjusting the syntax so that the text reads smoothly and in order;
S32: placing the text data with adjusted word order into a text document A according to the assembly sequence;
S33: describing the semantic information contained in the picture data in natural language and supplementing it into document A;
S34: obtaining the attribute feature and structural feature data of the parts from the three-dimensional CAD models of the assembled parts and storing them directly as structured data;
S35: collecting reliable, corpus-rich assembly knowledge resources from encyclopedia websites and placing them into document A by assembled product;
S36: collecting human assembly experience from professional books to supplement and explain the content of document A.
S4: After preprocessing has converted the multi-source heterogeneous assembly knowledge resources into a natural language text corpus, the LTP language technology platform is used to process each corpus item. The processing includes sentence splitting, word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling; the dependencies among these modules are shown in FIG. 4.
S5: Based on the natural language processing results for the text corpus, the invention designs a new information extraction rule using a rule-template-based method. The rule extracts entities and relations jointly and supplements the extraction results with attributes to obtain structured triple knowledge. The specific steps are as follows:
S51: As shown in FIG. 5, in relation extraction, LTP first performs dependency parsing on the natural language text corpus. In the result, the token labeled with the keyword HED is the core word of the sentence, and a token labeled with the keyword COO is a coordinated word that plays the same role as the core word; both are called feature words;
S52: Feature words with low relevance to the assembly process are removed from the extracted feature words; the retained feature words serve as candidate feature words, also called relation words;
S53: As shown in FIG. 6, in entity extraction, semantic role labeling is performed on the basis of the relation extraction, and the corresponding subject and object, that is, the head entity and tail entity of a triple, are matched to each candidate feature word from the result;
S54: As shown in FIG. 7, in entity attribute extraction, if "thrust plate" is labeled with the keyword ATT and "oil groove surface" is the head entity of the triple, an attribute of the entity "oil groove surface" can be extracted as {belongs to: thrust plate}; similarly, if "two pieces" is labeled with the keyword ATT and "thrust plate" is the head entity of the triple, an attribute of the entity "thrust plate" can be extracted as {number: two pieces};
S55: As shown in FIG. 8, in relation attribute extraction, if "first" and "directly" are labeled with the keyword ARGM-ADV and the word they modify, "set", is a candidate feature word, two attributes of the relation word "set" can be extracted as {sequence: first, mode: directly}; similarly, if "slowly" is labeled with the keyword ARGM-ADV and the word it modifies, "insert", is a candidate feature word, an attribute of the relation word "insert" can be extracted as {degree: slowly}.
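The extraction rules of step S5 can be illustrated with a minimal Python sketch. It does not call LTP; it assumes the parser output has already been converted into plain tuples in the forms the claims describe, with 1-based word indices and 0 denoting the root. The function names, the example sentence, and its hand-written analysis are hypothetical, and only the A0/ARGM-MNR + A1 and A1 + A2 matching patterns are covered.

```python
from typing import Dict, FrozenSet, List, Tuple

Arc = Tuple[int, int, str]      # (dependent index, head index, label); head 0 is the root
Role = Tuple[str, int, int]     # (role label, first word index, last word index)
Frame = Tuple[int, List[Role]]  # (predicate index, roles attached to that predicate)


def feature_words(arcs: List[Arc]) -> List[int]:
    """Return the index of the HED word followed by the COO words that depend on it."""
    core = [dep for dep, head, label in arcs if label == "HED" and head == 0]
    coordinated = [dep for dep, head, label in arcs if label == "COO" and head in core]
    return core + coordinated


def span_text(words: List[str], first: int, last: int) -> str:
    """Join the words in the 1-based, inclusive interval [first, last]."""
    return " ".join(words[first - 1:last])


def modifiers(words: List[str], arcs: List[Arc], entity: Tuple[int, int]) -> List[str]:
    """Collect ATT dependents whose head lies inside the entity span."""
    first, last = entity
    return [words[dep - 1] for dep, head, label in arcs
            if label == "ATT" and first <= head <= last and not first <= dep <= last]


def extract(words: List[str], arcs: List[Arc], frames: List[Frame],
            rejected: FrozenSet[str] = frozenset()) -> List[Dict]:
    """Build attributed triples for every retained feature word."""
    relation_ids = {i for i in feature_words(arcs) if words[i - 1] not in rejected}
    triples = []
    for predicate, roles in frames:
        if predicate not in relation_ids:
            continue
        spans = {label: (first, last) for label, first, last in roles}
        if ("A0" in spans or "ARGM-MNR" in spans) and "A1" in spans:
            head, tail = spans.get("A0", spans.get("ARGM-MNR")), spans["A1"]
        elif "A1" in spans and "A2" in spans:
            head, tail = spans["A1"], spans["A2"]
        else:
            continue  # patterns outside this sketch are skipped
        triples.append({
            "head": span_text(words, *head),
            "relation": words[predicate - 1],
            "tail": span_text(words, *tail),
            "relation_attrs": [span_text(words, first, last) for label, first, last in roles
                               if label in ("ARGM-ADV", "ARGM-TMP")],
            "head_attrs": modifiers(words, arcs, head),
            "tail_attrs": modifiers(words, arcs, tail),
        })
    return triples


# Hypothetical, hand-written analysis of one sentence (not real LTP output).
WORDS = ["operator", "slowly", "insert", "two", "thrust plate", "and", "tighten", "bolt"]
ARCS = [(1, 3, "SBV"), (2, 3, "ADV"), (3, 0, "HED"), (4, 5, "ATT"),
        (5, 3, "VOB"), (6, 7, "LAD"), (7, 3, "COO"), (8, 7, "VOB")]
FRAMES = [(3, [("A0", 1, 1), ("ARGM-ADV", 2, 2), ("A1", 5, 5)]),
          (7, [("A0", 1, 1), ("A1", 8, 8)])]
TRIPLES = extract(WORDS, ARCS, FRAMES)
for triple in TRIPLES:
    print(triple)
```

Keeping the relevance filter of S52 as a plain set of rejected words is a simplification; the source does not specify how relevance to the assembly process is measured.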
S6: After triple knowledge has been extracted from the natural language text corpus, knowledge fusion, comprising entity disambiguation and coreference resolution, is applied to the redundant knowledge in it. The specific steps are as follows:
S61: As shown in FIG. 9, entity names of the same semantic category are grouped by a clustering method to obtain entity sets. A BERT pre-trained model is used to obtain word vectors for the entities in the knowledge graph, so that the same entity mention has different high-dimensional word vector representations in different semantic contexts. The cosine similarity between an entity mention and each entity set is then calculated to resolve entity ambiguity.
S62: As shown in FIG. 10, before the knowledge graph is constructed, an entity dictionary is created in advance from encyclopedia data. It consists of a number of key-value pairs; the key of each pair is the name of an entity A, and the value is a list containing the alternative names of entity A. When the knowledge graph is constructed, each extracted entity is matched against the dictionary. If an extracted entity B appears in the value list of some key-value pair, entity B is represented by the corresponding key; otherwise entity B has only one entity mention, namely itself, and no coreference resolution is needed. In this way, the diversity of entity mentions is resolved and multiple mentions of an entity are linked to the same node in the knowledge graph.
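The two fusion steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patented implementation: the two-dimensional vectors stand in for BERT embeddings, and the class names, vectors, and dictionary entries are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def disambiguate(mention: Sequence[float], classes: Dict[str, Sequence[float]]) -> str:
    """Assign the mention to the class whose vector is most similar to it."""
    return max(classes, key=lambda name: cosine(classes[name], mention))


def build_alias_index(dictionary: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert {canonical name: [alternative names]} into {alternative name: canonical name}."""
    return {alias: name for name, aliases in dictionary.items() for alias in aliases}


def resolve(mention: str, alias_index: Dict[str, str]) -> str:
    """Map a mention to its canonical name; unknown mentions stand for themselves."""
    return alias_index.get(mention, mention)


# Hypothetical toy data: real vectors would come from a BERT model.
CLASS_VECTORS = {"fastener": [1.0, 0.0], "seal": [0.0, 1.0]}
ENTITY_DICTIONARY = {"reducer": ["speed reducer", "gearbox"]}
ALIAS_INDEX = build_alias_index(ENTITY_DICTIONARY)

print(disambiguate([0.9, 0.2], CLASS_VECTORS))
print(resolve("gearbox", ALIAS_INDEX), resolve("bolt", ALIAS_INDEX))
```

Inverting the dictionary once makes each lookup a single hash access, which matters when every extracted entity is matched against it.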
S7: As shown in FIG. 11, after the extracted entities have been clustered, their concepts must be abstracted; the entities are classified by category with reference to relevant materials and professional books in the assembly field. For example, standard parts such as "bolt" and "nut" are fasteners in the field of machine manufacturing, so when the knowledge graph is constructed, the node type of the "bolt" and "nut" nodes is "fastener", which is the conceptual abstraction of those nodes. In the ontology library, "fastener" is the hypernym and "bolt" and "nut" are its hyponyms, with a strict hierarchical relationship between them. Similarly, "dust ring" and "sealing ring" are both seals, so "seal" is the hypernym and "dust ring" and "sealing ring" are its hyponyms. As shown in FIG. 12, the ontology library is built in the Protégé software from the schema layer data; it shows the association between the data layer and the schema layer as a graph and spreads outward from the center as a tree.
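As a rough illustration of how such a concept hierarchy can supply node types, the following Python sketch maps each parent concept to its member entities and looks up the label of an entity. The dictionary layout and the function name are assumptions made for this example; the source builds its ontology library in Protégé rather than in code.

```python
from typing import Dict, List, Optional

# Parent concept -> member entities, using the examples given in the text.
ONTOLOGY: Dict[str, List[str]] = {
    "fastener": ["bolt", "nut"],
    "seal": ["dust ring", "sealing ring"],
}


def label_of(entity: str, ontology: Dict[str, List[str]] = ONTOLOGY) -> Optional[str]:
    """Return the parent concept used as the node type, or None if the entity is unclassified."""
    for concept, members in ontology.items():
        if entity in members:
            return concept
    return None


print(label_of("nut"))
print(label_of("dust ring"))
```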
S8: As shown in FIG. 13, after quality evaluation, the structured triple knowledge of the data layer and the schema layer is stored in a Neo4j database and visualized as a graph, finally yielding an assembly knowledge graph for human-machine cooperative assembly tasks.
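One way to write an attributed triple into Neo4j is to generate a parameterized Cypher statement for it. The sketch below only builds the query text and parameter map and does not connect to a database; the function names and the labels "Person" and "Part" are hypothetical. Cypher does not allow labels or relationship types to be passed as parameters, so they are interpolated as backtick-quoted identifiers, while entity names and relation attributes travel as parameters.

```python
from typing import Dict, Optional, Tuple


def quote_identifier(name: str) -> str:
    """Backtick-quote a label or relationship type, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def triple_to_cypher(head: str, head_label: str, relation: str, tail: str, tail_label: str,
                     relation_attrs: Optional[Dict[str, str]] = None) -> Tuple[str, Dict]:
    """Return (query, parameters) that merge both nodes and the relation between them."""
    query = (
        f"MERGE (h:{quote_identifier(head_label)} {{name: $head}}) "
        f"MERGE (t:{quote_identifier(tail_label)} {{name: $tail}}) "
        f"MERGE (h)-[r:{quote_identifier(relation)}]->(t) "
        "SET r += $relation_attrs"
    )
    params = {"head": head, "tail": tail, "relation_attrs": dict(relation_attrs or {})}
    return query, params


QUERY, PARAMS = triple_to_cypher(
    "operator", "Person", "insert", "thrust plate", "Part", {"degree": "slowly"}
)
print(QUERY)
print(PARAMS)
```

Using MERGE rather than CREATE keeps repeated mentions of the same entity attached to a single node, which matches the goal of the fusion step.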

Claims (5)

1. A knowledge graph construction method for human-machine cooperative assembly tasks, characterized in that it is implemented according to the following steps:
S1: analyzing the actual human-machine cooperative assembly task to obtain a specific knowledge graph construction flow chart;
S2: collecting raw assembly data of different modalities and different structures from different channels;
S3: uniformly preprocessing the collected multi-source heterogeneous assembly data into natural language text;
S4: analyzing the preprocessed natural language text corpus with the LTP language technology platform, the analysis comprising word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling;
S5: designing a new rule using a rule-template-based method and extracting information from the analysis results, the information extraction comprising entity extraction, relation extraction, and attribute extraction;
S6: performing knowledge fusion, comprising entity disambiguation and coreference resolution, on the triple data obtained by information extraction using an entity matching method, so as to eliminate the diversity and ambiguity of entity mentions;
S7: conceptually abstracting the knowledge in the data layer to obtain schema layer knowledge, and managing that knowledge with an ontology library;
S8: storing the data layer knowledge and the schema layer knowledge in the graph database Neo4j as a property graph model and visualizing them as a graph to obtain a knowledge graph for human-machine cooperative assembly.
2. The knowledge graph construction method for human-machine cooperative assembly tasks according to claim 1, characterized in that step S3 specifically comprises:
S31: extracting the semi-structured natural language text contained in enterprise production data, adjusting its syntax so that it reads smoothly and in order, and placing it in a document A in text form according to the assembly sequence;
S32: describing in natural language the semantic information contained in picture data collected from the human-machine cooperative assembly scene, and adding it to document A;
S33: obtaining the attribute features and assembly features of a part from its three-dimensional CAD model, these data being stored separately as structured data without being added to document A;
S34: adding the relevant knowledge resources collected from Internet resources and professional books directly to document A.
3. The knowledge graph construction method for human-machine cooperative assembly tasks according to claim 1, characterized in that in step S5 the triple is defined as follows: let the knowledge graph be G, the head entity H, the tail entity T, the relation between the head entity and the tail entity R, the entity label L, the entity attribute V, and the relation attribute P; organizing these elements together, the expression of the structured triple is defined as:
G=[H:L{ATT:V}]-(R{ATT:P})-[T:L{ATT:V}]
the specific process is as follows:
S51: performing relation extraction on the result of dependency parsing to obtain the core predicate and the associated predicates contained in the corpus, which are the candidate feature words for entity extraction, wherein the result of dependency parsing is represented in the form (sequence number 1, sequence number 2, keyword), specifically comprising:
S511: if the keyword is HED and sequence number 2 is 0, the word indicated by sequence number 1 is taken as the core predicate, of which there is exactly one;
S512: if the keyword is COO and the word indicated by sequence number 2 is the core predicate, the word indicated by sequence number 1 is taken as an associated predicate, of which there may be several;
S52: according to the design rules required by the assembly process, removing the feature words that do not meet the requirements from the candidate feature word set, the remaining words serving as the relation feature words R connecting the head entity and the tail entity;
S53: according to the result of semantic role labeling, matching the subject and object corresponding to each relation feature word R in the corpus to obtain the head entity H and tail entity T of the structured triple, wherein the result of semantic role labeling is represented in the form (sequence number 1, [(keyword, sequence number 2, sequence number 3), …]) and the words labeled by the keyword lie in the interval [sequence number 2, sequence number 3], specifically comprising the following sub-steps:
S531: if the word indicated by sequence number 1 is a relation feature word R, and both a word H labeled with the keyword A0 or ARGM-MNR and a word T labeled with the keyword A1 exist, the word H is extracted as the head entity corresponding to R and the word T is extracted as the tail entity corresponding to R;
S532: if the word indicated by sequence number 1 is a relation feature word R, and both a word H labeled with the keyword A1 and a word T labeled with the keyword A2 exist, the word H is extracted as the head entity corresponding to R and the word T is extracted as the tail entity corresponding to R;
S533: if the word indicated by sequence number 1 is a relation feature word R and only a word H labeled with the keyword A2 exists, the word H is extracted as the head entity;
S534: when entities are extracted, additional triples are added, specifically:
S5341: if the head entity H of a triple [H, R, T] is labeled with the keyword ARGM-MNR and the verb modifying H is "use", the triple [operator, use, H] is automatically added after the triple has been extracted;
S5342: if the head entity H of a triple [H, R, T] is labeled with the keyword A2, the triple [operator, take, H] is automatically added after the triple has been extracted;
S54: in entity attribute extraction, the result of dependency parsing is represented in the form (sequence number 1, sequence number 2, keyword), specifically comprising:
S541: if the keyword is ATT and the word indicated by sequence number 2 is in the head entity or tail entity list, the word indicated by sequence number 1 stands in an attributive relation to the word indicated by sequence number 2, and the word indicated by sequence number 1 is extracted as an attribute value V of the word indicated by sequence number 2;
S542: if the keyword is SBV and the word indicated by sequence number 2 is in the head entity or tail entity list, the word indicated by sequence number 1 stands in a subject-verb relation to the word indicated by sequence number 2, and the word indicated by sequence number 1 is extracted as an attribute value V of the word indicated by sequence number 2;
S55: in relation attribute extraction, the result of semantic role labeling is represented in the form (sequence number 1, [(keyword, sequence number 2, sequence number 3), …]), specifically comprising the following sub-steps:
S551: if the word indicated by sequence number 1 is a relation feature word R and the keyword is ARGM-ADV, the span indicated by the interval [sequence number 2, sequence number 3] is an adverbial of the relation feature word R in the corpus, and that span is extracted as an attribute value P of the relation feature word R;
S552: if the word indicated by sequence number 1 is a relation feature word R and the keyword is ARGM-TMP, the span indicated by the interval [sequence number 2, sequence number 3] is a temporal adverbial of the relation feature word R in the corpus, and that span is extracted as an attribute value P of the relation feature word R.
4. The knowledge graph construction method for human-machine cooperative assembly tasks according to claim 1, characterized in that step S6 specifically comprises:
S61: constructing classification sets A of the entities by a clustering method, entities with similar semantic information being grouped into one class, and then obtaining a high-dimensional vector representation of each classification set;
S62: obtaining a high-dimensional vector representation of an entity mention E from the word segmentation result, wherein, based on the BERT pre-trained model, the same entity mention has different vector representations in different contexts;
S63: calculating the cosine similarity cos(A, E) between each entity mention and each classification set, expressed as:
$$\cos(A, E) = \frac{A \cdot E}{\lVert A \rVert \, \lVert E \rVert}$$
S64: performing entity disambiguation according to the cosine similarity to resolve the ambiguity of entity mentions: if cos(A, E) > cos(B, E), the entity mention is assigned to classification set A; otherwise it is assigned to classification set B;
S65: before the knowledge graph is constructed, creating an entity dictionary in advance from encyclopedia data, the dictionary consisting of a number of key-value pairs, the key of each pair being the name of an entity mention A and the value being a list containing the alternative names of entity mention A;
S66: performing coreference resolution according to the result of entity matching to resolve the diversity of entity mentions.
5. The knowledge graph construction method for human-machine cooperative assembly tasks according to claim 1, characterized in that step S7 specifically comprises:
S71: grouping the entities that belong to the same concept set by a clustering operation, and consulting reference materials to abstract the concepts of the entities, obtaining the schema layer data;
S72: constructing an ontology library with the schema layer data as hypernyms and the entities in each set as hyponyms, and managing the entity labels according to the ontology library;
S73: evaluating the quality of the obtained structured triple knowledge by manual review, and correcting or removing the erroneous knowledge it contains.
CN202210539195.1A 2022-05-18 2022-05-18 Knowledge graph construction method for man-machine cooperation assembly task Pending CN114911951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210539195.1A CN114911951A (en) 2022-05-18 2022-05-18 Knowledge graph construction method for man-machine cooperation assembly task

Publications (1)

Publication Number Publication Date
CN114911951A (en) 2022-08-16

Family

ID=82768641

Country Status (1)

Country Link
CN (1) CN114911951A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination