CN114722159A - Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources - Google Patents

Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources

Info

Publication number
CN114722159A
Authority
CN
China
Prior art keywords
data
ontology
entity
numerical control
control machine
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210614298.XA
Other languages
Chinese (zh)
Other versions
CN114722159B (en
Inventor
吴承科
杨之乐
谭家娟
李骁
魏国君
蒋锐
郭媛君
刘占省
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Hangmai CNC Software Shenzhen Co Ltd
Original Assignee
Zhongke Hangmai CNC Software Shenzhen Co Ltd
Application filed by Zhongke Hangmai CNC Software Shenzhen Co Ltd
Priority to CN202210614298.XA
Publication of CN114722159A
Application granted
Publication of CN114722159B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a multi-source heterogeneous data processing method and system for manufacturing resources of a numerical control machine tool. The method comprises the following steps: acquiring a pre-constructed numerical control machine tool data ontology model, wherein the model comprises a plurality of ontology categories with corresponding standard ontology data, and ontology association relations with corresponding standard relation data, and the data format of the standard ontology data and the standard relation data is a target language format; acquiring a preset heuristic rule set and a named entity recognition model; acquiring a plurality of data to be processed, and selecting the heuristic rule set or the named entity recognition model according to the category of each data to be processed to extract entity information and obtain entities; and acquiring the target ontology category corresponding to each entity and the entity association relations, and converting the data format of each entity and of each entity association relation into the target language format. The invention helps save human resources and time cost and improves the efficiency of retrieving, analyzing and processing numerical control machine tool data.

Description

Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
Technical Field
The invention relates to the technical field of data processing of numerical control machines, in particular to a multi-source heterogeneous data processing method and system aiming at manufacturing resources of numerical control machines.
Background
With the development of science and technology, and of digital control technology in particular, numerical control machine tools are being applied ever more widely. In use, a numerical control machine tool must be controlled according to the corresponding control information (such as process flow information, tool information, workpiece information and specific parameters) in the numerical control machine tool manufacturing resource data.
In the prior art, manufacturing resource data of numerical control machine tools may be distributed across different databases and/or different unstructured texts, so the data to be processed may come from different sources and have different structures (i.e. multi-source heterogeneous data). Intelligent devices such as computers cannot extract directly usable information from these different databases and unstructured texts, so multi-source heterogeneous data must be processed manually. The problem with the prior art is that relying on manual processing of numerical control machine tool manufacturing resource data consumes a large amount of human resources and time, which hinders improvement of the corresponding data analysis and processing efficiency and, in turn, of the control and working efficiency of the numerical control machine tool.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The main object of the invention is to provide a multi-source heterogeneous data processing method and system for manufacturing resources of a numerical control machine tool, so as to solve the problems in the prior art that manually processing numerical control machine tool manufacturing resource data consumes a large amount of human resources and time, and hinders improvement of the corresponding data analysis and processing efficiency and of the working efficiency of the numerical control machine tool.
In order to achieve the above object, a first aspect of the present invention provides a multi-source heterogeneous data processing method for manufacturing resources of a numerically controlled machine tool, wherein the method comprises:
acquiring a pre-constructed numerical control machine tool data ontology model, wherein the numerical control machine tool data ontology model comprises a plurality of ontology categories, ontology association relations among the ontology categories, standard ontology data corresponding to the ontology categories and standard relation data corresponding to the ontology association relations, each ontology category represents a corresponding concept in numerical control machine tool data, the data format of the standard ontology data and the standard relation data is a target language format, and the target language format is either an Extensible Markup Language (XML) format or a Web Ontology Language (OWL) format;
acquiring a preset heuristic rule set and a pre-trained named entity recognition model;
acquiring a plurality of data to be processed, and, according to the category of each data to be processed, selecting the heuristic rule set or the named entity recognition model to extract entity information from the data to be processed and obtain the entities corresponding to it, wherein the category of each data to be processed is either a relational data table category or an unstructured text category;
obtaining, for each entity, the corresponding target ontology category in the numerical control machine tool data ontology model, obtaining entity association relations among the entities based on the ontology association relations among the target ontology categories, and converting the data format of each entity and of each entity association relation into the target language format.
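The selection step above can be sketched as a small dispatch routine. This is a minimal sketch and not the patented implementation: the names `extract_with_rules`, `extract_with_ner`, `EXTRACTORS` and `process` are hypothetical, and both extractors are toy placeholders for the heuristic rule set and the trained named entity recognition model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the two extractors named in the method.
def extract_with_rules(table: List[List[str]]) -> List[str]:
    """Toy rule: treat every cell below the first (header) row as an entity."""
    return [cell for row in table[1:] for cell in row]

def extract_with_ner(text: str) -> List[str]:
    """Toy placeholder for a trained NER model: keep capitalised tokens only."""
    return [tok.strip(".,") for tok in text.split() if tok[:1].isupper()]

# One extractor per data category (assumed category labels).
EXTRACTORS: Dict[str, Callable] = {
    "relational_table": extract_with_rules,
    "unstructured_text": extract_with_ner,
}

def process(item: dict) -> List[str]:
    """Select the extractor according to the category of the item."""
    return EXTRACTORS[item["category"]](item["payload"])

items = [
    {"category": "relational_table",
     "payload": [["tool", "material"], ["T01", "carbide"]]},
    {"category": "unstructured_text",
     "payload": "Use the Endmill on the Flange."},
]
entities = [process(item) for item in items]
```

Keeping the category-to-extractor mapping in a dictionary means a further data category could be supported by registering one more function, without touching `process`.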
Optionally, before the obtaining of the pre-constructed data ontology model of the numerical control machine, the method further includes:
acquiring a numerical control machine tool text data set, determining the concepts in the text data set and the concept association relations among the concepts, and taking the concepts as the ontology categories and the concept association relations as the ontology association relations;
defining the semantics of each concept in a standardized manner according to the target language format to obtain the corresponding standard ontology data;
defining each concept association relation in a standardized manner according to the target language format to obtain the corresponding standard relation data;
and constructing the numerical control machine tool data ontology model from the ontology categories, the ontology association relations, the standard ontology data and the standard relation data.
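As an illustration of the construction steps above, the following sketch serialises a handful of categories and one relation into XML. It is only a sketch under stated assumptions: the category and relation names, the element names `ontology`, `category` and `relation`, and the function `build_ontology_xml` are all hypothetical, and the output is a simplified XML document rather than real OWL vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept names and one relation, for illustration only.
categories = ["MachiningProcess", "MachiningTool", "Workpiece"]
relations = [("MachiningProcess", "usesTool", "MachiningTool")]

def build_ontology_xml(categories, relations) -> str:
    """Serialise categories and relations into a simple XML document."""
    root = ET.Element("ontology")
    for name in categories:
        ET.SubElement(root, "category", {"name": name})
    for subject, predicate, obj in relations:
        ET.SubElement(root, "relation",
                      {"name": predicate, "domain": subject, "range": obj})
    return ET.tostring(root, encoding="unicode")

ontology_xml = build_ontology_xml(categories, relations)
```

Parsing `ontology_xml` back with `ET.fromstring` recovers the same categories and relation, which is the property that makes such a format usable by other systems.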
Optionally, the numerical control machine tool text data set includes at least one of a technical manual, a research report and a processing scheme corresponding to the numerical control machine tool; the concepts include machining process, machining tool, workpiece name, workpiece type, workpiece geometry, and tool performance.
Optionally, the named entity recognition model is obtained by pre-training through the following steps:
obtaining a pre-training BERT model;
acquiring training corpus data corresponding to a numerical control machine tool, wherein the training corpus data comprises a plurality of text word vectors and entity marking information corresponding to the text word vectors;
and iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function and a preset loss threshold, and obtaining the trained named entity recognition model.
Optionally, the obtaining of the corpus data corresponding to the numerical control machine includes:
acquiring a training corpus text corresponding to the numerical control machine tool, wherein the training corpus text comprises a plurality of text sentences and entity marking information corresponding to each text sentence;
and inputting each text sentence into a vector conversion model trained in advance to obtain a text word vector corresponding to each text sentence, and constructing the training corpus data according to the text word vector and the entity marking information.
Optionally, the vector transformation model is obtained by training according to the following steps:
acquiring a corpus for the numerical control machine tool field, and generating a randomly initialized semantic numerical vector for each word in the corpus by using a word2vec model;
traversing each word in each sentence of the corpus by sliding-window sampling, with the sentence as the unit, using the words on both sides of each word as the input variables for predicting the middle word, and self-training the word2vec model according to the prediction results;
and taking the trained word2vec model as the vector conversion model.
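The sliding-window sampling described above can be sketched as follows. The function name `cbow_pairs`, the window size and the example sentence are assumptions for illustration; the sketch only produces the (context words, middle word) training pairs and does not train a word2vec model.

```python
from typing import List, Tuple

def cbow_pairs(sentence: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context words, middle word) pairs for one tokenised sentence."""
    pairs = []
    for i, target in enumerate(sentence):
        left = sentence[max(0, i - window):i]      # words before the middle word
        right = sentence[i + 1:i + 1 + window]     # words after the middle word
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs

pairs = cbow_pairs(["mill", "the", "steel", "flange"], window=1)
```

Each pair gives the words on both sides as the input and the middle word as the prediction target, which matches the training setup the text describes.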
Optionally, the iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function, and a preset loss threshold, and obtaining the trained named entity recognition model includes:
sequentially obtaining each BERT model to be judged, and judging according to a preset judgment processing flow whether training of that model is complete, until a BERT model to be judged whose training is complete is obtained, wherein the first BERT model to be judged is the pre-trained BERT model;
wherein the preset judgment processing flow comprises: inputting the training corpus data into the current BERT model to be judged and obtaining the current predicted entity information corresponding to each text word vector; calculating the loss value of the current BERT model to be judged according to the preset loss function, the entity marking information and the current predicted entity information; when the loss value is greater than the loss threshold, performing stochastic gradient descent and back propagation with an optimizer and updating the BERT parameters of the current BERT model to be judged to obtain the next BERT model to be judged; and when the loss value is not greater than the loss threshold, ending the training and taking the current BERT model to be judged as the named entity recognition model.
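The control flow of this judgment loop (evaluate the loss, stop if it is at or below the threshold, otherwise update and repeat) can be sketched without BERT. In the sketch below a one-weight logistic model deliberately replaces the BERT model so that the example stays self-contained; the function name `train_until_threshold`, the threshold, the learning rate, the round cap and the data are all assumptions.

```python
import math
import random

def train_until_threshold(data, loss_threshold=0.2, lr=0.5,
                          max_rounds=10_000, seed=0):
    """Toy stand-in for the judgment flow: a logistic model replaces BERT."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for round_no in range(max_rounds):
        # Judgment step: mean cross-entropy loss of the current model.
        loss = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        loss /= len(data)
        if loss <= loss_threshold:
            return w, b, loss, round_no  # training is finished
        # Otherwise: one pass of stochastic gradient descent over shuffled samples.
        for x, y in rng.sample(data, len(data)):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    raise RuntimeError("loss threshold not reached within max_rounds")

samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, final_loss, rounds = train_until_threshold(samples)
```

The round cap is an addition not mentioned in the text; it guards against a threshold that the model can never reach.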
Optionally, the selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed to extract entity information from the data to be processed and obtain the entities corresponding to it includes:
when the category of the data to be processed is the relational data table category, traversing each row-column position of the relational data table in the data to be processed;
for any current position in the relational data table, judging the information type of the current position according to the hypertext identifiers of the rows and columns of the relational data table and the keyword information at the current position, wherein the information type is either header or content;
and extracting entity information of the data to be processed according to the information type, and acquiring an entity corresponding to the data to be processed.
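One possible reading of the table rules above is sketched below. It assumes that the "hypertext identifiers" are HTML cell tags (`th` versus `td`), which the text does not state, and it uses a hypothetical keyword list `HEADER_KEYWORDS`; the class name `CellClassifier` is likewise an assumption.

```python
from html.parser import HTMLParser

HEADER_KEYWORDS = {"tool", "material", "diameter"}  # hypothetical keyword list

class CellClassifier(HTMLParser):
    """Label each table cell as 'header' or 'content' with two simple rules."""

    def __init__(self):
        super().__init__()
        self.cells = []   # (row, column, text, label)
        self.row = -1
        self.col = -1
        self.tag = None   # tag of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = -1
        elif tag in ("th", "td"):
            self.col += 1
            self.tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.tag = None

    def handle_data(self, data):
        text = data.strip()
        if self.tag and text:
            # Rule 1: a <th> tag marks a header. Rule 2: so does a known keyword.
            is_header = self.tag == "th" or text.lower() in HEADER_KEYWORDS
            label = "header" if is_header else "content"
            self.cells.append((self.row, self.col, text, label))

parser = CellClassifier()
parser.feed("<table><tr><th>Tool</th><td>Material</td></tr>"
            "<tr><td>T01</td><td>carbide</td></tr></table>")
entities = [text for _, _, text, label in parser.cells if label == "content"]
```

The second rule catches header cells that were marked up as ordinary `td` cells, which is why "Material" is still labelled a header here.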
Optionally, the selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed to extract entity information from the data to be processed and obtain the entities corresponding to it further includes:
when the category of the data to be processed is the unstructured text category, inputting the data to be processed into the vector conversion model to obtain a vector to be processed corresponding to the data to be processed;
and inputting the vector to be processed into the named entity recognition model, and acquiring an entity corresponding to the vector to be processed according to the output of the named entity recognition model.
The second aspect of the present invention provides a multi-source heterogeneous data processing system for manufacturing resources of a numerically controlled machine tool, wherein the system comprises:
an ontology model acquisition module, used for acquiring a pre-constructed numerical control machine tool data ontology model, wherein the numerical control machine tool data ontology model comprises a plurality of ontology categories, ontology association relations among the ontology categories, standard ontology data corresponding to the ontology categories and standard relation data corresponding to the ontology association relations, each ontology category represents a corresponding concept in numerical control machine tool data, the data format of the standard ontology data and the standard relation data is a target language format, and the target language format is either an Extensible Markup Language (XML) format or a Web Ontology Language (OWL) format;
a rule and entity recognition model acquisition module, used for acquiring a preset heuristic rule set and a pre-trained named entity recognition model;
an entity recognition module, used for acquiring data to be processed, selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed to extract entity information from the data to be processed, and obtaining the entities corresponding to it, wherein the category of each data to be processed is either a relational data table category or an unstructured text category;
and an entity processing module, used for obtaining, for each entity, the corresponding target ontology category in the numerical control machine tool data ontology model, obtaining entity association relations among the entities based on the ontology association relations among the target ontology categories, and converting the data formats of the entities and of the entity association relations into the target language format.
As can be seen from the above, in the solution of the present invention, a pre-constructed numerical control machine tool data ontology model is acquired, wherein the model comprises a plurality of ontology categories, ontology association relations among the ontology categories, standard ontology data corresponding to the ontology categories, and standard relation data corresponding to the ontology association relations, each ontology category represents a corresponding concept in numerical control machine tool data, the data format of the standard ontology data and the standard relation data is a target language format, and the target language format is either an Extensible Markup Language (XML) format or a Web Ontology Language (OWL) format; a preset heuristic rule set and a pre-trained named entity recognition model are acquired; a plurality of data to be processed are acquired, and the heuristic rule set or the named entity recognition model is selected according to the category of each data to be processed to extract entity information and obtain the corresponding entities, wherein the category of each data to be processed is either a relational data table category or an unstructured text category; and, for each entity, the corresponding target ontology category in the ontology model is obtained, entity association relations among the entities are obtained based on the ontology association relations among the target ontology categories, and the data formats of the entities and of the entity association relations are converted into the target language format.
Compared with existing schemes in which manufacturing resource data of the numerical control machine tool is searched manually, the multi-source heterogeneous data processing method provided by the invention processes the data to be processed automatically: it identifies the entities in the data and the entity association relations among them, so the entities corresponding to each numerical control machine tool concept and the association relations among the entities contained in the data are obtained without manual processing. This saves human resources and time and improves the efficiency of analysis and processing for the numerical control machine tool. Moreover, the data formats of the entities and of the entity association relations are converted into a target language format that intelligent devices or control systems, such as a computer or a control chip, can recognize and call directly, which facilitates control of the numerical control machine tool and improves its working efficiency.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a multi-source heterogeneous data processing method for manufacturing resources of a numerically-controlled machine tool according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of building a data ontology model of a numerically-controlled machine tool according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-source heterogeneous data processing system for manufacturing resources of a numerical control machine tool according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when …" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
With the development of science and technology, and of digital control technology in particular, numerical control machine tools are being applied ever more widely. In use, a numerical control machine tool must be controlled according to the corresponding control information (for example, process flow information, tool information, workpiece information and specific parameters) in the numerical control machine tool manufacturing resource data.
In the prior art, manufacturing resource data of numerical control machine tools may be distributed across different databases and/or different unstructured texts, so the data to be processed may come from different sources and have different structures (i.e. multi-source heterogeneous data). Intelligent devices such as computers cannot extract directly usable information from these different databases and unstructured texts, so multi-source heterogeneous data must be processed manually. The problem with the prior art is that relying on manual processing of numerical control machine tool manufacturing resource data consumes a large amount of human resources and time, which hinders improvement of the corresponding data retrieval, analysis and sharing efficiency and, in turn, of the working efficiency of the numerical control machine tool.
The numerical control machine tool is one of the basic pieces of equipment in manufacturing, and its control software must efficiently process, in real time, information about manufacturing resources such as process flows, machining tools and workpieces to be machined. This information is scattered across multi-source heterogeneous data sources, such as the various enterprise management systems and texts of upstream and downstream participants, and cannot be extracted and coordinated automatically in an effective way. Specifically, processing the data corresponding to manufacturing resources of numerical control machine tools sits at the intersection of mechanical engineering, control engineering, computer science and artificial intelligence, and the key manufacturing resource data for machining operations (such as process flow information, tool information and workpiece information) is distributed across multi-source heterogeneous data. For example, process flow information may be described as unstructured text in a technical manual or a customer requirement, while tool and workpiece information is stored in the relational databases (Oracle, etc.) of application systems such as an enterprise resource planning (ERP) system, a computer aided process planning (CAPP) system or a manufacturing execution system (MES). The storage locations (i.e. sources) and data formats (i.e. data structures) of the various kinds of information may differ, so data islands form, the kinds of information cannot be coordinated automatically, and a computer or control system cannot use them directly. In practice, the required machining information has to be retrieved manually and then entered into the numerical control software system, which is inefficient and error-prone.
In one application scenario, information can be extracted manually by using a database query language such as Structured Query Language (SQL) and then entered into the numerical control software. However, this requires operators to master SQL, which is inefficient and error-prone for them; moreover, the structures of different SQL databases differ (that is, their key-value tables are designed differently), which further increases the difficulty of coordinating information across multi-database queries.
In another application scenario, a digital twin may be created for a numerically controlled machine tool, generating a parameterized machine tool model. However, the information sharing mode based on the digital twin does not carry out standardized definition on the input information, and the information ambiguity is large.
An ontology model can be used to associate multiple relational databases and reduce data islands, relying on two mapping standards published by the W3C. The first is Direct Mapping, which defines a simple set of conversion rules, explicitly expresses the semantics implicit in the relational data tables, and outputs the table structure and data of the relational database directly in the Resource Description Framework (RDF) graph format that underlies ontologies. The main conversion rules are: a base table is converted into a class of the ontology, the columns of the table are converted into properties, the rows are converted into resources, the cell values are converted into property values, and so on.
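The conversion rules listed for Direct Mapping can be sketched in a few lines. This is a simplified illustration and not the full W3C algorithm: the function name `direct_mapping`, the base IRI and the example table are assumptions, and rows are identified by their position rather than by primary key.

```python
def direct_mapping(table_name, columns, rows, base="http://example.org/"):
    """Turn one relational table into RDF-style (subject, predicate, object) triples."""
    triples = []
    for index, row in enumerate(rows, start=1):
        subject = f"{base}{table_name}/{index}"   # each row becomes a resource
        # The table itself becomes the class of every row resource.
        triples.append((subject, "rdf:type", f"{base}{table_name}"))
        for column, value in zip(columns, row):
            # Each column becomes a property; each cell value becomes its value.
            triples.append((subject, f"{base}{table_name}#{column}", value))
    return triples

triples = direct_mapping("Tool", ["id", "material"], [("T01", "carbide")])
```

A one-row table with two columns therefore yields three triples: one type statement and one statement per cell.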
The second is R2RML (RDB to RDF Mapping Language), a language for expressing customized mappings from a relational database to an RDF dataset; it defines a systematic logical framework for the mapping through the concept of a logical table. A logical table represents a logical structure in the relational database and can be a table, a view, or a valid SQL query. The logical table approach breaks through the limits of the physical structure of relational database tables and provides a specification for flexibly generating RDF data on demand. The original structure of the relational database does not need to be changed before RDF triple data is generated, and the data can be computed, filtered, processed, integrated and cleaned, so the approach is customizable and flexible. However, existing ontology techniques have not been applied to the collaborative processing of low-level, fine-grained (that is, more specific) machining information. The present invention combines ontologies, deep learning, heuristic rules and other techniques to achieve collaborative processing of multi-source heterogeneous manufacturing resource information of numerical control machine tools.
Specifically, in the invention, the concept category and the association relation in the field of the numerical control machine tool are defined by using an Ontology, the Ontology category is subjected to universal coding by using Extensible Markup Language (XML) or Web Ontology Language (OWL), and the Ontology category is converted into a data format (namely a target Language format) which can be analyzed and interacted by a computer; entity information involved in machining of the numerical control machine tool is recognized and extracted from each multi-source heterogeneous data to be processed by combining a preset heuristic rule set and a named entity recognition model, semantic similarity calculation is used for matching with the category of the entity, a relation is automatically established based on the prior association of the entity (namely the entity association relation in the entity model), and the corresponding entity and the established entity association relation between the entities are converted into the target language format, so that information which can be directly used by the computer is obtained, an information island between data sources is eliminated, the requirement of manual information query is reduced to the maximum extent, the information interaction efficiency of each system of the numerical control machine tool is improved, and the working efficiency of the numerical control machine tool is improved.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a multi-source heterogeneous data processing method for manufacturing resources of a numerical control machine tool, and specifically, the method includes the following steps:
Step S100, obtaining a pre-constructed data ontology model of the numerical control machine tool, where the data ontology model of the numerical control machine tool includes a plurality of ontology categories, ontology association relationships between the ontology categories, standard ontology data corresponding to each ontology category, and standard relationship data corresponding to each ontology association relationship, where one ontology category represents a corresponding concept in the data of the numerical control machine tool, the data format of the standard ontology data and the standard relationship data is a target language format, and the target language format is either the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format.
The data ontology model of the numerical control machine tool is a general ontology model constructed from data in the field of numerical control machine tools, and it embodies the association relationships between the core concepts and abstract concepts of that field. Specifically, the ontology categories may include qualitative concepts, such as the names and types of the abstract categories of main processes, machining tools and workpieces of the numerical control machine tool, and may also include quantitative concepts, such as the geometric shapes and properties of machining tools and workpieces. An association relationship may be a usage correspondence or a hierarchical relationship between concepts. Examples are the hierarchical relationship between concepts of different granularity (coarse-grained concepts are abstract generic categories and fine-grained concepts are specializations of them; the tool category, for instance, can be subdivided into finer-grained categories such as turning tools and hole machining tools) and the empirical matching relationship between tools and process flows.
Specifically, in this embodiment, as shown in fig. 2, before the obtaining of the pre-constructed data ontology model of the numerical control machine tool, the method further includes the following steps:
step A100, obtaining a numerical control machine tool text data set, and determining concepts in the numerical control machine tool text data set and concept association relations among the concepts.
In an application scenario, the concepts of the relevant field (for example, the field of numerical control machine tools) and the association relationships between them can be sorted out manually in advance. Entities subsequently extracted by the heuristic rules or by BERT can then be treated as instances of the ontology categories. Once the entities and ontology categories are aligned, a mapping of the form "entity, belongs to, ontology category" is formed, and the relationships between entities are established from the association relationships between the aligned ontology categories: if a certain relationship exists between two aligned ontology categories, the same relationship is established between the corresponding entities. These relationships are then converted into a general computer-readable data format using XML or OWL, which makes semantic retrieval across the machining resource entities of multiple machine tools convenient, for example querying which machine tools use a certain resource and which process and process characteristics are involved each time. In this way, associations are established among the multi-source heterogeneous machining resource information of multiple machine tools, information retrieval is facilitated, and the information islands that currently exist between different machine tools and different data management systems are broken down.
Step A200, respectively carrying out standardized definition on the semantics of each concept according to the target language format to obtain corresponding standard ontology data.
Step A300, respectively defining the concept association relations in a standardized manner according to the target language format to obtain corresponding standard relation data.
Step A400, constructing the data ontology model of the numerical control machine according to the ontology type, the ontology association relationship, the standard ontology data and the standard relationship data.
The numerical control machine tool text data set comprises at least one of a technical manual, a research report and a processing scheme corresponding to the numerical control machine tool; the concepts include machining process, machining tool, workpiece name, workpiece type, workpiece geometry, and tool performance. It should be noted that the above numerical control machine tool text data set may further include other data, and the above concept may further include other specific information, which may be set and adjusted according to actual requirements, and is not limited specifically herein.
Specifically, the numerical control machine tool text data set is data formed by combining a plurality of text data in the numerical control machine tool field, various general concepts corresponding to the numerical control machine tool field can be analyzed and obtained according to the numerical control machine tool text data set, and the analyzed concepts can be used as core concepts corresponding to the numerical control machine tool.
In this embodiment, based on data such as a technical manual of a numerical control machine tool in the field of a numerical control machine tool, a processing scheme, and the like, the abstract concepts and the association relationships between the concepts in the processing process of the numerical control machine tool are combed, the abstracted concepts obtained by combing can be used as an ontology category in an ontology model, and the corresponding association relationships can be used as an ontology association relationship.
Further, in this embodiment, so that the computer or the numerical control system can conveniently call and process the finally obtained data, the obtained ontology categories and ontology association relationships may be given a standard definition. That is, the ontology categories and ontology association relationships in natural language format are each converted according to the target language format to obtain the corresponding standard ontology data and standard relationship data. In an application scenario, ontology categories and ontology association relationships held in semantic vector format can likewise be converted according to the target language format to obtain the corresponding standard ontology data and standard relationship data. Note that the vectors are used only to convert text containing entities so that BERT can extract the entities, whereas the target language format is used to convert the natural language category names, relationship names and entity names; that is, an entity name is converted after the entity has been extracted, and this conversion does not involve the vectors.
It should be noted that, in this embodiment, the ontology categories and ontology association relationships may be in natural language format, which is convenient for a user to view at any time, or in the format of semantic vectors obtained by converting the corresponding natural language through a pre-trained vector conversion model (or a natural language to semantic vector dictionary), which is convenient for computer processing. The target language format may be the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format, which can be directly parsed, called, interacted with and shared by intelligent devices such as computers and numerical control systems, or another language format with the same property, such as SHOE or OIL.
In this embodiment, preferably, the ontology categories and ontology association relationships are in natural language format, a pre-trained vector conversion model is used to create standardized semantic vectors for all ontology categories and ontology association relationships, and XML/OWL is used for general encoding so that a computer can call them conveniently.
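As an illustration of steps A100 to A400, the following Python sketch holds a few ontology categories and association relationships in memory and encodes them as XML with the standard library. The class name `OntologyModel`, the element and attribute names, and the sample categories and relations are assumptions made for the example; the embodiment does not prescribe a particular XML schema, and a real OWL encoding would be considerably richer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class OntologyModel:
    """Ontology categories plus (source, relation, target) association relationships."""
    classes: List[str] = field(default_factory=list)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def to_xml(self) -> str:
        """Encode categories and relationships in a simple, hypothetical XML layout."""
        root = ET.Element("ontology")
        for name in self.classes:
            ET.SubElement(root, "class", {"name": name})
        for source, relation, target in self.relations:
            ET.SubElement(root, "relation",
                          {"name": relation, "source": source, "target": target})
        return ET.tostring(root, encoding="unicode")


# Hypothetical categories and relationships for a machining ontology.
model = OntologyModel(
    classes=["MachiningProcess", "MachiningTool", "Workpiece"],
    relations=[("MachiningProcess", "uses", "MachiningTool"),
               ("MachiningProcess", "produces", "Workpiece")],
)
xml_text = model.to_xml()
print(xml_text)
```

The encoded string can be parsed back by any XML parser, which is the property the target language format is meant to provide.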
And step S200, acquiring a preset heuristic rule set and a pre-trained named entity recognition model.
The heuristic rule set is used to process data relation tables and extract the entity information they contain. In the field of numerical control machine tools, data relation tables have broadly similar formats: each table has row and column markers and corresponding line-break markers, the table header usually gives an entity name (concept name), and the values in each row under the header give the content (value) of the corresponding concept. A general heuristic rule set can therefore be preset according to these characteristics to extract table-type information.
In this embodiment, the preset heuristic rule set can be adapted to the main management systems of a numerical control machine tool enterprise, such as CAPP, ERP and MES, in order to extract machining entity information from them. The key steps are as follows: a relation table is received, and the row and column hypertext markup of the table, together with whether the current position of the table contains a keyword, is used to judge whether that position is a table header (key) or content (value), that is, to judge the information type of the position. Information is then extracted according to the judged information type, which gives a general, self-adapting extraction procedure for relation tables. For example, the generally adaptable heuristic rule set is used to process the data relation tables in databases such as CAPP, ERP and MES, the key and value elements are distinguished, and entity information such as the machining process, the machining tool and the machined component is extracted.
The pre-trained named entity recognition model is obtained by training a pre-trained BERT model in advance on training corpus data from the field of numerical control machine tools. The pre-trained BERT model may be an existing model that can be used for named entity recognition but has not been trained on the field of numerical control machine tools. In this embodiment, the model is therefore further trained and updated on corpus data from that field, so that it performs named entity recognition on numerical control machine tool data more accurately.
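The header/content judgment can be sketched in Python as follows. The sketch assumes that the relation table arrives as HTML, that `<th>` marks a header cell, and that a `<td>` cell counts as a header when its text is one of a small set of keywords. The keyword list and the names `TableCells`, `cell_type` and `extract_entities` are hypothetical, and the actual rule set of the embodiment may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List

HEADER_KEYWORDS = {"process", "tool", "workpiece"}  # hypothetical keyword list


class TableCells(HTMLParser):
    """Collect [tag, text] for every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[List[str]]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:              # tolerate a cell outside any <tr>
                self.rows.append([])
            self.rows[-1].append([tag, ""])
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1][1] += data


def cell_type(tag: str, text: str) -> str:
    """Judge a position as header (key) or content (value) from markup and keywords."""
    if tag == "th" or text.strip().lower() in HEADER_KEYWORDS:
        return "header"
    return "content"


def extract_entities(html: str) -> Dict[str, List[str]]:
    """Map each header name to the content values found beneath it."""
    parser = TableCells()
    parser.feed(html)
    headers: List[str] = []
    entities: Dict[str, List[str]] = {}
    for row in parser.rows:
        cells = [(tag, text.strip()) for tag, text in row]
        if cells and all(cell_type(t, x) == "header" for t, x in cells):
            headers = [x for _, x in cells]
            for name in headers:
                entities.setdefault(name, [])
        else:
            for name, (_, value) in zip(headers, cells):
                if value:
                    entities[name].append(value)
    return entities


SAMPLE = """<table>
<tr><th>Process</th><th>Tool</th></tr>
<tr><td>rough turning</td><td>turning tool</td></tr>
<tr><td>drilling</td><td>twist drill</td></tr>
</table>"""
print(extract_entities(SAMPLE))
```

Because the judgment depends only on markup and keywords, the same function handles a table whose header row uses plain `<td>` cells, provided the header texts appear in the keyword list.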
In this embodiment, the named entity recognition model is obtained by pre-training through the following steps: obtaining a pre-training BERT model; acquiring training corpus data corresponding to a numerical control machine tool, wherein the training corpus data comprises a plurality of text word vectors and entity marking information corresponding to the text word vectors; and iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function and a preset loss threshold, and obtaining the trained named entity recognition model.
Specifically, in this embodiment, the obtaining of the corpus data corresponding to the numerical control machine tool includes: acquiring a training corpus text corresponding to the numerical control machine tool, wherein the training corpus text comprises a plurality of text sentences and entity marking information corresponding to each text sentence; and inputting each text sentence into a vector conversion model trained in advance to obtain a text word vector corresponding to each text sentence, and constructing the training corpus data according to the text word vector and the entity marking information.
In this embodiment, in order to reduce the repeated data processing process, the text data set of the numerical control machine tool may be directly processed (e.g., sentence segmentation, word segmentation, stop word removal processing, and the like) to obtain each text word vector, and after entity labeling is performed on each text word vector, corresponding training corpus data is obtained, where the training corpus text is a text in the text data set of the numerical control machine tool. In an application scenario, the corpus data after the labeling and other processing has been completed by other people may also be used, and is not specifically limited herein.
In another application scenario, after the corresponding vector conversion model is obtained, a vector dictionary in the field of the numerical control machine tool, that is, a set composed of natural language words, word semantic vectors (that is, text word vectors) and corresponding relations thereof, may also be obtained in advance directly according to the training corpus data and the vector conversion model. Therefore, in the using process, for a word in a natural language format, the corresponding text word vector can be directly searched and obtained in the dictionary without calling a model for calculation, and the data processing efficiency is further improved.
In this embodiment, the vector conversion model is obtained by training according to the following steps: acquiring a corpus corresponding to the field of numerical control machine tools, and generating a random initial semantic numerical vector for each word in the corpus by using a word2vec model; traversing each word in each sentence of the corpus, sentence by sentence, in a sliding-window sampling manner, using the words on both sides of each word as input variables for predicting the middle word, and self-training the word2vec model according to the prediction results; and taking the trained word2vec model as the vector conversion model.
Specifically, text corpora in the field of numerical control machine tools are collected from data such as technical manuals, research reports and machining schemes to form a corpus. A word2vec model receives the corpus and is trained without supervision, starting from a random initial semantic numerical vector for every word in the corpus. Each word in each sentence of the corpus is traversed, sentence by sentence, in a sliding-window sampling manner, and the words on both sides of each word are used as input variables for predicting the middle word, which constitutes the self-training. The trained word2vec model can then output, for an input text sentence, word vectors that effectively reflect word semantics in the field of numerical control machine tools.
In this embodiment, the trained BERT model is a deep learning model used to extract information from unstructured text and identify the corresponding named entities. For the training corpus text, each entity is labeled in advance by marking every token as an entity start (B-), an entity continuation (I-) or a non-entity (O); these labels form the entity labeling information, and the named entity recognition training is carried out according to them.
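The sliding-window sampling described here, in which the surrounding words serve as the input for predicting the middle word, can be sketched in Python as below. Only the sample generation is shown; the function name `cbow_samples`, the window size and the example sentence are assumptions, and the actual vector training would normally be delegated to a word2vec library.

```python
from typing import List, Tuple


def cbow_samples(sentence: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Return (context words, middle word) pairs for one tokenized sentence."""
    samples: List[Tuple[List[str], str]] = []
    for i, target in enumerate(sentence):
        left = sentence[max(0, i - window): i]       # words before the middle word
        right = sentence[i + 1: i + 1 + window]      # words after the middle word
        if left or right:                            # skip one-word sentences
            samples.append((left + right, target))
    return samples


# Hypothetical tokenized sentence from a machining corpus.
sentence = ["turning", "tool", "cuts", "the", "shaft"]
for context, middle in cbow_samples(sentence, window=1):
    print(context, "->", middle)
```

With a window of one word on each side, the second sample pairs the context `["turning", "cuts"]` with the middle word `"tool"`.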
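The B-/I-/O labeling scheme can be illustrated with a short Python sketch that turns annotated entity spans into one tag per token. The span format (start index, exclusive end index, label), the label names `TOOL` and `WORKPIECE`, and the example sentence are assumptions made for the example.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) entity spans into B-/I-/O tags."""
    tags = ["O"] * len(tokens)                 # every token starts as non-entity
    for start, end, label in spans:
        tags[start] = f"B-{label}"             # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"             # remaining tokens of the entity
    return tags


# Hypothetical annotated sentence.
tokens = ["use", "carbide", "turning", "tool", "on", "shaft"]
spans = [(1, 4, "TOOL"), (5, 6, "WORKPIECE")]
tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(token, tag)
```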
Specifically, the iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function and a preset loss threshold, and obtaining the trained named entity recognition model includes:
sequentially obtaining each BERT model to be judged, and judging whether each BERT model to be judged is trained or not according to a preset judgment processing flow until a BERT model to be judged which is trained is obtained, wherein the 1 st BERT model to be judged is the pretrained BERT model;
wherein the preset judgment processing flow comprises: inputting the training corpus data into the current BERT model to be judged and obtaining the current predicted entity information corresponding to each text word vector; calculating the loss value of the current BERT model to be judged according to the preset loss function, the entity labeling information and the current predicted entity information; when the loss value is greater than the loss threshold, performing stochastic gradient descent and back propagation based on an optimizer and updating the BERT parameters of the current BERT model to be judged to obtain the next BERT model to be judged; and when the loss value is not greater than the loss threshold, determining that training of the current BERT model to be judged is complete and taking the current BERT model to be judged as the named entity recognition model.
For example, the 1 st BERT model to be judged is the pre-trained BERT model, the pre-trained BERT model is processed and judged according to a preset judgment processing flow, if the loss value is greater than the loss threshold, the training is not completed, the updated BERT model is obtained after being updated and is used as the 2 nd BERT model to be judged, and the like, until the obtained loss value of the ith BERT model to be judged is not greater than the loss threshold, the training of the ith BERT model to be judged is completed, and the ith BERT model to be judged is used as a named entity recognition model.
Specifically, a BERT model to be judged is trained according to the following steps: receiving the text word vectors; automatically capturing their semantic features; preliminarily predicting the entity positions; defining a loss function and an optimizer; comparing the predicted entity positions with the labeled entity positions (namely the entity labeling information), calculating the corresponding loss value according to the loss function, performing stochastic gradient descent and back propagation based on the optimizer according to the calculated current loss value, and updating the BERT parameters; and repeating these steps until training of the BERT model to be judged is complete (namely until the calculated loss value is not greater than the preset loss threshold), at which point the trained BERT model to be judged is used as the named entity recognition model. The text word vector is the semantic vector corresponding to a text word, that is, a word semantic vector.
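The control flow of this loop (compute the loss, compare it with the threshold, update the parameters by stochastic gradient descent, repeat) can be sketched in plain Python. To keep the example self-contained, a tiny logistic-regression classifier stands in for BERT; the data, learning rate, threshold and all function names are assumptions, and only the stopping logic mirrors the text.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def predict(params: List[float], x: Sequence[float]) -> float:
    """Stand-in model: logistic regression with weights params[:-1] and bias params[-1]."""
    z = sum(w * v for w, v in zip(params[:-1], x)) + params[-1]
    return 1.0 / (1.0 + math.exp(-z))


def loss_fn(params: List[float], data: List[Sample]) -> float:
    """Mean cross-entropy between predictions and labels."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(params, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train_until_threshold(params: List[float], data: List[Sample],
                          loss_threshold: float, lr: float = 0.5,
                          max_rounds: int = 10_000):
    """Judge the current model; update it by SGD while the loss exceeds the threshold."""
    for round_no in range(1, max_rounds + 1):
        loss = loss_fn(params, data)
        if loss <= loss_threshold:               # training complete
            return params, loss, round_no
        for x, y in data:                        # one SGD pass gives the next model
            error = predict(params, x) - y       # gradient of the loss w.r.t. z
            params = [w - lr * error * v for w, v in zip(params[:-1], x)] \
                     + [params[-1] - lr * error]
    raise RuntimeError("loss threshold not reached within max_rounds")


# Hypothetical two-feature toy data: label 1 = entity token, 0 = non-entity token.
data = [((1.0, 0.0), 1), ((0.9, 0.1), 1), ((0.0, 1.0), 0), ((0.1, 0.9), 0)]
params, final_loss, rounds = train_until_threshold([0.0, 0.0, 0.0], data, 0.1)
print(rounds, round(final_loss, 4))
```

The `max_rounds` guard is an addition of the sketch: the text stops purely on the loss threshold, which would never terminate if the threshold were unreachable.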
Therefore, the BERT model capable of automatically extracting the processing entity information from the numerical control machine tool process descriptive natural language text can be obtained through the acquired data in the numerical control machine tool corpus, and the extraction of the entity is realized.
Step S300, acquiring a plurality of data to be processed, selecting the heuristic rule set or the named entity identification model according to the type of the data to be processed for extracting entity information of the data to be processed for each data to be processed, and acquiring an entity corresponding to the data to be processed, wherein one type corresponding to the data to be processed is any one of a relational data table type and a non-structural text type.
The sources of the data to be processed may be different (i.e., multiple sources), and the data formats of the data to be processed may also be different (i.e., heterogeneous), so that the data processing method provided by the embodiment may process the data to be processed with heterogeneous multiple sources. Specifically, whether the newly input data is a relational data table or a non-structural text is judged, and entity information corresponding to the numerical control machine tool is obtained according to a heuristic rule set or a named entity identification model for different types of data.
In an application scenario, the selecting the heuristic rule set or the named entity identification model according to the category of the to-be-processed data to extract entity information of the to-be-processed data and obtain an entity corresponding to the to-be-processed data includes: traversing each row-column position corresponding to the relation data table in the data to be processed when the category corresponding to the data to be processed is the category of the relation data table; for any current position in the relational data table, judging the information type of the current position according to the column hypertext mark of the relational data table and the keyword information at the current position, wherein the information type is one of a table header and content; and extracting entity information of the data to be processed according to the information type, and acquiring an entity corresponding to the data to be processed. It should be noted that the directly extracted entity information is in a natural language format, and may be converted into a corresponding text word vector (for example, by using a vector conversion model) and then used as a corresponding entity, so that semantic matching and alignment between the entity and the ontology category are facilitated.
In another application scenario, the selecting the heuristic rule set or the named entity identification model according to the type of the to-be-processed data to extract entity information of the to-be-processed data and obtain an entity corresponding to the to-be-processed data further includes: when the type corresponding to the data to be processed is a non-structural text type, inputting the data to be processed into the vector conversion model to obtain a vector to be processed corresponding to the data to be processed; and inputting the vector to be processed into the named entity recognition model, and acquiring an entity corresponding to the vector to be processed according to the output of the named entity recognition model. It should be noted that the entity output by the named entity recognition model is already a text word vector, so that no vector conversion is needed.
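The selection between the two extractors in step S300 amounts to a dispatch on the data category, which can be sketched in Python as below. How the category is detected is not specified in the text; the sketch assumes that relational tables arrive as HTML markup containing `<tr` rows, and the two extractor callables are trivial stand-ins for the heuristic rule set and the named entity recognition model.

```python
from typing import Callable, List

Extractor = Callable[[str], List[str]]


def data_category(data: str) -> str:
    """Assumption: a relational data table arrives as HTML containing <tr rows."""
    return "relational_table" if "<tr" in data.lower() else "non_structural_text"


def extract(data: str, rule_extractor: Extractor, ner_extractor: Extractor) -> List[str]:
    """Route each item of data to the extractor that matches its category."""
    if data_category(data) == "relational_table":
        return rule_extractor(data)      # heuristic rule set for tables
    return ner_extractor(data)           # named entity recognition model for text


# Stand-in extractors that only report which path was taken.
by_rules: Extractor = lambda d: ["from-rules"]
by_ner: Extractor = lambda d: ["from-ner"]

batch = ["<table><tr><td>Tool</td></tr></table>",
         "Rough-turn the shaft with a carbide tool."]
results = [extract(item, by_rules, by_ner) for item in batch]
print(results)
```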
Step S400, respectively obtaining each target ontology type corresponding to each entity in the data ontology model of the numerical control machine, obtaining an entity association relationship between each entity based on the ontology association relationship between each target ontology type, and converting the data format corresponding to each entity and the data format corresponding to each entity association relationship into the target language format.
Specifically, the obtained entity is matched with each ontology type in the data ontology model of the numerical control machine tool, an ontology type matched with the entity is obtained and is used as a target ontology type corresponding to the entity, and then an ontology incidence relation corresponding to the target ontology type can also be used as an entity incidence relation corresponding to the entity. It should be noted that, in this embodiment, a plurality of entities are obtained, and therefore, a plurality of target ontology categories correspond to the entities, and only the relationship among the plurality of entities needs to be obtained in the actual use process, so that only the ontology association relationship among the corresponding plurality of target ontology categories needs to be obtained and used as the corresponding entity association relationship, and the association relationship unrelated to the obtained ontology does not need to be considered.
In this embodiment, based on each extracted entity, the ontology class in the ontology model is traversed, semantic similarity between the entity and the ontology class is calculated (the similarity between corresponding text word vectors may be calculated as semantic similarity), and automatic alignment between the entity and the ontology class is achieved based on the similarity. Specifically, the ontology class with the highest similarity (and the similarity is higher than a preset threshold) is selected as a target ontology class corresponding to the current entity, and then the relationship between the entities is automatically established according to semantic association between the ontology classes, that is, the relationship between the entity objects is automatically established based on the association relationship between the abstract ontology classes.
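The alignment and relationship transfer can be sketched in Python with cosine similarity over small hand-made vectors. Cosine similarity is one common choice of vector similarity and is an assumption here, as are the three-dimensional vectors, the threshold of 0.8, the category names and the function names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def align(entity_vec: Sequence[float], class_vecs: Dict[str, Sequence[float]],
          threshold: float = 0.8) -> Optional[str]:
    """Pick the most similar ontology category, or None if it is not above the threshold."""
    best, score = max(((name, cosine(entity_vec, vec)) for name, vec in class_vecs.items()),
                      key=lambda pair: pair[1])
    return best if score > threshold else None


def entity_relations(alignment: Dict[str, str],
                     ontology_relations: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Copy each ontology relationship onto the entities aligned with its two categories."""
    found = []
    for source, relation, target in ontology_relations:
        for e1, c1 in alignment.items():
            for e2, c2 in alignment.items():
                if e1 != e2 and c1 == source and c2 == target:
                    found.append((e1, relation, e2))
    return found


# Hypothetical category and entity vectors.
class_vecs = {"MachiningTool": [1.0, 0.0, 0.0],
              "Workpiece": [0.0, 1.0, 0.0],
              "MachiningProcess": [0.0, 0.0, 1.0]}
entity_vecs = {"twist drill": [0.9, 0.1, 0.0],
               "drilling": [0.1, 0.0, 0.95],
               "misc": [0.5, 0.5, 0.5]}

alignment = {}
for name, vec in entity_vecs.items():
    category = align(vec, class_vecs)
    if category is not None:          # entities below the threshold stay unaligned
        alignment[name] = category

relations = entity_relations(alignment, [("MachiningProcess", "uses", "MachiningTool")])
print(alignment, relations)
```

The entity "misc" is equally similar to all three categories at about 0.58, so it falls below the threshold and receives no target ontology category.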
Furthermore, standard ontology data and standard relation data corresponding to the aligned target ontology type are called to replace the current entity and entity incidence relation, and corresponding data which can be directly called by a computer or a numerical control system can be obtained. For example, an XML code or OWL code corresponding to the aligned target ontology type may be called to replace the current entity and the entity association relationship thereof, so as to achieve information standardization.
It should be noted that, after the entities and entity association relationships converted into the target language format are obtained, the numerical control machine tool may also be controlled according to them. Specifically, the entities and entity association relationships converted into the target language format are input into the computer or the numerical control system of the numerical control machine tool, so that the computer or the numerical control system can conveniently call the corresponding data and control the numerical control machine tool.
Specifically, the multi-source heterogeneous data processing method for numerical control machine tool manufacturing resources provided in this embodiment may be divided into a construction stage, in which the required tools such as models and rules are built, and a use stage. The construction stage includes ontology model construction, collection of the numerical control machine tool corpus, vector conversion model training, heuristic rule design and named entity recognition model training. The steps of the construction stage can be completed in advance and their results stored and applied to the processing of different data to be processed, so that they do not need to be repeated each time multi-source heterogeneous data are encountered, which improves data processing efficiency. The use stage includes receiving and classifying the data to be processed, extracting the entity information required for machining, aligning the entities with the ontology, and general semantic encoding. Through these steps, the entities corresponding to the key concepts in the data to be processed and the relationships between the entities are automatically extracted and converted into a data format that a computer can process directly, which facilitates subsequent control of the numerical control machine tool without manual participation and raises the intelligence level of the numerical control machine tool.
Specifically, in the construction stage, a knowledge ontology of numerical control machine tool manufacturing resources is constructed to cover the main process flow, tool and workpiece elements and the semantic associations among these elements; standardized semantics are established for all categories and relationships in the ontology and encoded in a general form using XML/OWL; descriptive texts such as non-structural process manuals are collected as a corpus, and word semantic vectors for the field of numerical control machine tools are obtained with the vector conversion model; and, for the main forms in which process description data are expressed, heuristic rules and BERT are integrated to achieve general, adaptive extraction of named entities for numerical control machine tool manufacturing resources. In the use stage, relational data tables and process description texts from ERP (enterprise resource planning), MES (manufacturing execution system) and CAPP (computer-aided process planning) systems are received, resource entity information is extracted from them, semantic vector similarity is calculated to align the entities with the ontology categories, semantic relationships are established automatically based on the associations defined a priori in the ontology, and the XML/OWL general format of the ontology category is called to replace the original information. This achieves a standard definition of multi-source numerical control machine tool process information, eliminates data islands between multi-source heterogeneous data, and enhances information interaction between numerical control machine tool control systems.
It should be noted that the data processing method provided in this embodiment may also be used for processing multi-source heterogeneous data in other scenarios, and information matching and interaction based on the ontology general semantic definition, the entity identification, and the ontology alignment process are implemented.
As can be seen from the above, in the multi-source heterogeneous data processing method for numerical control machine tool manufacturing resources, a pre-constructed data ontology model of the numerical control machine tool is obtained, where the data ontology model of the numerical control machine tool includes a plurality of ontology categories, ontology association relationships among the ontology categories, standard ontology data corresponding to each ontology category, and standard relationship data corresponding to each ontology association relationship, where one ontology category represents a corresponding concept in the data of the numerical control machine tool, the data format of the standard ontology data and the standard relationship data is a target language format, and the target language format is either the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format; a preset heuristic rule set and a pre-trained named entity recognition model are acquired; a plurality of data to be processed are acquired and, for each item of data to be processed, the heuristic rule set or the named entity recognition model is selected according to the category of the data to extract its entity information and obtain the corresponding entities, where the category of an item of data to be processed is either a relational data table category or a non-structural text category; and the target ontology category corresponding to each entity in the data ontology model of the numerical control machine tool is obtained, the entity association relationships among the entities are obtained based on the ontology association relationships among the target ontology categories, and the data format of each entity and of each entity association relationship is converted into the target language format.
Compared with prior-art schemes in which numerical control machine tool manufacturing resource data is processed manually, the method provided by the invention processes the data to be processed automatically: it identifies the entities in the data and the entity association relationships among them, so that the entities corresponding to numerical control machine tool concepts and the relationships among those entities are obtained without manual processing. This saves human resources and time and improves the efficiency of data analysis and processing. Moreover, the data formats of the entities and of the entity association relationships are converted into a target language format that can be directly recognized and invoked by intelligent devices or intelligent control systems such as computers and control chips, which facilitates control of the numerical control machine tool and helps improve its working efficiency.
Specifically, the heuristic rules designed in advance in this embodiment adapt generally to the different kinds of relational table data, with their different hierarchical structures, found in the systems mainly used by numerical control machine tool enterprises, such as ERP, CAPP and MES, so that manufacturing resource entities are extracted automatically without an extraction rule having to be designed for each kind of table. By training a BERT model, the method also automatically extracts manufacturing resource entity information from natural-language text describing numerical control machining, which addresses the inability of existing methods to process unstructured text effectively. With the ontology categories and ontology association relationships defined, the extracted manufacturing resource entities are automatically aligned with the ontology model on the basis of semantic vector similarity, relationships are established automatically according to the semantic associations between ontology categories, and the general OWL/XML definitions are invoked to replace the original information. In this way information is disambiguated while associations among multi-source data are established automatically, data silos are eliminated, and information interaction within the numerical control machine tool control system is enhanced, thereby improving data processing efficiency and the working efficiency of the numerical control machine tool.
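The alignment based on semantic vector similarity mentioned above can be illustrated with a minimal sketch. It assumes that cosine similarity is the similarity measure and that a fixed threshold decides whether an entity matches any category at all; neither choice is stated in the embodiment. The category names, the toy three-dimensional vectors, and the `min_similarity` parameter are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_u * norm_v)


def align_entity(entity_vector, category_vectors, min_similarity=0.5):
    """Return (category, score) for the most similar ontology category.

    The category is None when even the best score is below min_similarity
    (an assumed cut-off, not a value taken from the patent).
    """
    best_category, best_score = None, -1.0
    for category, vector in category_vectors.items():
        score = cosine_similarity(entity_vector, vector)
        if score > best_score:
            best_category, best_score = category, score
    if best_score < min_similarity:
        return None, best_score
    return best_category, best_score


# Hypothetical semantic vectors for three ontology categories.
CATEGORY_VECTORS = {
    "MachiningTool": [0.9, 0.1, 0.0],
    "MachiningProcess": [0.1, 0.9, 0.1],
    "Workpiece": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    # A vector for an extracted entity such as "end mill" (made-up values).
    print(align_entity([0.8, 0.2, 0.1], CATEGORY_VECTORS))
```

In practice the vectors would come from the trained vector conversion model rather than being written by hand, and the threshold would have to be tuned on labelled data.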
Exemplary device
As shown in fig. 3, an embodiment of the present invention further provides a multi-source heterogeneous data processing system for manufacturing resources of a numerical control machine, corresponding to the above multi-source heterogeneous data processing method for manufacturing resources of a numerical control machine, where the system includes:
an ontology model obtaining module 510, configured to obtain a pre-constructed numerical control machine tool data ontology model, where the model includes a plurality of ontology categories, ontology association relationships among the ontology categories, standard ontology data corresponding to each ontology category, and standard relationship data corresponding to each ontology association relationship; each ontology category represents a corresponding concept in numerical control machine tool data; and the standard ontology data and the standard relationship data are in a target language format, which is either the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format.
The numerical control machine tool data ontology model is a general ontology model constructed from data in the numerical control machine tool field, and it captures the association relationships among the core concepts and abstract concepts of that field. Specifically, the ontology categories may include qualitative concepts, such as the names and types of the abstract categories for the main machining processes, machining tools and workpieces of the numerical control machine tool, as well as quantitative concepts, such as the geometry and properties of machining tools and workpieces. An association relationship may be a usage correspondence or a hierarchical relationship between concepts, for example a tree-like hierarchy between concepts of coarser and finer granularity, or an empirically established suitability relationship between a tool and a process.
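The structure described above, categories plus named relationships among them, can be sketched as a small in-memory data structure. This is an illustrative assumption about one possible representation, not the format used by the embodiment: the class names, the relationship names `subCategoryOf` and `usedIn`, and the example categories are all hypothetical, and the standard ontology data in the target language format is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyRelation:
    """A directed, named association between two ontology categories."""
    source: str
    name: str
    target: str


@dataclass
class OntologyModel:
    """Holds ontology categories and the association relationships among them."""
    categories: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def add_category(self, name):
        self.categories.add(name)

    def add_relation(self, source, name, target):
        # Both ends of a relationship must already be known categories.
        for category in (source, target):
            if category not in self.categories:
                raise KeyError(f"unknown ontology category: {category}")
        self.relations.append(OntologyRelation(source, name, target))

    def relations_between(self, first, second):
        """Return the relationships linking two categories, in either direction."""
        return [r for r in self.relations
                if {r.source, r.target} == {first, second}]


def build_example_model():
    """Build a tiny model with one hierarchical and one usage relationship."""
    model = OntologyModel()
    for name in ("MachiningTool", "MillingCutter", "MachiningProcess"):
        model.add_category(name)
    # Hierarchical relationship: a finer-grained concept under a coarser one.
    model.add_relation("MillingCutter", "subCategoryOf", "MachiningTool")
    # Usage relationship between a tool and a process.
    model.add_relation("MachiningTool", "usedIn", "MachiningProcess")
    return model
```

Rejecting relationships whose endpoints are unknown keeps the model consistent as it grows; a production ontology would more likely be authored and validated with a dedicated OWL toolchain.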
And a rule and entity identification model obtaining module 520, configured to obtain a set of pre-set heuristic rules and a pre-trained named entity identification model.
The heuristic rule set is used to process relational data tables and extract the entity information they contain. The pre-trained named entity recognition model is obtained by further training (fine-tuning) a pre-trained BERT model on training corpus data from the numerical control machine tool field.
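As a rough illustration of rule-based table extraction, the sketch below traverses every row and column position, classifies each cell as a header or as content, and pairs each content cell with the most recent header seen in its column. The keyword list and the pairing rule are hypothetical stand-ins; the actual heuristic rule set of the embodiment is not reproduced here, and the use of table markup in the classification is omitted.

```python
# Hypothetical keywords that mark a cell as a header rather than content.
HEADER_KEYWORDS = {"tool", "process", "workpiece", "material"}


def classify_cell(text):
    """Label a cell 'header' if it contains a header keyword, else 'content'."""
    lowered = text.strip().lower()
    if any(keyword in lowered for keyword in HEADER_KEYWORDS):
        return "header"
    return "content"


def extract_entities(table):
    """Traverse every row/column position and pair content cells with headers.

    Header rows may appear anywhere in the table, so the most recent header
    seen in each column is remembered and applied to the cells below it.
    """
    entities = []
    current_headers = {}  # column index -> most recent header text
    for row in table:
        for col, cell in enumerate(row):
            text = cell.strip()
            if not text:
                continue  # skip empty cells
            if classify_cell(text) == "header":
                current_headers[col] = text
            elif col in current_headers:
                entities.append((current_headers[col], text))
    return entities


# A made-up table with two header rows, i.e. a simple multi-level layout.
EXAMPLE_TABLE = [
    ["Tool name", "Process"],
    ["End mill D10", "Rough milling"],
    ["Workpiece", "Material"],
    ["Bracket A", "Aluminium 6061"],
]

if __name__ == "__main__":
    for header, value in extract_entities(EXAMPLE_TABLE):
        print(f"{header}: {value}")
```

Tracking headers per column rather than assuming a single header row is what lets one rule cover tables with different hierarchical layouts, at the cost of misclassifying content cells that happen to contain a keyword.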
An entity identification module 530, configured to obtain data to be processed, select the heuristic rule set or the named entity recognition model according to the type of the data to be processed, perform entity information extraction on the data to be processed, and obtain the entities corresponding to the data to be processed, where the type of each piece of data to be processed is either a relational data table type or an unstructured text type.
The data to be processed may come from different sources (i.e., multi-source) and may be in different data formats (i.e., heterogeneous), so the data processing system provided by this embodiment can process multi-source heterogeneous data. Specifically, the system determines whether newly input data is a relational data table or unstructured text, and applies the heuristic rule set or the named entity recognition model accordingly to obtain the entity information relating to the numerical control machine tool.
And an entity processing module 540, configured to obtain the target ontology category corresponding to each entity in the numerical control machine tool data ontology model, obtain the entity association relationships among the entities based on the ontology association relationships among the target ontology categories, and convert the data format of each entity and of each entity association relationship into the target language format.
Specifically, each obtained entity is matched against the ontology categories in the numerical control machine tool data ontology model, and the ontology category that matches the entity is taken as its target ontology category; the ontology association relationships of that target ontology category can then be taken as the entity association relationships of the entity. It should be noted that a plurality of entities are obtained in this embodiment, so there are a plurality of corresponding target ontology categories. In actual use only the relationships among these entities are needed, so only the ontology association relationships among the corresponding target ontology categories are obtained and used as the entity association relationships; association relationships that do not involve the obtained target ontology categories need not be considered.
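The type-based selection described above can be sketched as a small dispatcher. The sketch assumes that a relational data table arrives as a list of rows and unstructured text as a string; the function names and the two extractor callables are hypothetical placeholders for the heuristic rule set and the named entity recognition model.

```python
def detect_type(data):
    """Classify input as 'relational_table' or 'unstructured_text'.

    Assumption for this sketch: tables are lists of rows (lists or tuples)
    and free text is a plain string.
    """
    if isinstance(data, str):
        return "unstructured_text"
    if isinstance(data, list) and all(isinstance(row, (list, tuple)) for row in data):
        return "relational_table"
    raise TypeError(f"unsupported data type: {type(data).__name__}")


def extract(data, table_extractor, text_extractor):
    """Route the data to the extractor that matches its detected type."""
    if detect_type(data) == "relational_table":
        return table_extractor(data)
    return text_extractor(data)


if __name__ == "__main__":
    # Stand-in extractors; real ones would apply the rules or run the model.
    def from_table(table):
        return [cell for row in table for cell in row]

    def from_text(text):
        return text.split()

    print(extract([["End mill D10", "Rough milling"]], from_table, from_text))
    print(extract("rough milling with end mill", from_table, from_text))
```

Passing the extractors in as arguments keeps the dispatcher independent of how the rules or the model are implemented.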
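The derivation of entity relationships from category relationships, followed by conversion to a markup format, can be sketched as follows. The relationship triples, entity names, and XML element names (`Entities`, `Entity`, `Relation`) are hypothetical, and the output is plain illustrative XML rather than the OWL/XML vocabulary that the embodiment refers to.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology relationships: (source category, name, target category).
ONTOLOGY_RELATIONS = [
    ("MachiningTool", "usedIn", "MachiningProcess"),
    ("MachiningProcess", "produces", "Workpiece"),
    ("MachiningTool", "hasProperty", "ToolPerformance"),
]


def derive_entity_relations(entity_categories, ontology_relations):
    """Instantiate each category-level relationship for the matching entities.

    Relationships whose source or target category has no extracted entity
    yield nothing, so unrelated parts of the ontology are ignored.
    """
    relations = []
    for source_cat, name, target_cat in ontology_relations:
        sources = [e for e, c in entity_categories.items() if c == source_cat]
        targets = [e for e, c in entity_categories.items() if c == target_cat]
        for source in sources:
            for target in targets:
                relations.append((source, name, target))
    return relations


def to_xml(entity_categories, entity_relations):
    """Serialize entities and their relationships as a small XML document."""
    root = ET.Element("Entities")
    for entity, category in entity_categories.items():
        ET.SubElement(root, "Entity", name=entity, category=category)
    for source, name, target in entity_relations:
        ET.SubElement(root, "Relation", source=source, name=name, target=target)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    entities = {"End mill D10": "MachiningTool", "Rough milling": "MachiningProcess"}
    relations = derive_entity_relations(entities, ONTOLOGY_RELATIONS)
    print(to_xml(entities, relations))
```

Here only the `usedIn` relationship is instantiated, because no entity was aligned to `Workpiece` or `ToolPerformance`.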
Specifically, in this embodiment, the specific functions of the multi-source heterogeneous data processing system for the manufacturing resources of the numerical control machine and the modules thereof may refer to the corresponding descriptions in the multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine, and are not described herein again.
It should be noted that, the dividing manner of each module of the multi-source heterogeneous data processing system for the manufacturing resources of the numerical control machine tool is not unique, and is not limited herein.
Based on the above embodiment, the invention further provides an intelligent terminal. The intelligent terminal comprises a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capability. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a multi-source heterogeneous data processing program for numerical control machine tool manufacturing resources. The internal memory provides an environment for running the operating system and the multi-source heterogeneous data processing program stored in the non-volatile storage medium. The network interface of the intelligent terminal is used to connect to and communicate with an external terminal through a network. When executed by the processor, the multi-source heterogeneous data processing program implements the steps of any one of the above multi-source heterogeneous data processing methods for numerical control machine tool manufacturing resources. The display screen of the intelligent terminal may be a liquid crystal display screen or an electronic ink display screen.
The embodiment of the invention also provides a computer-readable storage medium, wherein the computer-readable storage medium is stored with a multi-source heterogeneous data processing program for manufacturing resources of the numerical control machine tool, and when the multi-source heterogeneous data processing program for manufacturing resources of the numerical control machine tool is executed by a processor, the steps of any multi-source heterogeneous data processing method for manufacturing resources of the numerical control machine tool provided by the embodiment of the invention are realized.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the system may be divided into different functional units or modules to implement all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed system/terminal device and method may be implemented in other ways. The system/terminal device embodiments described above are merely illustrative. For example, the division into the above modules or units is only a logical division, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, or a software distribution medium. It should be noted that the content covered by the computer-readable storage medium may be expanded or restricted as required by legislation and patent practice in the relevant jurisdiction.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims (10)

1. A multi-source heterogeneous data processing method for manufacturing resources of a numerical control machine tool is characterized by comprising the following steps:
acquiring a pre-constructed numerical control machine tool data ontology model, wherein the numerical control machine tool data ontology model comprises a plurality of ontology categories, ontology incidence relations among the ontology categories, standard ontology data corresponding to the ontology categories and standard relation data corresponding to the ontology incidence relations, one ontology category represents a corresponding concept in numerical control machine tool data, the data formats of the standard ontology data and the standard relation data are a target language format, and the target language format is either the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format;
acquiring a preset heuristic rule set and a pre-trained named entity recognition model;
acquiring a plurality of data to be processed, selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed to extract entity information of the data to be processed and obtain an entity corresponding to the data to be processed, wherein the category corresponding to each data to be processed is either a relational data table category or an unstructured text category;
respectively obtaining each target ontology type corresponding to each entity in the numerical control machine tool data ontology model, obtaining entity incidence relation among the entities based on the ontology incidence relation among the target ontology types, and converting data formats corresponding to the entities and data formats corresponding to the entity incidence relation into the target language format.
2. The multi-source heterogeneous data processing method for numerically-controlled machine tool manufacturing resources according to claim 1, wherein before the obtaining of the pre-built numerically-controlled machine tool data ontology model, the method further comprises:
acquiring a numerical control machine tool text data set, determining concepts in the numerical control machine tool text data set and concept incidence relations among the concepts, and taking the concepts as the ontology types and the concept incidence relations as the ontology incidence relations, respectively;
respectively carrying out standardized definition on the semantics of each concept according to the target language format to obtain corresponding standard ontology data;
respectively carrying out standardized definition on each concept incidence relation according to the target language format to obtain corresponding standard relation data;
and constructing the numerical control machine tool data ontology model according to the ontology type, the ontology incidence relation, the standard ontology data and the standard relation data.
3. The multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine according to claim 2, wherein the numerical control machine text data set comprises at least one of a technical manual, a research report and a processing scheme corresponding to the numerical control machine; the concepts include machining process, machining tool, workpiece name, workpiece type, workpiece geometry, and tool performance.
4. The multi-source heterogeneous data processing method for numerically-controlled machine tool manufacturing resources according to claim 1, wherein the named entity recognition model is obtained by pre-training through the following steps:
obtaining a pre-training BERT model;
obtaining training corpus data corresponding to a numerical control machine tool, wherein the training corpus data comprises a plurality of text word vectors and entity marking information corresponding to each text word vector;
and iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function and a preset loss threshold value, and obtaining the trained named entity recognition model.
5. The multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine tool according to claim 4, wherein the obtaining of the corpus data corresponding to the numerical control machine tool comprises:
acquiring a corpus text corresponding to the numerical control machine tool, wherein the corpus text comprises a plurality of text sentences and entity labeling information corresponding to each text sentence;
inputting each text statement into a vector conversion model trained in advance to obtain a text word vector corresponding to each text statement, and constructing the training corpus data according to the text word vector and the entity labeling information.
6. The multi-source heterogeneous data processing method for numerical control machine tool manufacturing resources according to claim 5, wherein the vector conversion model is obtained by training according to the following steps:
acquiring a corpus corresponding to the field of the numerical control machine tool, and generating a random initial semantic numerical value vector for each word in the corpus by using a word2vec model;
traversing each word in each sentence by taking the sentence in the corpus as a unit according to a sliding window sampling mode, taking the words on both sides of each word as the input (independent) variables for predicting the middle word, and performing self-training on the word2vec model according to the prediction result;
and taking the trained word2vec model as the vector conversion model.
7. The multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine according to claim 4, wherein the iteratively updating the BERT parameters of the pre-trained BERT model according to the training corpus data, a preset loss function and a preset loss threshold and obtaining the trained named entity recognition model comprises:
sequentially obtaining each BERT model to be judged, and judging whether each BERT model to be judged has completed training according to a preset judgment processing flow, until a BERT model to be judged that has completed training is obtained, wherein the first BERT model to be judged is the pre-trained BERT model;
wherein the preset judgment processing flow comprises: inputting the training corpus data into the current BERT model to be judged and obtaining current predicted entity information corresponding to each text word vector; calculating a loss value corresponding to the current BERT model to be judged according to the preset loss function, the entity marking information and the current predicted entity information; when the loss value is greater than the loss threshold value, performing stochastic gradient descent and back propagation based on an optimizer, updating the BERT parameters of the current BERT model to be judged, and obtaining the next BERT model to be judged; and when the loss value is not greater than the loss threshold value, determining that training of the current BERT model to be judged is completed, and taking the current BERT model to be judged as the named entity recognition model.
8. The multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine tool according to claim 5, wherein the selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed to extract entity information of the data to be processed and obtain an entity corresponding to the data to be processed comprises:
when the category corresponding to the data to be processed is a relational data table category, traversing each row-column position corresponding to the relational data table in the data to be processed;
for any current position in the relational data table, judging the information type of the current position according to the column hypertext mark of the relational data table and the keyword information at the current position, wherein the information type is one of a table header and content;
and extracting entity information of the data to be processed according to the information category, and acquiring an entity corresponding to the data to be processed.
9. The multi-source heterogeneous data processing method for the manufacturing resources of the numerical control machine according to claim 8, wherein the selecting the heuristic rule set or the named entity recognition model according to the category of the data to be processed performs entity information extraction on the data to be processed and obtains an entity corresponding to the data to be processed, further comprising:
when the category corresponding to the data to be processed is an unstructured text category, inputting the data to be processed into the vector conversion model to obtain a vector to be processed corresponding to the data to be processed;
and inputting the vector to be processed into the named entity recognition model, and acquiring an entity corresponding to the vector to be processed according to the output of the named entity recognition model.
10. A multi-source heterogeneous data processing system for numerically controlled machine tool manufacturing resources, the system comprising:
an ontology model acquisition module, used for acquiring a pre-constructed numerical control machine tool data ontology model, wherein the numerical control machine tool data ontology model comprises a plurality of ontology categories, ontology incidence relations among the ontology categories, standard ontology data corresponding to the ontology categories and standard relation data corresponding to the ontology incidence relations, one ontology category represents a corresponding concept in numerical control machine tool data, the data formats of the standard ontology data and the standard relation data are a target language format, and the target language format is either the Extensible Markup Language (XML) format or the Web Ontology Language (OWL) format;
the rule and entity recognition model acquisition module is used for acquiring a preset heuristic rule set and a pre-trained named entity recognition model;
the entity identification module is used for acquiring data to be processed, selecting the heuristic rule set or the named entity recognition model according to the type of the data to be processed to extract entity information of the data to be processed, and acquiring an entity corresponding to the data to be processed, wherein the type corresponding to each data to be processed is either a relational data table type or an unstructured text type;
and the entity processing module is used for respectively acquiring each target ontology type corresponding to each entity in the data ontology model of the numerical control machine tool, acquiring an entity incidence relation between each entity based on the ontology incidence relation between each target ontology type, and converting a data format corresponding to each entity and a data format corresponding to each entity incidence relation into the target language format.
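For illustration only, the sliding-window sampling recited in claim 6 can be sketched as below: each word in a sentence is visited in turn, and the words on both sides within the window are collected as the inputs for predicting the middle word. The function name, the `window` parameter, and the example sentence are hypothetical; training the word2vec model on these samples is not shown.

```python
def sliding_window_samples(sentence, window=2):
    """Return (context_words, middle_word) pairs for one tokenized sentence.

    For each position, up to `window` words on each side form the context
    used to predict the word in the middle.
    """
    samples = []
    for i, middle in enumerate(sentence):
        left = sentence[max(0, i - window):i]
        right = sentence[i + 1:i + 1 + window]
        context = left + right
        if context:  # a one-word sentence has no context to learn from
            samples.append((context, middle))
    return samples


if __name__ == "__main__":
    tokens = ["rough", "milling", "uses", "end", "mill"]
    for context, middle in sliding_window_samples(tokens, window=1):
        print(context, "->", middle)
```

Words near the start or end of a sentence simply receive a shorter context, which avoids padding tokens in this sketch.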
CN202210614298.XA 2022-06-01 2022-06-01 Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources Active CN114722159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210614298.XA CN114722159B (en) 2022-06-01 2022-06-01 Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210614298.XA CN114722159B (en) 2022-06-01 2022-06-01 Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources

Publications (2)

Publication Number Publication Date
CN114722159A true CN114722159A (en) 2022-07-08
CN114722159B CN114722159B (en) 2022-08-23

Family

ID=82232445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210614298.XA Active CN114722159B (en) 2022-06-01 2022-06-01 Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources

Country Status (1)

Country Link
CN (1) CN114722159B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244386A (en) * 2023-02-10 2023-06-09 北京友友天宇系统技术有限公司 Identification method of entity association relation applied to multi-source heterogeneous data storage system

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5345586A (en) * 1992-08-25 1994-09-06 International Business Machines Corporation Method and system for manipulation of distributed heterogeneous data in a data processing system
US20100182918A1 (en) * 2007-08-10 2010-07-22 Laurent Clevy Method and installation for classification of traffic in ip networks
US20170140625A1 (en) * 2012-01-06 2017-05-18 3M Innovative Properties Company Released offender geospatial location information trend analysis
CN109782689A (en) * 2019-01-10 2019-05-21 上海交通大学 A kind of tool management method and system of the numerical control processing based on big data technology
CN109902298A (en) * 2019-02-13 2019-06-18 东北师范大学 Domain Modeling and know-how estimating and measuring method in a kind of adaptive and learning system
CN110187687A (en) * 2019-06-10 2019-08-30 北京航空航天大学 The multi-source heterogeneous data fusion method in manufacturing shop and system based on Complex event processing
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatically the method for multi-source heterogeneous data knowledge is obtained
CN110598005A (en) * 2019-09-06 2019-12-20 中科院合肥技术创新工程院 Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN111241837A (en) * 2020-01-04 2020-06-05 大连理工大学 Theft case legal document named entity identification method based on anti-migration learning
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN113009885A (en) * 2019-12-20 2021-06-22 中国科学院沈阳计算技术研究所有限公司 Digital mapping system and method for safety state of numerical control system
CN114371883A (en) * 2021-12-29 2022-04-19 天翼物联科技有限公司 Construction method and calling system of compound model of Internet of things

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
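PENGPENG YE et al.: "Rectangular Impulsive Consensus of Multi-agent Systems with Heterogeneous Control Widths", 《2019 12TH ASIAN CONTROL CONFERENCE (ASCC)》 *
Y. HUANG et al.: "Interactive Strategy for Adaptive Belt Grinding Heterogeneous Data for an Aero-Engine Blade", 《IEEE ACCESS》 *
姜毅: "XML-based data sharing and implementation for heterogeneous databases", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
王亚辉 et al.: "Manufacturing shop-floor data acquisition and fusion based on a standard semantic model and complex event processing under a cloud architecture", 《计算机集成制造系统》 (Computer Integrated Manufacturing Systems) *
王美清 et al.: "Multi-source heterogeneous data management method for intelligent management and control of the numerical control machining process", 《航空制造技术》 (Aeronautical Manufacturing Technology) *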

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244386A (en) * 2023-02-10 2023-06-09 北京友友天宇系统技术有限公司 Identification method of entity association relation applied to multi-source heterogeneous data storage system

Also Published As

Publication number Publication date
CN114722159B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
CN110298032B (en) Text classification corpus labeling training system
CN111428054B (en) Construction and storage method of knowledge graph in network space security field
CN110825882B (en) Knowledge graph-based information system management method
CN104361127B (en) Rapid construction method for multilingual question-and-answer interfaces based on domain ontology and template logic
CN107562919B (en) Multi-index integrated software component retrieval method and system based on information retrieval
CN101710343A (en) Automatic ontology construction system and method based on text mining
CN110287482B (en) Semi-automatic word segmentation corpus labeling training device
CN111325029A (en) Text similarity calculation method based on deep learning integration model
CN110597844B (en) Unified access method for heterogeneous database data and related equipment
CN112199512B (en) Scientific and technological service-oriented case map construction method, device, equipment and storage medium
CN111274817A (en) Intelligent software cost measurement method based on natural language processing technology
CN113032418B (en) Method for converting complex natural language query into SQL (structured query language) based on tree model
CN114625748A (en) SQL query statement generation method and device, electronic equipment and readable storage medium
CN114722159B (en) Multi-source heterogeneous data processing method and system for numerical control machine tool manufacturing resources
CN111400449A (en) Regular expression extraction method and device
CN111178080A (en) Named entity identification method and system based on structured information
CN117648093A (en) RPA flow automatic generation method based on large model and self-customized demand template
CN114239828A (en) Supply chain event logic graph construction method based on causal relationship
CN112541070A (en) Method and device for mining slot-update corpora, electronic equipment and storage medium
CN112183110A (en) Artificial intelligence data application system and application method based on data center
CN113157887A (en) Knowledge question-answering intention identification method and device and computer equipment
CN112307767A (en) Bi-LSTM technology-based regulation and control knowledge modeling method
Zhang et al. A knowledge reuse-based computer-aided fixture design framework
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN111460114A (en) Retrieval method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant