CN116362245A - OPC UA information model construction method based on unstructured text data - Google Patents

Info

Publication number
CN116362245A
Authority
CN
China
Prior art keywords
model
information
text
opc
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211656684.1A
Other languages
Chinese (zh)
Inventor
刘洋
史治国
贺诗波
顾超杰
陈彩莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202211656684.1A
Publication of CN116362245A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing an OPC UA information model from unstructured text data. It mainly addresses the problem that existing OPC UA information modeling usually depends on structured data and lacks the descriptive information available in text. The method comprises the following steps: designing annotation labels that are based on the characteristics of the corpus text and meet industrial field modeling requirements, and designing the organization relations among the labels; performing BIO sequence labeling on the extracted sample text to obtain training samples; constructing a Chinese pre-trained model based on BERT to extract word vectors; processing, classifying and labeling the word vectors with a CRF; and extracting the valid entities in the text with the trained model and organizing the model according to predefined organization rules. The invention constructs industrial unstructured text sample data and designs an entity label extraction method that uses the text corpus to build the information model. This broadens the application scenarios of information model construction; the method is simple and clear to operate and has practical value.

Description

OPC UA information model construction method based on unstructured text data
Technical Field
The invention belongs to the technical field of industrial automation and relates to a method for constructing an OPC UA information model, in particular a method based on unstructured text data, which can be used to build abstract intelligent models of industrial field environments and thereby support industrial intelligence.
Background
Digital transformation of production line equipment is becoming increasingly necessary. OPC is an interoperability standard for secure data exchange in industrial automation and other industries, developed jointly by industry suppliers, end users and software developers. Its specifications define the interfaces between clients and servers and between servers, covering access to real-time data, monitoring of alarms and events, access to historical data and other applications. The OPC standard was first released in 1996. Its purpose is to abstract the specific protocols of programmable logic controllers (PLCs), such as Modbus and PROFIBUS, into a standardized interface that acts as a middleman, converting generic OPC read/write requests into device-specific protocol requests and vice versa, so that devices can interface directly with HMI/SCADA systems. A complete product industry has formed around it, through which end users can use optimized products to achieve seamless system interaction via OPC. Currently, the OPC Unified Architecture (OPC UA) information model is used as an effective method for constructing digital mirror images of physical devices in various industrial fields, so that sensing and control can be performed through virtual copies.
Because the OPC UA information model is object-oriented, a common way to construct it is for experienced engineers to define the nodes manually, since knowledge of the whole production line comes only from understanding the characteristics of the equipment on it. Model building tools are another option. Some studies build models by mapping models in other formats to OPC UA information models, and others use industrial databases or other industrial information models to build information models.
However, all of the conventional approaches above ignore the unstructured text information that is abundant in industry. When designing feasible schemes for information model construction, existing research has focused on structured data such as knowledge graphs and other models. Meanwhile, the redundant information present in valuable unstructured data (e.g., text) can interfere with the extraction of useful information. It is therefore important to find a new way to obtain the desired information from complex text, and in the process to design entity structure rules using prior knowledge. At present, research by domestic enterprises and research institutions on OPC UA information model modeling is still insufficient and at a relatively early stage. No mature application solution for OPC UA information modeling is available for the transformation of the industrial Internet of Things, and research on building OPC UA information models from corpus sources is particularly lacking.
In addition, in practical applications it is difficult for the engineers who build information models to understand the mechanisms of all industrial equipment, and when an unfamiliar device is encountered it is often hard to obtain accurate model information. An information model construction method that is highly versatile and easy to put into practice is therefore needed.
Given this situation, applied research on constructing OPC UA information models from unstructured text is of considerable significance: it can advance the intelligent transformation of industrial sites and reduce the difficulty of that transformation.
Disclosure of Invention
The invention focuses on the text that accompanies field devices and proposes an information model construction process that makes full use of text data left unused by previous processes. The invention uses the BERT model in combination with a conditional random field (CRF) to extract labeled entities from the original text, and constructs an information model from the extracted entities using predefined label relations. The invention implements text entity recognition for the industrial field and builds a sample set of industrial sequence labels to fine-tune the BERT model, obtaining good extraction results. The invention thus designs a new information model construction method: the text is analyzed with the language model BERT, entities are extracted from the text by sequence labeling, and the entities are organized through predefined label relations so as to conform to the OPC UA standard.
The object of the invention is achieved by the following technical scheme: a method for constructing an OPC UA information model based on unstructured text data, comprising the following steps:
(1) Designing annotation labels that are based on the characteristics of the corpus text and meet industrial field modeling requirements, and designing the organization relations among the labels, that is, using description labels at different levels, where the highest level is the model description object and the lower levels are the attributes and attribute values of the object;
(2) Performing BIO sequence labeling, based on the description labels at different levels, on sample text extracted from device description documents to obtain training samples;
(3) Constructing a Chinese pre-trained model based on the BERT model and extracting word vectors from the sample text labeled in step (2);
(4) Processing, classifying and labeling the word vectors extracted in step (3) with a CRF;
(5) Training the text processing model that combines BERT and CRF with the training samples obtained in step (2);
(6) Extracting the valid entities in the text with the trained text processing model and organizing the model according to predefined organization rules, that is, extracting the entity text under each label with the text processing model, organizing the entities according to the predefined label relations, and finally constructing an object-oriented OPC UA information model from the related entity labels.
Further, in step (1), the nodes of the object-oriented OPC UA information model are abstracted into entity labels, and the relations among the entity labels are defined according to the original organization structure of the information model nodes, so that the model can later be built from the predefined relations.
Further, in step (1), labels covering three levels (object, object attribute characteristics and attribute values) are designed for the OPC UA information model. Specifically, there are four types of label: model description object (OBJ), component (COM), attribute (ATT) and attribute value (VAL).
Further, the BIO labeling of industrial field data in step (2) is specified as follows:
(1) unstructured text data from the industrial field is BIO-labeled with labels that conform to industrial field modeling requirements;
(2) the labels themselves take into account the label organization of the information model and define label relations that satisfy the structural requirements of the OPC UA information model; following the three-level design of object, attribute characteristics and related attribute values, the model labels correspond directly to the objects, nodes and variables of the OPC UA information model.
Further, in step (2), the text material is labeled item by item with the four designed label types, namely model description object (OBJ), component (COM), attribute (ATT) and attribute value (VAL). Redundant words that carry no target information are given the O label, the first character of a piece of target information is labeled B, and the subsequent characters are labeled I.
Further, in step (2), the original text is augmented after labeling is complete: the labeled text that contains target information is translated, and the translated text is inserted at suitable positions in the original text according to the distribution of O labels. The insertion method is as follows: the frequency of the O label within a paragraph is calculated, and if the frequency exceeds a set threshold, a pending translated text is inserted at the position of the current paragraph; after all paragraphs have been traversed, any remaining translated text is inserted at the end of the whole sample.
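The insertion rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `o_ratio` and `insert_translations`, the default threshold of 0.5, the representation of a paragraph as a `(text, tags)` pair, and the choice to insert a translated text immediately after the qualifying paragraph are all hypothetical, since the text does not fix these details.

```python
from typing import List, Tuple

Paragraph = Tuple[str, List[str]]  # (text, per-character BIO tags)

def o_ratio(tags: List[str]) -> float:
    """Fraction of tags in a paragraph that are the O (no information) label."""
    return sum(t == "O" for t in tags) / len(tags) if tags else 0.0

def insert_translations(paragraphs: List[Paragraph],
                        translations: List[Paragraph],
                        threshold: float = 0.5) -> List[Paragraph]:
    """Insert translated texts after paragraphs whose O ratio exceeds threshold.

    Assumption: "inserting at the current paragraph position" is read here as
    inserting directly after that paragraph, one translation per paragraph.
    """
    pending = list(translations)
    result: List[Paragraph] = []
    for para in paragraphs:
        result.append(para)
        if pending and o_ratio(para[1]) > threshold:
            result.append(pending.pop(0))
    # Any translations left after the traversal go to the end of the sample.
    result.extend(pending)
    return result
```

With three paragraphs whose O ratios are about 0.67, 0.33 and 1.0 and three pending translations, the first and third paragraphs each receive one translation and the last translation is appended at the end.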
Further, in steps (3) and (4), the labeling and entity extraction method is applied to training with a small number of samples:
(1) BERT is a pre-trained model: large-scale computing resources are used during pre-training to capture the latent information of Chinese text. During training, the invention adopts fine-tuning, running only a small number of rounds (3 to 6) to transfer the model so that it meets the requirements of industrial sites and practical operation;
(2) the BERT model makes better use of the context of the text: using a masking method, it masks some of the words and predicts the masked words, which yields a model rich in context information;
(3) the CRF processes the word vectors output by the BERT model and outputs the sequence labeling result, optimizing over the whole sequence to meet the overall performance requirements of industrial scenarios.
Further, in step (6), once the trained model is obtained, text that actually occurs in the industrial production environment is fed in and processed to obtain the keywords, namely the entities, in the text. On this basis, the model structure is organized using the predefined relations among entities, which completes the construction of the OPC UA information model.
Further, in step (6), after the labels are obtained, they are mapped into an abstract model according to their organization relations. The abstract model itself contains no specification information; its structure contains only an object-oriented hierarchy. Specifically: words labeled OBJ are regarded as the object described by the model; they sit at the top level of the abstract model and form the model body, and when several OBJ-labeled words occur in a passage describing the same device, only one of them is used as the model description object. Words labeled COM are components of the model; in the abstract model they surround and connect to the model body (the model description object), and the relation between them is one of containing and belonging. Words labeled ATT describe the model and reflect its performance; they also surround and connect to the description object, but the connection is an attribute relation. Finally, words labeled VAL are the concrete values given when the model is described; in the abstract model they surround and connect to the ATT-labeled words.
Further, in step (6), the nodes of an OPC UA information model are actually described in XML, and the way the file is written is strictly defined, so the abstract model, whose structural framework is now determined, must be converted by a fixed correspondence into a model file that meets those definitions, realizing the conversion from object model to node model. The correspondence between the abstract model and the OPC UA information model is as follows: the UAObjectType element represents an object type node and corresponds to the body of the abstract model and to its components, where the node's reference information also records the connection to the model body; the UAVariable element represents a variable node and corresponds to the model descriptions in the abstract model; the value of a UAVariable node is determined by the concrete value given when the model is described. An OPC UA information model in XML format is constructed according to this correspondence.
Compared with the prior art, the invention has the following advantages:
1. The invention makes full use of unstructured text data in the industrial environment and broadens the range of material available for information modeling. Existing research has focused on structured data such as knowledge graphs and other models, and valuable unstructured data (such as text) is often discarded because it contains redundant information. The invention uses entity extraction as a new way to obtain the desired information from complex text and reorganizes the information obtained from the text into a structured model.
2. The invention designs and constructs entity extraction samples for industrial data, labeling the collected raw data with the BIO method in combination with the labels required by the information model, which provides a reproducible path for using text in industrial scenarios.
3. The invention designs a model construction method based on BERT and CRF for information model construction in industry. It takes full account of the fact that only small samples are available in industrial environments, uses a Chinese pre-trained model to achieve word segmentation and accurate word vectorization of the input text, and avoids problems such as missing word vector information and inaccurate sequence labeling caused by too few samples. In addition, owing to the characteristics of the BERT and CRF model design, the invention greatly reduces the training cost in practical applications, so that training no longer depends on large-scale computing equipment, which significantly improves the general applicability of the method.
Drawings
Fig. 1 is a general flow diagram of the present invention.
FIG. 2 is a schematic representation of BIO annotation of sample text data according to the present invention.
FIG. 3 is a schematic diagram of the information extraction process BERT+CRF model of the present invention.
Fig. 4 is a schematic diagram of an OPC UA information model for a simulation example of the present invention.
Detailed Description
The technical scheme and effect of the present invention will be further described in detail below with reference to the accompanying drawings.
Before introducing the method for constructing the OPC UA information model based on unstructured text data, the concepts of the entity extraction neural network model BERT+CRF and the OPC UA information model are briefly introduced.
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained model framework proposed by a Google research team, which pre-trained it on a very large corpus to obtain good parameters for the encoding and decoding neural network in the model. In addition, BERT adopts a masked training method: some of the words in a training text are masked in advance and the text is then fed into the model to produce vector representations of the vocabulary, which has an effect similar to a cloze (fill-in-the-blank) exercise and yields better results than the earlier bidirectional long short-term memory network (BiLSTM) could achieve. The CRF is a sequence labeling algorithm that receives an input sequence and outputs the labels of the target sequence, and in this sense acts as a sequence-to-sequence (seq2seq) mapping. For example, in named entity recognition in the present invention, the input sequence is a string of characters and the output sequence is the corresponding entity labels: Object (OBJ), Component (COM), Attribute (ATT) and attribute Value (VAL).
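The masking idea can be illustrated with a short Python sketch. It is deliberately simplified and is an assumption-laden illustration rather than BERT's actual procedure: the function name `mask_tokens`, the 15% default ratio and the fixed seed are choices made here, and every selected token is replaced by `[MASK]`, whereas BERT's published pre-training replaces only most selected tokens with the mask symbol and handles the rest differently.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"

def mask_tokens(tokens: List[str], ratio: float = 0.15,
                seed: int = 0) -> Tuple[List[str], Dict[int, str]]:
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict {position: original token},
    which is what a masked language model would be trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_masked = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets
```

For Chinese text the tokens would be single characters, so a call such as `mask_tokens(list(sentence))` masks individual characters and the model must recover them from the context on both sides.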
Information modeling is a key technology in applying OPC UA and has long been a focus of academic research. The OPC UA information model uses object-oriented techniques to describe device objects, defining a device's attribute variables, operation methods and relations with other objects, so that it conveys the semantics of the data more effectively and promotes interoperability. The OPC UA information model is divided into several kinds of nodes, such as objects, object types and variable types, which can be grouped by level into three layers: model description object, object attribute characteristics, and attribute values.
In applications of OPC UA information modeling, especially information model construction for industrial production scenarios, structured data is usually the source of modeling material, and the field faces a series of challenges such as a narrow range of modeling means, complex modeling methods and low utilization of industrial information. To address these challenges, the invention proposes a method for constructing an OPC UA information model based on unstructured text data, which builds a small-scale labeled sample of industrial data and performs semantic analysis with a pre-trained model. Referring to Fig. 1, the implementation steps of the invention are as follows:
Step one: design reasonable annotation labels that are based on the characteristics of the corpus text and meet industrial field modeling requirements, and design the organization relations among the labels, that is, use description labels at different levels, where the highest level is the model description object and the lower levels are the attributes and attribute values of the object. For the OPC UA information model, labels covering three levels (object, object attribute characteristics and attribute values) need to be designed. Taking a mechanical arm device as an example, there are four types of label: model description Object (OBJ), Component (COM), Attribute (ATT) and attribute Value (VAL). The model description Object (OBJ) label is created on the following consideration: a given document concerns only one specific model, so all OBJ-class words can correspond to the same model description object. On this basis, the model description object can be abstracted to an overall device class, so that the class information corresponds exactly to an object type node in the OPC UA information model. A Component (COM) is in essence also an object, but because its actual level is lower, entities of this type are attached through a composition relation. Attributes (ATT) and attribute Values (VAL) describe the data characteristics that the mechanical device exhibits as a physical object. The labeling system therefore carries information at several levels: the highest level is the model description Object (OBJ), the next level is the Component (COM) and Attribute (ATT) labels that describe the object's attribute characteristics, and the last level is the attribute Value (VAL).
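The four-label, three-level scheme described in step one can be written down as a small Python data structure. This is a sketch for illustration only; the names `Tag`, `PARENT`, `level` and `bio_label_set` are hypothetical and do not come from the patent.

```python
from enum import Enum
from typing import Dict, List, Optional

class Tag(Enum):
    OBJ = "OBJ"  # model description object (highest level)
    COM = "COM"  # component of the object
    ATT = "ATT"  # attribute of the object
    VAL = "VAL"  # concrete value of an attribute

# Parent of each tag in the label hierarchy; OBJ is the root.
PARENT: Dict[Tag, Optional[Tag]] = {
    Tag.OBJ: None,
    Tag.COM: Tag.OBJ,
    Tag.ATT: Tag.OBJ,
    Tag.VAL: Tag.ATT,
}

def level(tag: Tag) -> int:
    """Depth of a tag in the hierarchy, counting OBJ as level 1."""
    depth = 1
    while PARENT[tag] is not None:
        tag = PARENT[tag]
        depth += 1
    return depth

def bio_label_set() -> List[str]:
    """All labels a sequence tagger has to predict: O plus B-/I- per tag."""
    return ["O"] + [f"{prefix}-{tag.value}" for tag in Tag for prefix in ("B", "I")]
```

Under this encoding COM and ATT both sit at level 2 and VAL at level 3, and the tagger's output space has nine labels (O plus a B- and an I- variant of each of the four tags).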
Step two: perform BIO (Begin, Inside, Outside) sequence labeling, based on the description labels at different levels, on sample text extracted from device description documents (which may be in HTML or PDF format) to obtain training samples. Industrial sites and mechanical devices themselves come with a great deal of unstructured text data; for example, mechanical arm devices usually ship with device specifications that describe parameters and performance, and related descriptions and introductions are also available in the operating software. Sample text can be extracted conveniently from these documents. The text is labeled item by item with the four label types designed in step one: redundant words that carry no target information are given the O label, the first character of a piece of target information is labeled B, and the subsequent characters are labeled I. For example, the phrase "maximum load" is labeled B-ATT on its first character and I-ATT on each subsequent character. After labeling is complete, the original text is augmented: the labeled text that contains target information is translated, and the translated text is inserted at suitable positions in the original text according to the distribution of O labels. The insertion method is as follows: the frequency of the O label within a paragraph is calculated, and if the frequency exceeds a set threshold, a pending translated text is inserted at the position of the current paragraph; after all paragraphs have been traversed, any remaining translated text is inserted at the end of the whole sample.
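As a hedged illustration of character-level BIO labeling, the following Python sketch turns annotated spans into one tag per character. The function name `bio_tags`, the span format `(start, end_exclusive, tag)` and the example sentence are assumptions made for this sketch; the sentence is a made-up example, not a sample from the patent's corpus.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, tag name)

def bio_tags(text: str, spans: List[Span]) -> List[str]:
    """Return one BIO tag per character of text.

    Characters outside every span get "O"; the first character of a span
    gets "B-<tag>" and the remaining characters get "I-<tag>".
    """
    tags = ["O"] * len(text)
    for start, end, tag in spans:
        tags[start] = f"B-{tag}"
        for i in range(start + 1, end):
            tags[i] = f"I-{tag}"
    return tags

# Hypothetical sentence: "the robot arm's maximum load is 5kg".
sentence = "机械臂的最大负载为5kg"
annotated = [(0, 3, "OBJ"), (4, 8, "ATT"), (9, 12, "VAL")]
sentence_tags = bio_tags(sentence, annotated)
```

Here the object name, the attribute name and the value each start with a B- tag, while the two connecting characters between them stay O.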
Step three: construct a Chinese pre-trained model based on the BERT (Bidirectional Encoder Representations from Transformers) model and extract word vectors from the sample text labeled in step two. The Chinese pre-trained model provided by Google is used to convert the text into word vectors for subsequent model training, so that context information is extracted more effectively and the accuracy of the subsequent label classification improves. The bidirectional encoder representation in BERT gives better word segmentation and word vectorization. The BERT model already contains latent information about Chinese text with which to process the input, and it also captures the context of the input text.
Step four: process, classify and label the word vectors with a CRF (conditional random field). The word vectors obtained in step three are sequence-labeled by the CRF; processing them with a CRF makes better use of rule information such as grammar. After the word vectors output by the BERT model pass through the sequence labeling of the CRF layer, the target entities and the label of each character are obtained, the label categories being Object (OBJ), Component (COM), Attribute (ATT) and attribute Value (VAL).
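To show how a CRF layer optimizes over the whole sequence rather than choosing each label independently, the following sketch implements Viterbi decoding for a linear-chain CRF in plain Python. It is an illustration under assumptions: the per-position emission scores stand in for what a BERT-based encoder would output, and the scores, the three-label set and the large negative penalty for the invalid transition O to I-ATT are invented for the example.

```python
from typing import List, Sequence

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Best label sequence for a linear-chain CRF.

    emissions[t][j]   score of label j at position t
    transitions[i][j] score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Best previous label for ending in label j at position t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]

LABELS = ["O", "B-ATT", "I-ATT"]
# Hypothetical scores: O -> I-ATT is heavily penalized as an invalid BIO move.
TRANSITIONS = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
EMISSIONS = [[1.0, 0.9, 0.0],
             [0.0, 0.0, 2.0]]
decoded = [LABELS[i] for i in viterbi(EMISSIONS, TRANSITIONS)]
```

Picking the highest emission at each position separately would give O followed by I-ATT, which is not a valid BIO sequence; decoding over the whole sequence with the transition scores gives B-ATT followed by I-ATT instead.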
Step five: train the model of steps three and four (BERT+CRF) with the training samples obtained in step two to reach a higher accuracy, so that it can subsequently perform actual labeling and entity recognition on unlabeled text. Training is deployed in a cloud GPU or local GPU computing environment: a Python program and the corresponding run scripts are written to load the sample database and train the model in a Linux environment, and the BERT+CRF model reaches good recognition accuracy after 3 to 6 rounds of fine-tuning.
Step six: extract the valid entities in the text with the trained model and organize the model according to predefined organization rules, that is, extract the entity text under each label with the model from step five and organize the entities according to the predefined label relations. Finally, an object-oriented OPC UA information model is constructed from the related entity labels.
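Once the trained model has produced one BIO tag per character, the entity texts can be recovered by grouping each B- tag with the I- tags that follow it. The sketch below is one plausible way to do this; the function name `extract_entities`, the decision to ignore an I- tag that does not continue a matching entity, and the example sentence are assumptions rather than details given in the patent.

```python
from typing import List, Optional, Tuple

def extract_entities(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged characters into (entity text, tag name) pairs."""
    entities: List[Tuple[str, str]] = []
    buffer: List[str] = []
    current: Optional[str] = None

    def flush() -> None:
        nonlocal buffer, current
        if current is not None:
            entities.append(("".join(buffer), current))
        buffer, current = [], None

    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            flush()  # close any entity still open
            buffer, current = [ch], tag[2:]
        elif tag.startswith("I-") and current == tag[2:]:
            buffer.append(ch)  # continue the open entity
        else:
            flush()  # "O", or an I- tag that continues nothing
    flush()
    return entities

# Hypothetical model output for "the robot arm's maximum load is 5kg".
example_chars = list("机械臂的最大负载为5kg")
example_tags = ["B-OBJ", "I-OBJ", "I-OBJ", "O",
                "B-ATT", "I-ATT", "I-ATT", "I-ATT", "O",
                "B-VAL", "I-VAL", "I-VAL"]
example_entities = extract_entities(example_chars, example_tags)
```

The resulting list of (text, tag) pairs, kept in order of appearance, is the input that the organization rules then arrange into a hierarchy.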
After the labels are obtained, they can be mapped into an abstract model according to their organization relations. The abstract model itself contains no specification information; its structure contains only an object-oriented hierarchy. Specifically: words labeled OBJ are regarded as the object described by the model; they sit at the top level of the abstract model and form the model body, and when several OBJ-labeled words occur in a passage describing the same device, only one of them is used as the model description object. Words labeled COM are components of the model; in the abstract model they surround and connect to the model body, namely the model description object, and the relation between them is one of containing and belonging. Words labeled ATT describe the model and reflect its performance; like COM-labeled words they surround and connect to the description object, but the connection is an attribute relation rather than a containment relation. Finally, words labeled VAL are the concrete values given when the model is described, so in the abstract model they surround and connect to the ATT-labeled words.
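The mapping from labeled entities to the abstract model can be sketched as follows. The sketch makes assumptions that the text leaves open: the first OBJ entity is the one kept as the model description object, and each VAL entity is attached to the most recently seen ATT entity. The class name `AbstractModel`, the function name `build_abstract_model` and the English entity strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class AbstractModel:
    obj: Optional[str] = None                             # model body (top level)
    components: List[str] = field(default_factory=list)   # containment relation
    attributes: Dict[str, Optional[str]] = field(default_factory=dict)  # ATT -> VAL

def build_abstract_model(entities: List[Tuple[str, str]]) -> AbstractModel:
    """Organize (text, tag) entities, in order of appearance, into a hierarchy."""
    model = AbstractModel()
    last_attribute: Optional[str] = None
    for text, tag in entities:
        if tag == "OBJ":
            if model.obj is None:  # assumption: keep only the first OBJ
                model.obj = text
        elif tag == "COM":
            if text not in model.components:
                model.components.append(text)
        elif tag == "ATT":
            model.attributes.setdefault(text, None)
            last_attribute = text
        elif tag == "VAL" and last_attribute is not None:
            # Assumption: a value belongs to the nearest preceding attribute.
            model.attributes[last_attribute] = text
    return model

demo = build_abstract_model([
    ("robot arm", "OBJ"), ("gripper", "COM"),
    ("maximum load", "ATT"), ("5 kg", "VAL"),
    ("manipulator", "OBJ"), ("reach", "ATT"), ("850 mm", "VAL"),
])
```

In this example the second OBJ entity is dropped, the gripper becomes a component of the robot arm, and each value ends up under the attribute that precedes it.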
The nodes of an OPC UA information model are actually described in XML, and the way the file is written is strictly defined, so the abstract model, whose structural framework is now determined, must be converted by a fixed correspondence into a model file that meets those definitions, realizing the conversion from object model to node model.
The correspondence between the abstract model and the OPC UA information model is as follows: the UAObjectType element represents an object type node and corresponds to the body of the abstract model and to its components, where the node's reference information also records the connection to the model body; the UAVariable element represents a variable node and corresponds to the model descriptions in the abstract model; the value of a UAVariable node is determined by the concrete value given when the model is described.
According to the corresponding method, an OPC UA information model in an XML format can be constructed.
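A minimal sketch of this conversion, using Python's standard `xml.etree.ElementTree`, is given below. It follows the correspondence described above (body and components become UAObjectType elements, attributes become UAVariable elements carrying the value), but several details are assumptions made for illustration: the function name `build_nodeset`, the node id numbering, the choice of the HasComponent and HasProperty reference types, the String data type and the simplified Value element. The output is not validated against the official UANodeSet schema, in which values are typed with elements from a separate OPC UA types namespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

NS = "http://opcfoundation.org/UA/2011/03/UANodeSet.xsd"

def build_nodeset(obj: str, components: List[str],
                  attributes: Dict[str, Optional[str]]) -> str:
    """Serialize an abstract model as simplified NodeSet-style XML."""
    root = ET.Element("UANodeSet", {"xmlns": NS})
    counter = 1000  # hypothetical numbering scheme for node ids

    def new_id() -> str:
        nonlocal counter
        counter += 1
        return f"ns=1;i={counter}"

    # Model body -> object type node that holds the references.
    body = ET.SubElement(root, "UAObjectType", NodeId=new_id(), BrowseName=f"1:{obj}")
    ET.SubElement(body, "DisplayName").text = obj
    refs = ET.SubElement(body, "References")

    # Components -> object type nodes linked from the body.
    for name in components:
        node_id = new_id()
        node = ET.SubElement(root, "UAObjectType", NodeId=node_id, BrowseName=f"1:{name}")
        ET.SubElement(node, "DisplayName").text = name
        ET.SubElement(refs, "Reference", ReferenceType="HasComponent").text = node_id

    # Attributes -> variable nodes; the VAL entity becomes the node value.
    for name, value in attributes.items():
        node_id = new_id()
        var = ET.SubElement(root, "UAVariable", NodeId=node_id,
                            BrowseName=f"1:{name}", DataType="String")
        ET.SubElement(var, "DisplayName").text = name
        ET.SubElement(refs, "Reference", ReferenceType="HasProperty").text = node_id
        if value is not None:
            ET.SubElement(ET.SubElement(var, "Value"), "String").text = value

    return ET.tostring(root, encoding="unicode")

nodeset_xml = build_nodeset("RobotArm", ["Gripper"], {"MaxLoad": "5 kg"})
```

Keeping all references on the body node mirrors the abstract model, where components and attributes surround and connect to the model description object.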
On the one hand, the invention takes full advantage of neural language models for text processing. While making full use of industrial unstructured data, it uses entity extraction to obtain the desired information from complex text and reorganizes that information into a structured model, so that, compared with traditional methods, it can draw on the large amount of unstructured data present at industrial sites and obtain more information. On the other hand, the method uses a self-built small-sample corpus from the industrial environment and, taking the characteristics of Chinese text into account, uses the pre-trained model to achieve word segmentation and accurate word vectorization of the input text with small samples and low training cost, avoiding problems such as missing word vector information and inaccurate sequence labeling caused by too few samples.
The effects of the proposed method are further described below in connection with simulation examples.
Simulation example 1:
The descriptive text of a robot arm device is taken as the example for labeling and subsequent processing. Fig. 2 shows the labeling of robot arm description text extracted from device information published online; users can readily transfer the method to other industrial text data. As Fig. 2 shows, unstructured text from industrial scenes contains a large amount of information and requires careful cleaning and labeling. The final label set comprises four entity labels, Object (OBJ), Component (COM), Attribute (ATT) and attribute Value (VAL), plus a no-information label (O); each entity label is further subdivided into B (beginning) and I (inside).
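As a small illustration of the labeling scheme, the following Python sketch assigns one BIO tag per character, which matches the character-level segmentation of Chinese text used by the method. The sample sentence (roughly "the payload of the robot arm is 5kg"), the span offsets and the function name bio_tags are hypothetical and are not taken from Fig. 2.

```python
def bio_tags(text, spans):
    """Return one BIO tag per character of `text`.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Characters outside every span receive the no-information tag "O".
    """
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags

# Hypothetical sample sentence and hand-made annotation spans.
text = "机械臂的负载为5kg"
spans = [(0, 3, "OBJ"), (4, 6, "ATT"), (7, 10, "VAL")]
for ch, tag in zip(text, bio_tags(text, spans)):
    print(ch, tag)
```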
Constructing the information model also requires a text processing model, namely a BERT+CRF model; Fig. 3 shows the neural network information extraction model used in the present invention. From bottom to top, the model structure consists of:
information input and word segmentation, where Chinese text is segmented directly into individual characters;
a BERT model whose basic unit is the bidirectional Transformer, consisting of stacked Transformer encoder layers;
a CRF sequence labeling model, which processes the word vectors output by BERT according to a conditional probability distribution, thereby converting vectors into labels.
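The vector-to-label step of the CRF layer can be illustrated with Viterbi decoding, the standard way of finding the best label sequence in a linear-chain CRF. The sketch below is an assumption-laden toy: the emission scores stand in for scores derived from BERT output, and the transition scores are set by hand, whereas in a trained model both would be learned.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]:   score of label j at position t (stand-in for BERT-derived scores).
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_labels):
            # Best previous label for reaching label j at this position.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]

labels = ["O", "B-OBJ", "I-OBJ"]
# Hand-set toy scores: the only non-zero transition penalizes the invalid move O -> I-OBJ.
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.8, 0.0], [0.0, 0.1, 1.0]]
print([labels[i] for i in viterbi_decode(emissions, transitions)])
```

Choosing the best label at each position independently would give O followed by I-OBJ here, which is not a valid BIO sequence; scoring the whole sequence instead yields B-OBJ followed by I-OBJ.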
Model training is carried out in a cloud environment provided by Google. First, a script is written that defines the number of training epochs, the location where the output neural network parameters are saved, and how the model's Python files are invoked. The script is then run in the Linux cloud environment, and finally the checkpoint file with the best training results is selected as the neural network model for final use.
A robot arm description text from a new industrial scene is input into the neural network model, which outputs the labeled text. Words labeled Object (OBJ), Component (COM), Attribute (ATT) and attribute Value (VAL) are the target words used for modeling. The fully annotated input text looks similar to the sample text data in Fig. 2.
The labeled text is then organized according to the defined node relations: the word labeled as the object becomes the object node, and the words carrying the other labels become the corresponding component, attribute and value nodes, which completes the construction of the OPC UA information model. An example of a constructed OPC UA information model is shown in Fig. 4.
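The organization step can be sketched as two small Python functions: one merges BIO-tagged characters into entities, the other arranges the entities into a hierarchical abstract model. The pairing rule used here, attaching each VAL to the most recent ATT without a value, is an assumption made for illustration; the text only states that VAL words connect to ATT words. Function names and the sample data are likewise hypothetical.

```python
def extract_entities(chars, tags):
    """Merge BIO-tagged characters into a list of (label, text) entities."""
    entities, current = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            current = [tag[2:], ch]          # start a new entity
            entities.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1] += ch                 # extend the open entity
        else:
            current = None                   # "O" or an inconsistent I- tag
    return [(label, text) for label, text in entities]

def build_abstract_model(entities):
    """Arrange entities as object -> components / attributes -> values."""
    model = {"object": None, "components": [], "attributes": {}}
    last_att = None
    for label, text in entities:
        if label == "OBJ" and model["object"] is None:
            model["object"] = text           # only the first OBJ is kept
        elif label == "COM":
            model["components"].append(text)
        elif label == "ATT":
            model["attributes"][text] = None
            last_att = text
        elif label == "VAL" and last_att is not None:
            # Assumption: a value belongs to the most recent attribute.
            model["attributes"][last_att] = text
            last_att = None
    return model

chars = "机械臂的负载为5kg"
tags = ["B-OBJ", "I-OBJ", "I-OBJ", "O", "B-ATT", "I-ATT", "O", "B-VAL", "I-VAL", "I-VAL"]
print(build_abstract_model(extract_entities(chars, tags)))
```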
The foregoing is merely a preferred embodiment of the present invention. Although the invention has been disclosed above by way of this preferred embodiment, it is not limited thereto. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or to modify it into equivalent embodiments. Therefore, any simple modification, equivalent variation or adaptation of the above embodiments made according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (10)

1. An OPC UA information model construction method based on unstructured text data is characterized by comprising the following steps:
(1) Designing annotation labels that are based on corpus text characteristics and meet the needs of industrial-field modeling, and designing the organization relation among the labels, namely using description labels at different levels, where the highest level is the model description object and the lower levels are the attributes and attribute values of the object;
(2) Performing BIO serialization labeling on sample text materials extracted from the equipment description document based on description tags of different levels to obtain training samples;
(3) Constructing a Chinese pre-training model based on the BERT model, and extracting word vectors from the sample text marked in the step (2);
(4) Based on CRF technology, processing, classifying and labeling the word vector extracted in the step (3);
(5) Training the text processing model combining BERT and CRF by using the training sample obtained in the step (2);
(6) Extracting effective entities in the text by using the trained text processing model, and performing model organization according to a predefined organization rule, namely extracting entity texts in labels by using the text processing model, performing entity organization according to a predefined label relation, and finally constructing an object-oriented OPC UA information model aiming at related entity labels.
2. The method for constructing an OPC UA information model based on unstructured text data according to claim 1, wherein in step (1), an information model node of an object-oriented OPC UA is abstracted to be a label of an entity, and definition of a relationship of the entity label is performed according to an original organization structure of the information model node, and then model construction can be performed according to a predefined relationship.
3. The method according to claim 1, wherein in the step (1), for the information model of OPC UA, tags including three layers of objects, object attribute characteristics, and attribute values are designed, specifically, four types of tags including model description object OBJ, component COM, attribute ATT, and attribute value VAL.
4. The method for constructing an OPC UA information model based on unstructured text data according to claim 1, wherein the BIO labeling of industrial-field sample data described in step (2) is specifically as follows:
(1) performing BIO labeling on unstructured text data from the industrial field using labels that conform to industrial-field modeling requirements;
(2) the labels themselves take the label organization relation of the information model into account: label relations meeting the structural requirements of the OPC UA information model are defined, and, following the three-layer design of object, attribute characteristic and related attribute value, the model labels correspond directly to the objects, nodes and variables of the OPC UA information model.
5. The method for constructing an OPC UA information model based on unstructured text data according to claim 3, wherein in the step (2), the text material is labeled one by using four types of labels of a designed model description object OBJ, a component COM, an attribute ATT and an attribute value VAL, redundant words of non-target information are divided into O labels, first words of target information are labeled as B, and subsequent words are labeled as I.
6. The method for constructing an OPC UA information model based on unstructured text data according to claim 5, wherein in step (2), the original text is further augmented after the labeling is completed: the labeled text containing target information is translated, and the translated text is inserted at suitable positions in the original text according to the distribution of the O label, the insertion method being as follows: the occurrence frequency of the O label in a paragraph is calculated, and if the frequency exceeds a set threshold, a translated text awaiting insertion is inserted at the position of the current paragraph; after all paragraphs have been traversed, any remaining translated texts are inserted at the end of the whole sample.
7. The method for constructing an OPC UA information model based on unstructured text data according to claim 1, wherein in steps (3) and (4), the label labeling and entity extraction method is applied to a training process of a small number of samples:
(1) the BERT model is a pre-trained model; during pre-training, large-scale computing resources are used to extract the latent information of Chinese text, and in subsequent training a fine-tuning approach is adopted, so that model transfer is achieved with a small number of training epochs, meeting industrial-site requirements and practical operating requirements;
(2) the BERT model makes better use of the context of the text: using a masking method, some words are masked and the masked words are predicted, yielding a model rich in contextual information;
(3) the CRF processes the word-vector information output by the BERT model and outputs the sequence labeling result, optimizing over the whole sequence and meeting the industrial scene's requirement for overall performance.
8. The method for constructing an OPC UA information model based on unstructured text data according to claim 1, wherein in step (6), after the training is completed, a text actually appearing in an industrial production environment to be processed is input, and then the text can be correspondingly processed to obtain keywords, namely entities, in the text; on the basis, the construction of the OPC UA information model is finally completed by utilizing a predefined relationship organization model structure among entities.
9. The method for constructing an OPC UA information model based on unstructured text data according to claim 1, wherein in step (6), after the labels are obtained, they are mapped into an abstract model according to their organization relation; the abstract model itself does not contain any specification information, and its structure comprises only an object-oriented hierarchy, specifically: words labeled OBJ are regarded as the object described by the model, are located at the top layer of the abstract model and form the main body of the model, and when several words labeled OBJ occur in a passage describing the same device, only one of them is used as the model description object; words labeled COM are components of the model, are located around the model main body (the model description object) and are connected to it, the relation between them being one of containing and belonging to; words labeled ATT are descriptions of the model that reflect its performance, and are likewise located around the description object and connected to it, but the connection is an attribute relation; finally, words labeled VAL are the concrete values used in describing the model, and in the abstract model they are located around, and connected to, the words labeled ATT.
10. The method for constructing an OPC UA information model based on unstructured text data according to claim 9, wherein in step (6), the nodes of the OPC UA information model are described in XML, whose file format is strictly defined, and the abstract model with its determined structural framework is converted by a defined mapping into a model file meeting these requirements, thereby realizing the conversion from the object model to the node model; the mapping between the abstract model and the OPC UA information model is as follows: the UAObjectType element represents an object type node and corresponds to the main body of the abstract model and to its components, the reference information of a component's OPC UA node also recording its connection to the model main body; the UAVariable element represents a variable node and corresponds to a model description in the abstract model; the specific value of a UAVariable node is the value specified when the model is described; and an OPC UA information model in XML format is constructed according to this mapping.
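For illustration only, the insertion rule recited in claim 6 can be sketched in Python as follows. The sketch assumes the translated texts are already available as tagged paragraphs, reads "frequency" as the share of O tags in a paragraph, and places an inserted text directly after the paragraph that triggered it; the threshold value 0.6 and the function name insert_augmented are hypothetical.

```python
def insert_augmented(paragraphs, augmented, threshold=0.6):
    """Interleave augmented paragraphs into the original sample.

    paragraphs: list of tag lists, one per original paragraph.
    augmented:  list of tag lists for the translated texts awaiting insertion.
    A pending text is inserted after any paragraph whose share of "O" tags
    exceeds `threshold`; leftovers are appended at the end of the sample.
    """
    pending = list(augmented)
    result = []
    for tags in paragraphs:
        result.append(tags)
        o_ratio = tags.count("O") / len(tags) if tags else 0.0
        if pending and o_ratio > threshold:
            result.append(pending.pop(0))
    result.extend(pending)  # remaining translated texts go to the end
    return result

sample = [["O", "O", "O", "B-OBJ"], ["B-OBJ", "I-OBJ", "O"]]
extra = [["B-ATT"], ["B-VAL"]]
print(insert_augmented(sample, extra))
```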
CN202211656684.1A 2022-12-22 2022-12-22 OPC UA information model construction method based on unstructured text data Pending CN116362245A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211656684.1A CN116362245A (en) 2022-12-22 2022-12-22 OPC UA information model construction method based on unstructured text data

Publications (1)

Publication Number Publication Date
CN116362245A true CN116362245A (en) 2023-06-30

Family

ID=86905794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211656684.1A Pending CN116362245A (en) 2022-12-22 2022-12-22 OPC UA information model construction method based on unstructured text data

Country Status (1)

Country Link
CN (1) CN116362245A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117195110A (en) * 2023-11-07 2023-12-08 湖南大学 OPC_UA node perception self-adaptive priority classification method
CN117195110B (en) * 2023-11-07 2024-01-26 湖南大学 OPC_UA node perception self-adaptive priority classification method
CN117252514A (en) * 2023-11-20 2023-12-19 中铁四局集团有限公司 Building material library data processing method based on deep learning and model training
CN117252514B (en) * 2023-11-20 2024-01-30 中铁四局集团有限公司 Building material library data processing method based on deep learning and model training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination