CN113515930B - Heterogeneous device ontology matching method integrating semantic information - Google Patents


Info

Publication number: CN113515930B (application CN202110530094.3A)
Authority: CN (China)
Other versions: CN113515930A (Chinese)
Prior art keywords: instruction, mapping matrix, instruction fragment, ontology
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 孙海峰, 庄子睿, 成岱璇, 赵津宇, 戚琦, 王敬宇, 李炜, 廖建新, 王晶
Current and original assignee: Beijing University of Posts and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Beijing University of Posts and Telecommunications

Classifications

    • G06F40/194 — Handling natural language data; text processing; calculation of difference between files
    • G06F40/126 — Handling natural language data; text processing; use of codes for handling textual entities; character encoding
    • G06F40/30 — Handling natural language data; semantic analysis
    • G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • Y02P90/02 — Climate change mitigation technologies in the production or processing of goods; total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

A heterogeneous device ontology matching method integrating semantic information comprises the following steps: (1) inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate instruction intent representation vectors; (2) screening a small instruction fragment set sk1' from the training dataset for making an exact-match dataset F'; (3) calculating a similarity matrix S; (4) calculating a mapping matrix F; (5) calculating an objective function f1; (6) updating the mapping matrix F and calculating an objective function f2; (7) cycling steps (2) to (6) until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.

Description

Heterogeneous device ontology matching method integrating semantic information
Technical Field
The invention relates to a heterogeneous device ontology matching method integrating semantic information, and belongs to the technical field of the Internet of Things, in particular to the field of heterogeneous device ontology matching.
Background
In the field of the Internet of Things, different devices use different instruction languages to express the same instruction intent, because vendors intentionally design instruction syntax that differs markedly from their competitors' in order to increase customers' switching costs. In addition, a device's instruction syntax is often strictly protected by patents. Consequently, there is no clear one-to-one correspondence between the instruction statements of different vendors, and even the same terms have different expressions. This makes a network extremely difficult to manage once it contains heterogeneous devices. Therefore, how to match the instruction fragments of different devices against a general instruction meaning set (i.e., a semantic ontology), and thereby reduce the difficulty of managing heterogeneous device networks, has become a pressing technical problem in the Internet of Things field.
Disclosure of Invention
In view of the above, the invention aims to provide a method, based on a deep learning model, for matching the instruction fragments of Internet of Things devices with a general semantic ontology. To achieve this, the invention provides a heterogeneous device ontology matching method integrating semantic information, which matches instruction fragments of Internet of Things devices with a general semantic ontology. The semantic ontology refers to a general instruction meaning set; the instruction fragments are the specific instructions a device issues when executing an intent. Given an instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, ..., q_{N2}} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, each element being matched at most once;
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate their instruction intent representation vectors; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set for making an exact-match dataset F';
step S300, calculating a similarity matrix S ∈ R^{N1×N2} based on the instruction intent representation vectors;
Step S400, calculating a mapping matrix based on the similarity matrix S
Figure BDA0003067273180000022
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F' and calculating an objective function f2; the calculation result is used in back propagation to update the instruction understanding model;
step S700, repeatedly cycling steps S200 to S600 until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content of the element into the intent description encoder and the instruction fragment encoder respectively;
step S120, concatenating the encoding results of the two encoders into a vector, as an instruction intent representation vector of the element.
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} according to the following formula:

E(p_i) = -Σ_j p_{i,j}·log(p_{i,j})

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between every element of the instruction fragment set sk1 and every element of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) is the cosine similarity between the instruction intent representation vectors of element p_i of sk1 and element q_j of sk2;
step S230, using the ratio of information entropy to redundancy as the quantified sample-selection value index RE, calculated as follows:

RE_i = E(p_i) / ((1/|B|)·Σ_{q_j∈B} R_{i,j})

where B is a batch of data;
step S240, selecting a subset of the data, sk1', according to the quantified sample-selection value index RE, and making an exact-match dataset F' that contains the exact match result for each instruction fragment in sk1'.
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, forming all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the Sinkhorn algorithm, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}, and λ is a hyperparameter controlling the difference between the elements of the mapping matrix F: the larger λ is, the smaller the difference. S is the similarity matrix; diag(u) and diag(v) denote the square matrices whose diagonals are the vector u and the vector v respectively and whose remaining entries are 0. The vector u and the vector v are computed iteratively, both initialized to all-ones vectors, as follows:

u ← a ⊘ (K·v)
v ← b ⊘ (Kᵀ·u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns, respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For each element f_{i,j} of the mapping matrix F ∈ R^{N1×N2}: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
The objective function f1 in step S500 is calculated as follows:

f1 = Σ_{i,j} ( s_{i,j}·f_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term Σ_{i,j} s_{i,j}·f_{i,j} minimizes the sum of the Euclidean distances between matched fragments, and the second term λ·(−f_{i,j}·log f_{i,j}) is an additional entropy regularization target: even though f_{i,j} need not be an integer (f_{i,j} ∈ [0,1]), it still pushes each value as close to 0 or 1 as possible.
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F are set to 1 and the other related entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as follows:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin hyperparameter;
where dis(·) is a distance measure between vector representations, which can be computed as the Euclidean distance.
Step S640, updating the instruction understanding model according to the calculation result.
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
The intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer automatically learns discriminative features that serve the training target. The input is a sequence of description words, with a placeholder "[cls]" prepended to the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder, and the vector representation of "[cls]" is used as the description representation.
the instruction fragment encoder uses the same settings and operations as the intent description encoder, the only difference being that the instruction fragment content cannot use a pre-trained language model; the sentence contains many special characters and abbreviations and therefore cannot be parsed directly using a pre-trained language model.
The invention has the following beneficial effects: by taking the intent description as the reference, the instruction fragments of various heterogeneous devices are matched with the general semantic ontology, reducing the difficulty of managing heterogeneous device networks; by training the instruction understanding model, the similarity between the vector representations of matchable elements is increased while the similarity between the vector representations of unmatchable elements is reduced; and the exact-match dataset is made by screening only a small amount of data, which reduces the labeling effort while preserving the matching performance of the model.
Drawings
FIG. 1 is a flow chart of a heterogeneous device ontology matching method fusing semantic information;
FIG. 2 is a graph comparing experimental results of the influence of different sample selection methods on the matching accuracy in the embodiment of the present invention;
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Referring to FIG. 1, the invention provides a heterogeneous device ontology matching method integrating semantic information, which matches instruction fragments of Internet of Things devices with a general semantic ontology. The semantic ontology refers to a general instruction meaning set; the instruction fragments are the specific instructions a device issues when executing an intent. Given an instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, ..., q_{N2}} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, each element being matched at most once. Table 1 is an example of an instruction fragment set and a general semantic ontology set.
TABLE 1
(The table contents were rendered as images in the original publication and are not reproducible here.)
The method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate their instruction intent representation vectors; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set for making an exact-match dataset F'. The implementation details are as follows: a batch of B sample instruction fragments is extracted from all instruction fragments; B is set to 10, i.e., only 10 instruction fragments are extracted for making the exact-match dataset. According to the candidate configuration matches in the common config tree (CCT), a general semantic ontology matching each instruction fragment is found.
Step S300, calculating a similarity matrix based on the instruction intention characterization vector
Figure BDA0003067273180000062
Step S400, calculating a mapping matrix based on the similarity matrix S
Figure BDA0003067273180000063
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F' and calculating an objective function f2; the calculation result is used in back propagation to update the instruction understanding model;
step S700, repeatedly cycling steps S200 to S600 until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content of the element into the intent description encoder and the instruction fragment encoder respectively;
step S120, concatenating the encoding results of the two encoders into a vector, as an instruction intent representation vector of the element.
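As a minimal sketch of step S120, the two encoder outputs can be concatenated into a single intent representation vector; the vectors and dimensions below are illustrative placeholders, not the embodiment's actual encoder outputs:

```python
import numpy as np

# Hypothetical encoder outputs for one element (dimensions are illustrative).
desc_vec = np.array([0.1, 0.2])           # from the intent description encoder
frag_vec = np.array([0.3, 0.4, 0.5])      # from the instruction fragment encoder

# Step S120: concatenate the two encodings into one intent representation vector.
intent_vec = np.concatenate([desc_vec, frag_vec])
```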
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} according to the following formula:

E(p_i) = -Σ_j p_{i,j}·log(p_{i,j})

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between every element of the instruction fragment set sk1 and every element of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) is the cosine similarity between the instruction intent representation vectors of element p_i of sk1 and element q_j of sk2;
step S230, using the ratio of information entropy to redundancy as the quantified sample-selection value index RE, calculated as follows:

RE_i = E(p_i) / ((1/|B|)·Σ_{q_j∈B} R_{i,j})

where B is a batch of data;
step S240, selecting a subset of the data, sk1', according to the quantified sample-selection value index RE, and making an exact-match dataset F' that contains the exact match result for each instruction fragment in sk1'.
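The screening of steps S210 to S240 can be sketched as follows. The toy fragments and cosine-similarity values are hypothetical, and averaging the redundancy over the batch B in the denominator of RE is one reasonable reading of the formula, not a confirmed detail of the embodiment:

```python
import math

# Toy instruction fragments as word lists (hypothetical data).
fragments = [["set", "vlan", "10"], ["set", "vlan", "10"], ["ip", "route", "add"]]

def entropy(fragment, corpus):
    """E(p_i) = -sum_j p_{i,j} * log p_{i,j}, where p_{i,j} is the frequency
    of word j of fragment i across the whole fragment set."""
    total = sum(len(f) for f in corpus)
    e = 0.0
    for word in set(fragment):
        freq = sum(f.count(word) for f in corpus) / total
        e -= freq * math.log(freq)
    return e

def redundancy(cos_sim):
    """R_{i,j} = max(0, cos(p_i, q_j)): negative similarities are clipped."""
    return max(0.0, cos_sim)

# Hypothetical cosine similarities of each fragment to two ontology elements.
cos_sims = [[0.9, 0.1], [0.8, 0.2], [-0.3, 0.7]]

# RE = entropy divided by mean redundancy over the batch; higher is more valuable.
re_scores = []
for frag, sims in zip(fragments, cos_sims):
    mean_red = sum(redundancy(c) for c in sims) / len(sims)
    re_scores.append(entropy(frag, fragments) / mean_red)

# Select the single most valuable fragment for the exact-match dataset.
selected = sorted(range(len(fragments)), key=lambda i: re_scores[i], reverse=True)[:1]
```

On this toy data the unique, low-redundancy fragment wins the selection, which matches the intuition behind RE: informative fragments that are not already well covered by the ontology are worth labeling first.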
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, forming all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
TABLE 2
(The table contents were rendered as an image in the original publication and are not reproducible here.)
According to the instruction fragments and the general semantic ontology in Table 1, the similarity matrix calculated by the above steps is shown in Table 2; the matrix has 10 rows, equal to the number of instruction fragments, and 10 columns, equal to the number of semantic ontologies. The element s_{i,j} in the ith row and jth column of Table 2 is the Euclidean distance between the ith instruction fragment and the jth semantic ontology. The smallest distance in each row is shown in bold.
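The pairwise distance computation of step S300 can be sketched with numpy broadcasting; the representation vectors below are made up for illustration:

```python
import numpy as np

# Hypothetical instruction-intent representation vectors (one row per element).
sk1_vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # N1 = 3 fragments
sk2_vecs = np.array([[0.0, 0.0], [0.0, 2.0]])               # N2 = 2 ontology elements

# S[i, j] = Euclidean distance between fragment i and ontology element j,
# computed without explicit loops via broadcasting over a (N1, N2, d) array.
diff = sk1_vecs[:, None, :] - sk2_vecs[None, :, :]
S = np.linalg.norm(diff, axis=-1)
```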
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the Sinkhorn algorithm, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}, and λ is a hyperparameter controlling the difference between the elements of the mapping matrix F: the larger λ is, the smaller the difference. The hyperparameter λ is set to 0.1 in this embodiment.
S is the similarity matrix; diag(u) and diag(v) denote the square matrices whose diagonals are the vector u and the vector v respectively and whose remaining entries are 0. The vector u and the vector v are computed iteratively, both initialized to all-ones vectors, as follows:

u ← a ⊘ (K·v)
v ← b ⊘ (Kᵀ·u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns, respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For each element f_{i,j} of the mapping matrix F ∈ R^{N1×N2}: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
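The Sinkhorn iteration of step S400 can be sketched as below. The marginals a and b are taken as uniform here rather than derived from S, and the value of λ, the toy distance matrix, and the fixed iteration count are illustrative assumptions:

```python
import numpy as np

def sinkhorn(S, lam=0.1, n_iters=200):
    """Sinkhorn iteration for F = diag(u) * K * diag(v) with K = exp(-lam * S).
    a and b are the target row/column marginals (uniform in this sketch)."""
    n1, n2 = S.shape
    K = np.exp(-lam * S)
    a = np.full(n1, 1.0 / n1)
    b = np.full(n2, 1.0 / n2)
    u = np.ones(n1)
    v = np.ones(n2)
    for _ in range(n_iters):
        u = a / (K @ v)        # u <- a ./ (K v)
        v = b / (K.T @ u)      # v <- b ./ (K^T u)
    return np.diag(u) @ K @ np.diag(v)

# Toy 2x2 distance matrix: fragment 0 is close to ontology 0, fragment 1 to ontology 1.
S = np.array([[0.1, 2.0],
              [2.0, 0.1]])
F = sinkhorn(S, lam=10.0)   # a large lam sharpens F toward a 0/1 matching here
```

With the sharp regularization used in this toy run, the mass of each row concentrates on the low-distance column, i.e. F approaches the expected matching.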
The objective function f1 in step S500 is calculated as follows:

f1 = Σ_{i,j} ( s_{i,j}·f_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term Σ_{i,j} s_{i,j}·f_{i,j} minimizes the sum of the Euclidean distances between matched fragments, and the second term λ·(−f_{i,j}·log f_{i,j}) is an additional entropy regularization target: even though f_{i,j} need not be an integer (f_{i,j} ∈ [0,1]), it still pushes each value as close to 0 or 1 as possible.
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F are set to 1 and the other related entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as follows:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin hyperparameter;
where dis(·) is a distance measure between vector representations, which can be computed as the Euclidean distance;
step S640, updating the instruction understanding model according to the calculation result.
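The triplet objective of step S630 can be sketched as below; since the original formula is only partially legible in this extraction, the margin value and the toy vectors are assumptions, and the standard margin-based triplet loss is used:

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=1.0):
    """max(0, dis(anchor, pos) - dis(anchor, neg) + margin),
    with dis as the Euclidean distance; summed over triplets in training."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

p = np.array([0.0, 0.0])          # instruction fragment representation
q_pos = np.array([0.1, 0.0])      # matchable ontology element: should stay close
q_neg = np.array([3.0, 0.0])      # randomly drawn non-matching ontology element

loss = triplet_loss(p, q_pos, q_neg)   # already satisfies the margin
```

Minimizing this loss pulls matchable pairs together and pushes non-matchable pairs apart, which is exactly the training effect claimed for the instruction understanding model.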
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
Steps S200 to S600 are cycled until the mapping matrix F exactly matches sk1'.
TABLE 3 Table 3
0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 0
The mapping matrix F ∈ R^{N1×N2} of the present invention shows the matching relation between the instruction fragments and the general semantic ontology: the element f_{i,j} in the ith row and jth column equals 1 if the ith instruction fragment matches the jth general semantic ontology, and 0 otherwise. Since each element can only be matched once, each row contains exactly one 1. For the instruction fragments and the general semantic ontology in Table 1, the finally calculated mapping matrix F is shown in Table 3, where each of the 10 rows corresponds to an instruction fragment and each of the 10 columns corresponds to a semantic ontology.
In step S700, the resulting mapping matrix F is used to generate an instruction intent dictionary. In this embodiment, the instruction intent dictionary contains a one-to-one correspondence between the general instructions in the general semantic ontology and the specific instruction fragments of a single device.
TABLE 4
(The table contents were rendered as an image in the original publication and are not reproducible here.)
The instruction intent dictionary obtained from Table 3 is shown in Table 4: each instruction fragment row in Table 4 is matched with the general semantic ontology to its right, and the accuracy of the matching result is 100%.
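Generating the instruction intent dictionary from a converged 0/1 mapping matrix F (step S700) reduces to reading off the single 1 in each row. The fragment and ontology names below are invented placeholders, not the entries of Table 1:

```python
import numpy as np

# Hypothetical fragment and ontology names (placeholders, not Table 1).
fragments = ["vlan 10", "ip route", "no shutdown"]
ontology = ["create-vlan", "add-route", "enable-port"]

# A converged 0/1 mapping matrix: exactly one 1 per row.
F = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Each fragment maps to the ontology element of the column holding its row's 1.
intent_dict = {fragments[i]: ontology[int(np.argmax(F[i]))]
               for i in range(len(fragments))}
```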
The intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer automatically learns discriminative features that serve the training target. The input is a sequence of description words, with a placeholder "[cls]" prepended to the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder, and the vector representation of "[cls]" is used as the description representation. The intent description encoder uses a pre-trained BERT model (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding), BERT-small, with 4 Transformer layers and a hidden size of 512.
The instruction fragment encoder uses the same settings and operations as the intent description encoder. The only difference is that the instruction fragment content cannot use a pre-trained language model: instruction sentences contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model, so the model variables are randomly initialized.
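A minimal numpy sketch of the self-attention aggregation that lets the "[cls]" position summarize a sequence. For brevity this uses a single head with Q = K = V = X; real Transformer layers use learned projections, multiple heads, and a feed-forward sublayer, and the embeddings below are illustrative:

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention; every position,
    including "[cls]", attends to the whole sequence."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                            # (seq, seq) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X

cls_token = np.zeros((1, 4))                            # "[cls]" placeholder embedding
word_embeddings = np.arange(12.0).reshape(3, 4) / 10.0  # illustrative word vectors
sequence = np.vstack([cls_token, word_embeddings])

out = self_attention(sequence)
description_vec = out[0]   # the "[cls]" row serves as the description representation
```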
During training, an Adam optimizer with a learning rate of 10^-5 was used.
The inventors used instruction fragments from 689 configuration files from four vendors (Cisco, Huawei, H3C, and Ruijie) as the instruction fragment set for training and testing. Of these, 304 were from Cisco, 186 from Huawei, 151 from H3C, and 67 from Ruijie. All vendor configuration files come from data centers supporting the same service, and the devices perform the same network architecture roles.
Experimental results show that the present invention achieves 100% alignment accuracy across the different vendors, which illustrates the robustness of the method to a variety of environments. Because the invention considers the correlation between the intent description and the instruction fragments, it can still match the instruction fragments of various heterogeneous devices with the general semantic ontology based on the intent description, even though the instruction fragments of different vendors may differ greatly.
The present invention adopts a mechanism for automatically screening sample data in step S200; the inventors therefore studied the influence of different sample selection methods on the matching accuracy. In addition to the sample selection method of the present invention, the performance of two other sample selection methods (a random method and an entropy-only method) was evaluated on the four data sets. The random method randomly selects samples during learning; the entropy-only method calculates the information entropy of all data in the batch according to the formula in step S210 and then selects the samples with the highest information entropy. The experimental results are shown in FIG. 2.
The accuracy in FIG. 2 is the ratio of correctly matched fragments to the total number of fragments, and the sample rate is the ratio of the number of samples used to make the exact-match dataset to the total number of samples. The experimental results show that, compared with the other methods, the accuracy of the method of the invention increases fastest as the number of samples grows, reaching 100% accuracy with only 10% of the labels on average. This shows that the sample selection of the present invention achieves the best accuracy with the fewest sample labels.
Furthermore, the experimental results show that the trend of increasing accuracy is consistent across the different vendors, further illustrating the robustness of the method to a variety of environments.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (9)

1. A heterogeneous device ontology matching method integrating semantic information, characterized by comprising the following steps: matching instruction fragments of Internet of Things devices with a general semantic ontology; the general semantic ontology refers to a set of general instruction meanings, and an instruction fragment is the specific instruction executed when an Internet of Things device carries out an intent; given an instruction fragment set sk1 = {p_1, p_2, …, p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, …, q_{N2}} with N2 elements, the method finds pairs of elements with the same intent between sk1 and sk2, where each element can be matched only once;
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model, generating a respective instruction intent representation vector for each element; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set, used to make an exact match data set F';
step S300, calculating a similarity matrix S ∈ R^{N1×N2} based on the instruction intent representation vectors;
step S400, calculating a mapping matrix F ∈ R^{N1×N2} based on the similarity matrix S;
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F by the accurate matching data set F', calculating an objective function F2, and using the calculation result for back propagation to update the instruction understanding model;
step S700, repeating steps S200 to S600 until the mapping matrix F achieves an exact match for sk1'; the resulting mapping matrix F is then used to generate an instruction intent dictionary.
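As a minimal illustration of how the converged 0/1 mapping matrix F of step S700 can be turned into the instruction intent dictionary, the following Python sketch (the function name and toy instruction strings are hypothetical, not part of the claims) pairs each instruction fragment with its aligned ontology entry:

```python
import numpy as np

def intent_dictionary(F, sk1, sk2):
    """Turn a converged 0/1 mapping matrix F into a one-to-one
    instruction-intent dictionary (step S700 / claim 8)."""
    mapping = {}
    for i, row in enumerate(F):
        j = int(np.argmax(row))
        if row[j] == 1:            # keep only confirmed alignments
            mapping[sk1[i]] = sk2[j]
    return mapping

# toy mapping matrix: fragment i aligns with the ontology entry of its 1-entry
F = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
d = intent_dictionary(F, ["cmd_a", "cmd_b", "cmd_c"], ["ON", "OFF", "DIM"])
# d == {"cmd_a": "OFF", "cmd_b": "ON", "cmd_c": "DIM"}
```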
2. The heterogeneous device ontology matching method of claim 1, wherein the step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content into the intent description encoder and the instruction fragment encoder, respectively;
step S120, concatenating the encoding results of the two encoders into one vector, which serves as the instruction intent representation vector of the element.
3. The heterogeneous device ontology matching method of claim 1, wherein the step S200 specifically includes the following sub-steps:
step S210, calculating the information entropy E(p_i) of each instruction fragment p_i in the instruction fragment set sk1 according to the following formula:

E(p_i) = −Σ_j p_{i,j}·log p_{i,j}

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) denotes the cosine of the angle between the instruction intent representation vectors of element p_i in the instruction fragment set sk1 and element q_j in the general semantic ontology set sk2;
step S230, using the ratio of the information entropy to the redundancy as the quantized sample selection value index RE, calculated as follows:

RE_i = E(p_i) / Σ_{q_j ∈ B} R_{i,j}

where B is a batch of data;
step S240, selecting a portion of the data, sk1', according to the quantized sample selection value index RE, and making an exact match data set F'; the data set F' contains the exact match result for each instruction fragment in sk1'.
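The entropy and redundancy quantities of steps S210–S230 can be sketched in Python as follows; this is an illustrative implementation under the assumption that p_{i,j} is the corpus-wide frequency of each word of the fragment, and the toy corpus, vectors, and variable names are all hypothetical:

```python
import numpy as np
from collections import Counter

def entropy(fragment_tokens, corpus_freq):
    """Step S210: E(p_i) = -sum_j p_{i,j} * log p_{i,j}, where p_{i,j}
    is assumed to be the corpus-wide frequency of word j of the fragment."""
    return -sum(corpus_freq[w] * np.log(corpus_freq[w])
                for w in set(fragment_tokens))

def redundancy(p_vec, q_vecs):
    """Step S220: sum over the batch of R_{i,j} = max(0, cos(p_i, q_j))."""
    cos = q_vecs @ p_vec / (np.linalg.norm(q_vecs, axis=1)
                            * np.linalg.norm(p_vec) + 1e-12)
    return float(np.maximum(0.0, cos).sum())

# hypothetical toy corpus of instruction fragments
frags = [["set", "light", "on"], ["set", "light", "off"], ["report", "temp"]]
counts = Counter(w for f in frags for w in f)
total = sum(counts.values())
freq = {w: c / total for w, c in counts.items()}
scores = [entropy(f, freq) for f in frags]   # higher = more informative

# toy intent vectors: one aligned, one orthogonal, one opposed ontology entry
p_vec = np.array([1.0, 0.0])
q_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
re_index = scores[0] / redundancy(p_vec, q_vecs)   # step S230 ratio
```

Fragments with a high RE index (informative but not redundant with the ontology) would then be labeled to form the exact match data set F'.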
4. The heterogeneous device ontology matching method of claim 1, wherein the step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element in the instruction fragment set sk1 and that of each element in the general semantic ontology set sk2;
step S320, assembling all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S denotes the Euclidean distance between element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2.
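Steps S310–S320 amount to a pairwise Euclidean distance computation; a minimal NumPy sketch with toy representation vectors (the vectors are illustrative, not outputs of the claimed encoders):

```python
import numpy as np

# Toy intent-representation vectors for sk1 (3 fragments) and sk2
# (2 ontology entries); in the method these come from the encoders of S100.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q = np.array([[1.0, 0.0], [0.0, 2.0]])

# S in R^{N1 x N2}: pairwise Euclidean distances (steps S310-S320)
S = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
# S[1, 0] == 0.0 because p_1 coincides with q_0
```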
5. The heterogeneous device ontology matching method of claim 1, wherein the specific content of step S400 is as follows: the optimal mapping matrix F is solved using the classical Sinkhorn algorithm from optimal transport theory; the Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{−λS}, λ is a hyperparameter used to control the difference between the elements of the mapping matrix F (the larger λ is, the smaller the difference), S is the similarity matrix, and diag(u) and diag(v) denote square matrices with the vectors u and v on the diagonal and 0 elsewhere; the vectors u and v are computed iteratively, both initialized as all-ones vectors, according to:

u ← a ⊘ (K·v)
v ← b ⊘ (K^T·u)

where ⊘ denotes element-wise division of vector elements, and a and b denote the normalized vectors of the similarity matrix S computed by rows and by columns, respectively; the Sinkhorn procedure iteratively solves for F until convergence or a maximum number of steps is reached; in the resulting mapping matrix F ∈ R^{N1×N2}, an element f_{i,j} ∈ F takes the value f_{i,j} = 1 if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, and f_{i,j} = 0 otherwise.
6. The heterogeneous device ontology matching method of claim 1, wherein the calculation formula of the objective function f1 in step S500 is:

f1 = Σ_{i,j} ( f_{i,j}·s_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term, Σ_{i,j} f_{i,j}·s_{i,j}, minimizes the sum of the Euclidean distances between matched fragments, and the second term, λ·(−f_{i,j}·log f_{i,j}), is an additional entropy regularization target, so that even when f_{i,j} is not an integer (f_{i,j} ∈ [0,1]), its value can still be pushed as close to 0 or 1 as possible.
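The objective f1 can be illustrated numerically as follows; the toy matrices and the value of λ are assumptions for demonstration:

```python
import numpy as np

def f1_objective(F, S, lam=0.1):
    """f1 = sum_{i,j} f_{i,j}*s_{i,j} - lam * sum_{i,j} f_{i,j}*log f_{i,j}
    (distance term plus entropy regularizer, step S500)."""
    eps = 1e-12                    # avoid log(0) for hard 0/1 entries
    ent = -(F * np.log(F + eps)).sum()
    return (F * S).sum() + lam * ent

S = np.array([[0.1, 1.0], [1.0, 0.1]])
hard = np.eye(2)                   # integer matching, entropy ~ 0
soft = np.full((2, 2), 0.5)        # maximally uncertain matching
# the entropy term penalizes the soft matrix relative to the hard one
```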
7. The heterogeneous device ontology matching method of claim 1, wherein the step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact match data set F': the corresponding f_{i,j} in the mapping matrix F is set to 1, and the other entries for that instruction fragment are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, on the basis of the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, whose calculation formula is:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin constant;
where dis(·) is a distance measure between vector representations, which can be calculated using the Euclidean distance;
step S640, updating the instruction understanding model according to the calculation result.
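A sketch of the per-triplet objective of step S630, using the Euclidean distance suggested for dis(·); the margin value is an assumed hyperparameter:

```python
import numpy as np

def triplet_f2(p, q_pos, q_neg, margin=1.0):
    """Per-fragment triplet term of step S630: pull the aligned ontology
    entry closer than the negative sample by at least `margin` (assumed)."""
    d_pos = np.linalg.norm(p - q_pos)
    d_neg = np.linalg.norm(p - q_neg)
    return max(0.0, d_pos - d_neg + margin)

p = np.array([0.0, 0.0])
q_pos = np.array([0.1, 0.0])     # aligned ontology entry (close)
q_neg = np.array([3.0, 0.0])     # randomly sampled non-matching entry (far)
loss = triplet_f2(p, q_pos, q_neg)
# d_pos=0.1, d_neg=3.0 -> max(0, 0.1 - 3.0 + 1.0) == 0.0 (triplet satisfied)
```

Summing this term over all constructed triplets and back-propagating through both encoders performs the update of step S640.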
8. The heterogeneous device ontology matching method of claim 1, wherein the instruction intent dictionary in step S700 contains a one-to-one correspondence between the general instructions of the semantic ontology and the specific instruction fragments of a single Internet of Things device.
9. The heterogeneous device ontology matching method of claim 1, wherein:
the intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer is used to automatically learn discriminative features that meet the training target; the input is a sequence of description words, with a placeholder "[cls]" added at the very front of the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder; the vector representation of "[cls]" is used as the description representation;
the instruction fragment encoder uses the same settings and operations as the intent description encoder, the only difference being that the instruction fragment content cannot use a pre-trained language model: instruction fragments contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
CN202110530094.3A 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information Active CN113515930B (en)
