CN113515930B - Heterogeneous device ontology matching method integrating semantic information - Google Patents


Info

Publication number: CN113515930B (application CN202110530094.3A)
Authority: CN (China)
Other versions: CN113515930A (Chinese)
Prior art keywords: instruction, mapping matrix, instruction fragment, ontology
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: 孙海峰, 庄子睿, 成岱璇, 赵津宇, 戚琦, 王敬宇, 李炜, 廖建新, 王晶
Current and original assignee: Beijing University of Posts and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Beijing University of Posts and Telecommunications

Classifications

    • G06F40/194 — Handling natural language data; text processing; calculation of difference between files
    • G06F40/126 — Handling natural language data; text processing; use of codes for handling textual entities; character encoding
    • G06F40/30 — Handling natural language data; semantic analysis
    • G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
    • Y02P90/02 — Climate change mitigation technologies in the production or processing of goods; total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

A heterogeneous device ontology matching method integrating semantic information comprises the following steps: (1) inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate instruction intent representation vectors; (2) screening a small instruction fragment set sk1' from the training dataset for making an exact-match dataset F'; (3) calculating a similarity matrix S; (4) calculating a mapping matrix F; (5) calculating an objective function f1; (6) updating the mapping matrix F and calculating an objective function f2; (7) cycling steps (2) to (6) until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.

Description

Heterogeneous device ontology matching method integrating semantic information
Technical Field
The invention relates to a heterogeneous device ontology matching method integrating semantic information, and belongs to the technical field of the Internet of Things, in particular to the field of heterogeneous device ontology matching.
Background
In the field of the Internet of Things, different devices use different instruction languages to express the same instruction intent, because vendors intentionally design instruction syntax that differs markedly from their competitors' in order to increase customers' switching costs. In addition, a device's instruction syntax is often strictly protected by patents. Consequently, there is no clear one-to-one correspondence between the instruction statements of different vendors, and even the same terms have different expressions. This makes a network extremely difficult to manage once it contains heterogeneous devices. Therefore, how to match the instruction fragments of different devices against a general instruction meaning set (i.e., a semantic ontology), and thereby reduce the difficulty of managing heterogeneous device networks, has become a pressing technical problem in the Internet of Things field.
Disclosure of Invention
In view of the above, the invention aims to provide a method, based on a deep learning model, for matching the instruction fragments of Internet of Things devices with a general semantic ontology. To achieve this, the invention provides a heterogeneous device ontology matching method integrating semantic information, which matches instruction fragments of Internet of Things devices with a general semantic ontology. The semantic ontology refers to a general instruction meaning set; the instruction fragments are the specific instructions a device issues when executing an intent. Given an instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, ..., q_{N2}} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, each element being matched at most once;
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate their instruction intent representation vectors; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set for making an exact-match dataset F';
step S300, calculating a similarity matrix S ∈ R^{N1×N2} based on the instruction intent representation vectors;
Step S400, calculating a mapping matrix based on the similarity matrix S
Figure BDA0003067273180000022
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F' and calculating an objective function f2; the calculation result is used in back propagation to update the instruction understanding model;
step S700, repeatedly cycling steps S200 to S600 until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content of the element into the intent description encoder and the instruction fragment encoder respectively;
step S120, concatenating the encoding results of the two encoders into a vector, as an instruction intent representation vector of the element.
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} according to the following formula:

E(p_i) = -Σ_j p_{i,j}·log(p_{i,j})

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between every element of the instruction fragment set sk1 and every element of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) is the cosine similarity between the instruction intent representation vectors of element p_i of sk1 and element q_j of sk2;
step S230, using the ratio of information entropy to redundancy as the quantified sample-selection value index RE, calculated as follows:

RE_i = E(p_i) / ((1/|B|)·Σ_{q_j∈B} R_{i,j})

where B is a batch of data;
step S240, selecting a subset of the data, sk1', according to the quantified sample-selection value index RE, and making an exact-match dataset F' that contains the exact match result for each instruction fragment in sk1'.
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, forming all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the Sinkhorn algorithm, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}, and λ is a hyperparameter controlling the difference between the elements of the mapping matrix F: the larger λ is, the smaller the difference. S is the similarity matrix; diag(u) and diag(v) denote the square matrices whose diagonals are the vector u and the vector v respectively and whose remaining entries are 0. The vector u and the vector v are computed iteratively, both initialized to all-ones vectors, as follows:

u ← a ⊘ (K·v)
v ← b ⊘ (Kᵀ·u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns, respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For each element f_{i,j} of the mapping matrix F ∈ R^{N1×N2}: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
The objective function f1 in step S500 is calculated as follows:

f1 = Σ_{i,j} ( s_{i,j}·f_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term Σ_{i,j} s_{i,j}·f_{i,j} minimizes the sum of the Euclidean distances between matched fragments, and the second term λ·(−f_{i,j}·log f_{i,j}) is an additional entropy regularization target: even though f_{i,j} need not be an integer (f_{i,j} ∈ [0,1]), it still pushes each value as close to 0 or 1 as possible.
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F are set to 1 and the other related entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as follows:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin hyperparameter;
where dis(·) is a distance measure between vector representations, which can be computed as the Euclidean distance.
Step S640, updating the instruction understanding model according to the calculation result.
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
The intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer automatically learns discriminative features that serve the training target. The input is a sequence of description words, with a placeholder "[cls]" prepended to the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder, and the vector representation of "[cls]" is used as the description representation.
the instruction fragment encoder uses the same settings and operations as the intent description encoder, the only difference being that the instruction fragment content cannot use a pre-trained language model; the sentence contains many special characters and abbreviations and therefore cannot be parsed directly using a pre-trained language model.
The invention has the following beneficial effects: by taking the intent description as the reference, the instruction fragments of various heterogeneous devices are matched with the general semantic ontology, reducing the difficulty of managing heterogeneous device networks; by training the instruction understanding model, the similarity between the vector representations of matchable elements is increased while the similarity between the vector representations of unmatchable elements is reduced; and the exact-match dataset is made by screening only a small amount of data, which reduces the labeling effort while preserving the matching performance of the model.
Drawings
FIG. 1 is a flow chart of a heterogeneous device ontology matching method fusing semantic information;
FIG. 2 is a graph comparing experimental results of the influence of different sample selection methods on the matching accuracy in the embodiment of the present invention;
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Referring to FIG. 1, the invention provides a heterogeneous device ontology matching method integrating semantic information, which matches instruction fragments of Internet of Things devices with a general semantic ontology. The semantic ontology refers to a general instruction meaning set; the instruction fragments are the specific instructions a device issues when executing an intent. Given an instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, ..., q_{N2}} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, each element being matched at most once. Table 1 is an example of an instruction fragment set and a general semantic ontology set.
TABLE 1
(The table contents were rendered as images in the original publication and are not reproducible here.)
The method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model to generate their instruction intent representation vectors; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set for making an exact-match dataset F'. The implementation details are as follows: a batch of B sample instruction fragments is extracted from all instruction fragments; B is set to 10, i.e., only 10 instruction fragments are extracted for making the exact-match dataset. According to the candidate configuration matches in the common config tree (CCT), a general semantic ontology matching each instruction fragment is found.
Step S300, calculating a similarity matrix based on the instruction intention characterization vector
Figure BDA0003067273180000062
Step S400, calculating a mapping matrix based on the similarity matrix S
Figure BDA0003067273180000063
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F' and calculating an objective function f2; the calculation result is used in back propagation to update the instruction understanding model;
step S700, repeatedly cycling steps S200 to S600 until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content of the element into the intent description encoder and the instruction fragment encoder respectively;
step S120, concatenating the encoding results of the two encoders into a vector, as an instruction intent representation vector of the element.
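As a minimal sketch of step S120, the two encoder outputs can be concatenated into a single intent representation vector; the vectors and dimensions below are illustrative placeholders, not the embodiment's actual encoder outputs:

```python
import numpy as np

# Hypothetical encoder outputs for one element (dimensions are illustrative).
desc_vec = np.array([0.1, 0.2])           # from the intent description encoder
frag_vec = np.array([0.3, 0.4, 0.5])      # from the instruction fragment encoder

# Step S120: concatenate the two encodings into one intent representation vector.
intent_vec = np.concatenate([desc_vec, frag_vec])
```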
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 = {p_1, p_2, ..., p_{N1}} according to the following formula:

E(p_i) = -Σ_j p_{i,j}·log(p_{i,j})

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between every element of the instruction fragment set sk1 and every element of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) is the cosine similarity between the instruction intent representation vectors of element p_i of sk1 and element q_j of sk2;
step S230, using the ratio of information entropy to redundancy as the quantified sample-selection value index RE, calculated as follows:

RE_i = E(p_i) / ((1/|B|)·Σ_{q_j∈B} R_{i,j})

where B is a batch of data;
step S240, selecting a subset of the data, sk1', according to the quantified sample-selection value index RE, and making an exact-match dataset F' that contains the exact match result for each instruction fragment in sk1'.
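The screening of steps S210 to S240 can be sketched as follows. The toy fragments and cosine-similarity values are hypothetical, and averaging the redundancy over the batch B in the denominator of RE is one reasonable reading of the formula, not a confirmed detail of the embodiment:

```python
import math

# Toy instruction fragments as word lists (hypothetical data).
fragments = [["set", "vlan", "10"], ["set", "vlan", "10"], ["ip", "route", "add"]]

def entropy(fragment, corpus):
    """E(p_i) = -sum_j p_{i,j} * log p_{i,j}, where p_{i,j} is the frequency
    of word j of fragment i across the whole fragment set."""
    total = sum(len(f) for f in corpus)
    e = 0.0
    for word in set(fragment):
        freq = sum(f.count(word) for f in corpus) / total
        e -= freq * math.log(freq)
    return e

def redundancy(cos_sim):
    """R_{i,j} = max(0, cos(p_i, q_j)): negative similarities are clipped."""
    return max(0.0, cos_sim)

# Hypothetical cosine similarities of each fragment to two ontology elements.
cos_sims = [[0.9, 0.1], [0.8, 0.2], [-0.3, 0.7]]

# RE = entropy divided by mean redundancy over the batch; higher is more valuable.
re_scores = []
for frag, sims in zip(fragments, cos_sims):
    mean_red = sum(redundancy(c) for c in sims) / len(sims)
    re_scores.append(entropy(frag, fragments) / mean_red)

# Select the single most valuable fragment for the exact-match dataset.
selected = sorted(range(len(fragments)), key=lambda i: re_scores[i], reverse=True)[:1]
```

On this toy data the unique, low-redundancy fragment wins the selection, which matches the intuition behind RE: informative fragments that are not already well covered by the ontology are worth labeling first.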
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, forming all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
TABLE 2
(The table contents were rendered as an image in the original publication and are not reproducible here.)
According to the instruction fragments and the general semantic ontology in Table 1, the similarity matrix calculated by the above steps is shown in Table 2; the matrix has 10 rows, equal to the number of instruction fragments, and 10 columns, equal to the number of semantic ontologies. The element s_{i,j} in the ith row and jth column of Table 2 is the Euclidean distance between the ith instruction fragment and the jth semantic ontology. The smallest distance in each row is shown in bold.
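The pairwise distance computation of step S300 can be sketched with numpy broadcasting; the representation vectors below are made up for illustration:

```python
import numpy as np

# Hypothetical instruction-intent representation vectors (one row per element).
sk1_vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])   # N1 = 3 fragments
sk2_vecs = np.array([[0.0, 0.0], [0.0, 2.0]])               # N2 = 2 ontology elements

# S[i, j] = Euclidean distance between fragment i and ontology element j,
# computed without explicit loops via broadcasting over a (N1, N2, d) array.
diff = sk1_vecs[:, None, :] - sk2_vecs[None, :, :]
S = np.linalg.norm(diff, axis=-1)
```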
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the Sinkhorn algorithm, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}, and λ is a hyperparameter controlling the difference between the elements of the mapping matrix F: the larger λ is, the smaller the difference. The hyperparameter λ is set to 0.1 in this embodiment.
S is the similarity matrix; diag(u) and diag(v) denote the square matrices whose diagonals are the vector u and the vector v respectively and whose remaining entries are 0. The vector u and the vector v are computed iteratively, both initialized to all-ones vectors, as follows:

u ← a ⊘ (K·v)
v ← b ⊘ (Kᵀ·u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns, respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For each element f_{i,j} of the mapping matrix F ∈ R^{N1×N2}: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
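The Sinkhorn iteration of step S400 can be sketched as below. The marginals a and b are taken as uniform here rather than derived from S, and the value of λ, the toy distance matrix, and the fixed iteration count are illustrative assumptions:

```python
import numpy as np

def sinkhorn(S, lam=0.1, n_iters=200):
    """Sinkhorn iteration for F = diag(u) * K * diag(v) with K = exp(-lam * S).
    a and b are the target row/column marginals (uniform in this sketch)."""
    n1, n2 = S.shape
    K = np.exp(-lam * S)
    a = np.full(n1, 1.0 / n1)
    b = np.full(n2, 1.0 / n2)
    u = np.ones(n1)
    v = np.ones(n2)
    for _ in range(n_iters):
        u = a / (K @ v)        # u <- a ./ (K v)
        v = b / (K.T @ u)      # v <- b ./ (K^T u)
    return np.diag(u) @ K @ np.diag(v)

# Toy 2x2 distance matrix: fragment 0 is close to ontology 0, fragment 1 to ontology 1.
S = np.array([[0.1, 2.0],
              [2.0, 0.1]])
F = sinkhorn(S, lam=10.0)   # a large lam sharpens F toward a 0/1 matching here
```

With the sharp regularization used in this toy run, the mass of each row concentrates on the low-distance column, i.e. F approaches the expected matching.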
The objective function f1 in step S500 is calculated as follows:

f1 = Σ_{i,j} ( s_{i,j}·f_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term Σ_{i,j} s_{i,j}·f_{i,j} minimizes the sum of the Euclidean distances between matched fragments, and the second term λ·(−f_{i,j}·log f_{i,j}) is an additional entropy regularization target: even though f_{i,j} need not be an integer (f_{i,j} ∈ [0,1]), it still pushes each value as close to 0 or 1 as possible.
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F are set to 1 and the other related entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as follows:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin hyperparameter;
where dis(·) is a distance measure between vector representations, which can be computed as the Euclidean distance;
step S640, updating the instruction understanding model according to the calculation result.
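The triplet objective of step S630 can be sketched as below; since the original formula is only partially legible in this extraction, the margin value and the toy vectors are assumptions, and the standard margin-based triplet loss is used:

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=1.0):
    """max(0, dis(anchor, pos) - dis(anchor, neg) + margin),
    with dis as the Euclidean distance; summed over triplets in training."""
    d_pos = np.linalg.norm(anchor - pos)
    d_neg = np.linalg.norm(anchor - neg)
    return max(0.0, d_pos - d_neg + margin)

p = np.array([0.0, 0.0])          # instruction fragment representation
q_pos = np.array([0.1, 0.0])      # matchable ontology element: should stay close
q_neg = np.array([3.0, 0.0])      # randomly drawn non-matching ontology element

loss = triplet_loss(p, q_pos, q_neg)   # already satisfies the margin
```

Minimizing this loss pulls matchable pairs together and pushes non-matchable pairs apart, which is exactly the training effect claimed for the instruction understanding model.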
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
Steps S200 to S600 are cycled until the mapping matrix F exactly matches sk1'.
TABLE 3 Table 3
0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 0
The mapping matrix F ∈ R^{N1×N2} of the present invention shows the matching relation between the instruction fragments and the general semantic ontology: the element f_{i,j} in the ith row and jth column equals 1 if the ith instruction fragment matches the jth general semantic ontology, and 0 otherwise. Since each element can only be matched once, each row contains exactly one 1. For the instruction fragments and the general semantic ontology in Table 1, the finally calculated mapping matrix F is shown in Table 3, where each of the 10 rows corresponds to an instruction fragment and each of the 10 columns corresponds to a semantic ontology.
In step S700, the resulting mapping matrix F is used to generate an instruction intent dictionary. In this embodiment, the instruction intent dictionary contains a one-to-one correspondence between the general instructions in the general semantic ontology and the specific instruction fragments of a single device.
TABLE 4
(The table contents were rendered as an image in the original publication and are not reproducible here.)
The instruction intent dictionary obtained from Table 3 is shown in Table 4: each instruction fragment row in Table 4 is matched with the general semantic ontology to its right, and the accuracy of the matching result is 100%.
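Generating the instruction intent dictionary from a converged 0/1 mapping matrix F (step S700) reduces to reading off the single 1 in each row. The fragment and ontology names below are invented placeholders, not the entries of Table 1:

```python
import numpy as np

# Hypothetical fragment and ontology names (placeholders, not Table 1).
fragments = ["vlan 10", "ip route", "no shutdown"]
ontology = ["create-vlan", "add-route", "enable-port"]

# A converged 0/1 mapping matrix: exactly one 1 per row.
F = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Each fragment maps to the ontology element of the column holding its row's 1.
intent_dict = {fragments[i]: ontology[int(np.argmax(F[i]))]
               for i in range(len(fragments))}
```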
The intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer automatically learns discriminative features that serve the training target. The input is a sequence of description words, with a placeholder "[cls]" prepended to the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder, and the vector representation of "[cls]" is used as the description representation. The intent description encoder uses a pre-trained BERT model (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding), BERT-small, with 4 Transformer layers and a hidden size of 512.
The instruction fragment encoder uses the same settings and operations as the intent description encoder. The only difference is that the instruction fragment content cannot use a pre-trained language model: instruction sentences contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model, so the model variables are randomly initialized.
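A minimal numpy sketch of the self-attention aggregation that lets the "[cls]" position summarize a sequence. For brevity this uses a single head with Q = K = V = X; real Transformer layers use learned projections, multiple heads, and a feed-forward sublayer, and the embeddings below are illustrative:

```python
import numpy as np

def self_attention(X):
    """Single-head scaled dot-product self-attention; every position,
    including "[cls]", attends to the whole sequence."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                            # (seq, seq) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X

cls_token = np.zeros((1, 4))                            # "[cls]" placeholder embedding
word_embeddings = np.arange(12.0).reshape(3, 4) / 10.0  # illustrative word vectors
sequence = np.vstack([cls_token, word_embeddings])

out = self_attention(sequence)
description_vec = out[0]   # the "[cls]" row serves as the description representation
```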
During training, an Adam optimizer with a learning rate of 10^-5 was used.
The inventors used instruction fragments from 689 configuration files from four vendors (Cisco, Huawei, H3C, and Ruijie) as the instruction fragment set for training and testing. Of these, 304 were from Cisco, 186 from Huawei, 151 from H3C, and 67 from Ruijie. All vendor configuration files come from data centers supporting the same service, and the devices perform the same network architecture roles.
Experimental results show that the present invention achieves 100% alignment accuracy across the different vendors, which illustrates the robustness of the method to a variety of environments. Because the invention considers the correlation between the intent description and the instruction fragments, it can still match the instruction fragments of various heterogeneous devices with the general semantic ontology based on the intent description, even though the instruction fragments of different vendors may differ greatly.
The present invention adopts a mechanism for automatically screening sample data in step S200; the inventors therefore studied the influence of different sample selection methods on the matching accuracy. In addition to the sample selection method of the present invention, the performance of two other sample selection methods (a random method and an entropy-only method) was evaluated on the four data sets. The random method randomly selects samples during learning; the entropy-only method calculates the information entropy of all data in the batch according to the formula in step S210 and then selects the samples with the highest information entropy. The experimental results are shown in FIG. 2.
The accuracy in FIG. 2 is the ratio of correctly matched fragments to the total number of fragments, and the sample rate is the ratio of the number of samples used to make the exact-match dataset to the total number of samples. The experimental results show that, compared with the other methods, the accuracy of the method of the invention increases fastest as the number of samples grows, reaching 100% accuracy with only 10% of the labels on average. This shows that the sample selection of the present invention achieves the best accuracy with the fewest sample labels.
Furthermore, the experimental results show that the trend of increasing accuracy is consistent across the different vendors, further illustrating the robustness of the method to a variety of environments.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (9)

1. A heterogeneous device ontology matching method integrating semantic information, characterized by comprising the following steps: matching instruction fragments of Internet of Things devices with a general semantic ontology; the general semantic ontology refers to a set of general instruction meanings, and an instruction fragment is the specific instruction executed when an Internet of Things device carries out an intent; given an instruction fragment set sk1 = {p_1, p_2, …, p_{N1}} with N1 elements and a general semantic ontology set sk2 = {q_1, q_2, …, q_{N2}} with N2 elements, the method finds pairs of elements with the same intent between sk1 and sk2, where each element can be matched only once;
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 one by one into an instruction understanding model, generating a respective instruction intent representation vector for each element; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training data set, used to make an exact match data set F';
step S300, calculating a similarity matrix S ∈ R^{N1×N2} based on the instruction intent representation vectors;
step S400, calculating a mapping matrix F ∈ R^{N1×N2} based on the similarity matrix S;
Step S500, calculating an objective function F1 based on the similarity matrix S and the mapping matrix F, wherein the calculation result is used for back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F by the accurate matching data set F', calculating an objective function F2, and using the calculation result for back propagation to update the instruction understanding model;
step S700, repeating steps S200 to S600 until the mapping matrix F achieves an exact match for sk1'; the resulting mapping matrix F is then used to generate an instruction intent dictionary.
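As a minimal illustration of how the converged 0/1 mapping matrix F of step S700 can be turned into the instruction intent dictionary, the following Python sketch (the function name and toy instruction strings are hypothetical, not part of the claims) pairs each instruction fragment with its aligned ontology entry:

```python
import numpy as np

def intent_dictionary(F, sk1, sk2):
    """Turn a converged 0/1 mapping matrix F into a one-to-one
    instruction-intent dictionary (step S700 / claim 8)."""
    mapping = {}
    for i, row in enumerate(F):
        j = int(np.argmax(row))
        if row[j] == 1:            # keep only confirmed alignments
            mapping[sk1[i]] = sk2[j]
    return mapping

# toy mapping matrix: fragment i aligns with the ontology entry of its 1-entry
F = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
d = intent_dictionary(F, ["cmd_a", "cmd_b", "cmd_c"], ["ON", "OFF", "DIM"])
# d == {"cmd_a": "OFF", "cmd_b": "ON", "cmd_c": "DIM"}
```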
2. The heterogeneous device ontology matching method of claim 1, wherein the step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting the descriptive text content and the instruction content into the intent description encoder and the instruction fragment encoder, respectively;
step S120, concatenating the encoding results of the two encoders into one vector, which serves as the instruction intent representation vector of the element.
3. The heterogeneous device ontology matching method of claim 1, wherein the step S200 specifically includes the following sub-steps:
step S210, calculating the information entropy E(p_i) of each instruction fragment p_i in the instruction fragment set sk1 according to the following formula:

E(p_i) = −Σ_j p_{i,j}·log p_{i,j}

where p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

where cos(p_i, q_j) denotes the cosine of the angle between the instruction intent representation vectors of element p_i in the instruction fragment set sk1 and element q_j in the general semantic ontology set sk2;
step S230, using the ratio of the information entropy to the redundancy as the quantized sample selection value index RE, calculated as follows:

RE_i = E(p_i) / Σ_{q_j ∈ B} R_{i,j}

where B is a batch of data;
step S240, selecting a portion of the data, sk1', according to the quantized sample selection value index RE, and making an exact match data set F'; the data set F' contains the exact match result for each instruction fragment in sk1'.
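The entropy and redundancy quantities of steps S210–S230 can be sketched in Python as follows; this is an illustrative implementation under the assumption that p_{i,j} is the corpus-wide frequency of each word of the fragment, and the toy corpus, vectors, and variable names are all hypothetical:

```python
import numpy as np
from collections import Counter

def entropy(fragment_tokens, corpus_freq):
    """Step S210: E(p_i) = -sum_j p_{i,j} * log p_{i,j}, where p_{i,j}
    is assumed to be the corpus-wide frequency of word j of the fragment."""
    return -sum(corpus_freq[w] * np.log(corpus_freq[w])
                for w in set(fragment_tokens))

def redundancy(p_vec, q_vecs):
    """Step S220: sum over the batch of R_{i,j} = max(0, cos(p_i, q_j))."""
    cos = q_vecs @ p_vec / (np.linalg.norm(q_vecs, axis=1)
                            * np.linalg.norm(p_vec) + 1e-12)
    return float(np.maximum(0.0, cos).sum())

# hypothetical toy corpus of instruction fragments
frags = [["set", "light", "on"], ["set", "light", "off"], ["report", "temp"]]
counts = Counter(w for f in frags for w in f)
total = sum(counts.values())
freq = {w: c / total for w, c in counts.items()}
scores = [entropy(f, freq) for f in frags]   # higher = more informative

# toy intent vectors: one aligned, one orthogonal, one opposed ontology entry
p_vec = np.array([1.0, 0.0])
q_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
re_index = scores[0] / redundancy(p_vec, q_vecs)   # step S230 ratio
```

Fragments with a high RE index (informative but not redundant with the ontology) would then be labeled to form the exact match data set F'.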
4. The heterogeneous device ontology matching method of claim 1, wherein the step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent representation vector of each element in the instruction fragment set sk1 and that of each element in the general semantic ontology set sk2;
step S320, assembling all calculation results into a similarity matrix S ∈ R^{N1×N2}, where element s_{i,j} ∈ S denotes the Euclidean distance between element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2.
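Steps S310–S320 amount to a pairwise Euclidean distance computation; a minimal NumPy sketch with toy representation vectors (the vectors are illustrative, not outputs of the claimed encoders):

```python
import numpy as np

# Toy intent-representation vectors for sk1 (3 fragments) and sk2
# (2 ontology entries); in the method these come from the encoders of S100.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q = np.array([[1.0, 0.0], [0.0, 2.0]])

# S in R^{N1 x N2}: pairwise Euclidean distances (steps S310-S320)
S = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
# S[1, 0] == 0.0 because p_1 coincides with q_0
```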
5. The heterogeneous device ontology matching method of claim 1, wherein the specific content of step S400 is as follows: the optimal mapping matrix F is solved using the classical Sinkhorn algorithm from optimal transport theory; the Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{−λS}, λ is a hyperparameter used to control the difference between the elements of the mapping matrix F (the larger λ is, the smaller the difference), S is the similarity matrix, and diag(u) and diag(v) denote square matrices with the vectors u and v on the diagonal and 0 elsewhere; the vectors u and v are computed iteratively, both initialized as all-ones vectors, according to:

u ← a ⊘ (K·v)
v ← b ⊘ (K^T·u)

where ⊘ denotes element-wise division of vector elements, and a and b denote the normalized vectors of the similarity matrix S computed by rows and by columns, respectively; the Sinkhorn procedure iteratively solves for F until convergence or a maximum number of steps is reached; in the resulting mapping matrix F ∈ R^{N1×N2}, an element f_{i,j} ∈ F takes the value f_{i,j} = 1 if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, and f_{i,j} = 0 otherwise.
6. The heterogeneous device ontology matching method of claim 1, wherein the calculation formula of the objective function f1 in step S500 is:

f1 = Σ_{i,j} ( f_{i,j}·s_{i,j} + λ·(−f_{i,j}·log f_{i,j}) )

where the first term, Σ_{i,j} f_{i,j}·s_{i,j}, minimizes the sum of the Euclidean distances between matched fragments, and the second term, λ·(−f_{i,j}·log f_{i,j}), is an additional entropy regularization target, so that even when f_{i,j} is not an integer (f_{i,j} ∈ [0,1]), its value can still be pushed as close to 0 or 1 as possible.
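The objective f1 can be illustrated numerically as follows; the toy matrices and the value of λ are assumptions for demonstration:

```python
import numpy as np

def f1_objective(F, S, lam=0.1):
    """f1 = sum_{i,j} f_{i,j}*s_{i,j} - lam * sum_{i,j} f_{i,j}*log f_{i,j}
    (distance term plus entropy regularizer, step S500)."""
    eps = 1e-12                    # avoid log(0) for hard 0/1 entries
    ent = -(F * np.log(F + eps)).sum()
    return (F * S).sum() + lam * ent

S = np.array([[0.1, 1.0], [1.0, 0.1]])
hard = np.eye(2)                   # integer matching, entropy ~ 0
soft = np.full((2, 2), 0.5)        # maximally uncertain matching
# the entropy term penalizes the soft matrix relative to the hard one
```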
7. The heterogeneous device ontology matching method of claim 1, wherein the step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact match data set F': the corresponding f_{i,j} in the mapping matrix F is set to 1, and the other entries for that instruction fragment are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e., a fragment other than q_pos randomly extracted from sk2;
step S630, on the basis of the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, whose calculation formula is:

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + α)

where α is a margin constant;
where dis(·) is a distance measure between vector representations, which can be calculated using the Euclidean distance;
step S640, updating the instruction understanding model according to the calculation result.
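A sketch of the per-triplet objective of step S630, using the Euclidean distance suggested for dis(·); the margin value is an assumed hyperparameter:

```python
import numpy as np

def triplet_f2(p, q_pos, q_neg, margin=1.0):
    """Per-fragment triplet term of step S630: pull the aligned ontology
    entry closer than the negative sample by at least `margin` (assumed)."""
    d_pos = np.linalg.norm(p - q_pos)
    d_neg = np.linalg.norm(p - q_neg)
    return max(0.0, d_pos - d_neg + margin)

p = np.array([0.0, 0.0])
q_pos = np.array([0.1, 0.0])     # aligned ontology entry (close)
q_neg = np.array([3.0, 0.0])     # randomly sampled non-matching entry (far)
loss = triplet_f2(p, q_pos, q_neg)
# d_pos=0.1, d_neg=3.0 -> max(0, 0.1 - 3.0 + 1.0) == 0.0 (triplet satisfied)
```

Summing this term over all constructed triplets and back-propagating through both encoders performs the update of step S640.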
8. The heterogeneous device ontology matching method of claim 1, wherein the instruction intent dictionary in step S700 contains a one-to-one correspondence between the general instructions of the semantic ontology and the specific instruction fragments of a single Internet of Things device.
9. The heterogeneous device ontology matching method of claim 1, wherein:
the intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer is used to automatically learn discriminative features that meet the training target; the input is a sequence of description words, with a placeholder "[cls]" added at the very front of the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder; the vector representation of "[cls]" is used as the description representation;
the instruction fragment encoder uses the same settings and operations as the intent description encoder, the only difference being that the instruction fragment content cannot use a pre-trained language model: instruction fragments contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
CN202110530094.3A 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information Active CN113515930B (en)
