CN113515930B - Heterogeneous device ontology matching method integrating semantic information - Google Patents
- Publication number
- CN113515930B (application CN202110530094.3A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- mapping matrix
- instruction fragment
- steps
- ontology
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
A heterogeneous device ontology matching method integrating semantic information comprises the following steps: (1) inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into an instruction understanding model one by one to generate instruction intent characterization vectors; (2) screening a small instruction fragment set sk1' from the training dataset for making an exact-match dataset F'; (3) calculating a similarity matrix S; (4) calculating a mapping matrix F; (5) calculating an objective function f1; (6) updating the mapping matrix F and calculating an objective function f2; (7) cycling steps (2) to (6) until the mapping matrix F exactly matches sk1'. The resulting mapping matrix F is used to generate an instruction intent dictionary.
Description
Technical Field
The invention relates to a heterogeneous device ontology matching method fusing semantic information, and belongs to the technical field of the Internet of things, in particular to the field of heterogeneous device ontology matching.
Background
In the field of the Internet of things, different devices use different instruction languages to express the same instruction intent, because vendors deliberately design instruction syntax that differs sharply from that of competitors in order to raise customers' switching costs. In addition, a device's instruction syntax is often strictly protected by patents. As a result, there is no explicit one-to-one correspondence between the instruction statements of different vendors, and even identical terms may have different expressions. This makes network management extremely difficult when heterogeneous devices are present in a network. How to match the instruction fragments of different devices to a general set of instruction meanings (i.e., a semantic ontology), and thereby reduce the difficulty of managing heterogeneous device networks, is therefore a technical problem that needs to be solved in the Internet of things field.
Disclosure of Invention
In view of the above, the invention aims to provide a method, based on a deep learning model, for matching the instruction fragments of Internet of things devices with a general semantic ontology. To this end, the invention provides a heterogeneous device ontology matching method fusing semantic information, which matches instruction fragments of Internet of things devices with a general semantic ontology. The semantic ontology refers to a general set of instruction meanings, and an instruction fragment is the specific instruction used by an Internet of things device to execute an intent. Given an instruction fragment set sk1 = {p1, p2, ..., pN1} with N1 elements and a general semantic ontology set sk2 = {q1, q2, ..., qN2} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent; each element can be matched only once.
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into the instruction understanding model one by one, and generating an instruction intent characterization vector for each element; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training dataset for making an exact-match dataset F';
step S300, calculating a similarity matrix S based on the instruction intent characterization vectors;
step S400, calculating a mapping matrix F from the similarity matrix S;
step S500, calculating an objective function f1 based on the similarity matrix S and the mapping matrix F, the result being used in back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F', calculating an objective function f2, and using the result in back propagation to update the instruction understanding model;
step S700, cycling steps S200 to S600 repeatedly until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting its descriptive text content into the intent description encoder and its instruction content into the instruction fragment encoder;
step S120, concatenating the outputs of the two encoders into one vector, which serves as the element's instruction intent characterization vector.
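A minimal sketch of step S120, assuming stand-in encoder outputs (in the method the two vectors would come from the intent description encoder and the instruction fragment encoder, which are not reproduced here; the dimensions are illustrative):

```python
import numpy as np

# Hypothetical encoder outputs for one element; placeholders for the real
# intent description encoder and instruction fragment encoder outputs.
desc_vec = np.array([0.1, 0.3, 0.5, 0.2])  # encoding of the descriptive text
frag_vec = np.array([0.7, 0.0, 0.4, 0.9])  # encoding of the instruction content

# Step S120: concatenate the two encodings into one instruction intent
# characterization vector.
intent_vec = np.concatenate([desc_vec, frag_vec])
```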
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 according to the following formula:

E(p_i) = -Σ_j p_{i,j} log p_{i,j}

wherein p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

in the above formula, cos(p_i, q_j) is the cosine similarity between the instruction intent characterization vectors of element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2;
step S230, using the ratio of the information entropy to the redundancy as a quantized sample-selection value index RE, calculated as:

RE(p_i) = E(p_i) / Σ_{j∈B} R_{i,j}

wherein B is the batch of data under consideration;
step S240, selecting partial data sk1' according to the quantized sample-selection value index RE and making an exact-match dataset F', which contains the exact match result for each instruction fragment in sk1'.
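The screening of steps S210 to S240 can be sketched as follows. The exact formula combining entropy and redundancy is not reproduced in the text, so the ratio below (entropy over summed redundancy within the batch B) is an assumed reading:

```python
import numpy as np

def information_entropy(word_freqs):
    # E(p_i) = -sum_j p_{i,j} * log(p_{i,j}) over the word frequencies of
    # instruction fragment p_i (zero frequencies contribute nothing).
    p = np.asarray(word_freqs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def redundancy(p_vec, q_vec):
    # R_{i,j} = max(0, cos(p_i, q_j)) between intent characterization vectors.
    cos = float(np.dot(p_vec, q_vec) /
                (np.linalg.norm(p_vec) * np.linalg.norm(q_vec)))
    return max(0.0, cos)

def re_index(entropy, redundancies):
    # Assumed form: RE = E(p_i) / sum of redundancies over the batch B.
    return entropy / (sum(redundancies) + 1e-12)
```

Fragments with the highest RE (informative but not redundant) would then be picked to form sk1'.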
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent characterization vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, assembling all results into a similarity matrix S, wherein element s_{i,j} ∈ S is the Euclidean distance between element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2.
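Steps S310 and S320 amount to a pairwise Euclidean distance computation; a sketch with illustrative vectors:

```python
import numpy as np

# Hypothetical instruction intent characterization vectors: three instruction
# fragments (rows of P) and three semantic ontology elements (rows of Q).
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[0.1, 0.0], [1.0, 0.0], [0.0, 0.8]])

# s_{i,j} = Euclidean distance between p_i and q_j (an N1 x N2 matrix).
S = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
```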
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the classical Sinkhorn algorithm from optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}; λ is a hyperparameter controlling the differences between the elements of the mapping matrix F (the larger λ is, the smaller the differences); S is the similarity matrix; diag(u) and diag(v) are the square matrices with vector u and vector v on the diagonal and 0 elsewhere. Vectors u and v are computed iteratively, initialized as all-ones vectors, as follows:

u = a ⊘ (K v),  v = b ⊘ (Kᵀ u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For the mapping matrix F and element f_{i,j} ∈ F, f_{i,j} = 1 if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, and f_{i,j} = 0 otherwise.
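The Sinkhorn iteration can be sketched as follows. The text describes a and b as row and column normalizations of S; for a self-contained runnable example, uniform marginals are used here instead (each element matched exactly once), which is the common optimal transport setting — an assumption, not the patent's exact choice:

```python
import numpy as np

def sinkhorn(S, lam=10.0, n_iters=200):
    # F = diag(u) @ K @ diag(v) with K = exp(-lam * S); u and v start as
    # all-ones vectors and are refined by element-wise division.
    n, m = S.shape
    K = np.exp(-lam * S)
    a = np.full(n, 1.0 / n)  # assumed uniform row marginals
    b = np.full(m, 1.0 / m)  # assumed uniform column marginals
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

# Smaller distance means a stronger match: each row of F then peaks on the
# column of the aligned ontology element.
S = np.array([[0.1, 2.0, 2.0],
              [2.0, 0.1, 2.0],
              [2.0, 2.0, 0.1]])
F = sinkhorn(S)
```

Rounding each row's peak to 1 and the rest to 0 yields the hard 0/1 mapping matrix the method uses.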
The objective function f1 in step S500 is calculated as:

f1 = Σ_{i,j} [ f_{i,j} · s_{i,j} + λ(−f_{i,j} log f_{i,j}) ]

wherein the first term Σ f_{i,j} s_{i,j} minimizes the sum of the Euclidean distances between matched fragments; the second term λ(−f_{i,j} log f_{i,j}) is an additional entropy regularization target, so that even when f_{i,j} is not an integer, i.e. f_{i,j} ∈ [0,1], its value is still pushed as close to 0 or 1 as possible.
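A sketch of the two terms of f1 as the text names them — the distance term plus the entropy regularization term; the exact combined expression is not reproduced in the text, so summing the two terms element-wise is an assumption:

```python
import numpy as np

def objective_f1(S, F, lam=0.1):
    # First term: sum of f_{i,j} * s_{i,j}, the total Euclidean distance
    # between matched fragments. Second term: lam * (-f_{i,j} * log f_{i,j}),
    # the entropy regularization pushing each f_{i,j} toward 0 or 1.
    eps = 1e-12  # guards log(0)
    transport = float((F * S).sum())
    entropy_reg = float(lam * (-(F * np.log(F + eps))).sum())
    return transport + entropy_reg
```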
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact-match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F is set to 1 and the other entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e. for instruction fragment p_i constructing the triplet (p_i, q_pos, q_neg), wherein p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e. a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, which takes the triplet form

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + m)

where dis(·) is a similarity measure between vector representations, which can be computed as the Euclidean distance, and m is a margin.
Step S640, updating the instruction understanding model according to the calculation result.
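The triplet objective of step S630 is not reproduced in the text; a sketch in the standard triplet margin form, with the margin as an assumed hyperparameter and dis(·) taken as the Euclidean distance the text suggests:

```python
import numpy as np

def triplet_f2(p, q_pos, q_neg, margin=1.0):
    # Pull the alignable pair (p, q_pos) together and push the negative
    # sample q_neg away by at least `margin`.
    d_pos = float(np.linalg.norm(p - q_pos))
    d_neg = float(np.linalg.norm(p - q_neg))
    return max(0.0, d_pos - d_neg + margin)
```

Summing this loss over all triplets and back-propagating it through both encoders corresponds to the joint training the text describes.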
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
The intent description encoder is built from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer is used to automatically learn discriminative features that satisfy the training target. The input is a sequence of description words, with a placeholder "[cls]" added at the front of the sequence; for each input word, the output is its vectorized representation obtained by the intent description encoder, and the vector representation of "[cls]" is used as the description characterization.
The instruction fragment encoder uses the same settings and operations as the intent description encoder; the only difference is that a pre-trained language model cannot be used for the instruction fragment content, because instruction sentences contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
The invention has the following beneficial effects: by taking the intent description as the benchmark, the invention matches the instruction fragments of various heterogeneous devices with the general semantic ontology, reducing the difficulty of managing heterogeneous device networks; by training the instruction understanding model, the invention increases the similarity of the vector representations of elements that can be matched while decreasing the similarity of the vector representations of elements that cannot; and by screening only a small amount of data for making the matching dataset, the invention reduces the labeling effort while preserving the matching performance of the model.
Drawings
FIG. 1 is a flow chart of a heterogeneous device ontology matching method fusing semantic information;
FIG. 2 is a graph comparing experimental results of the influence of different sample selection methods on the matching accuracy in the embodiment of the present invention;
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Referring to fig. 1, the invention provides a heterogeneous device ontology matching method fusing semantic information, which matches instruction fragments of Internet of things devices with a general semantic ontology; the semantic ontology refers to a general set of instruction meanings, and an instruction fragment is the specific instruction used by an Internet of things device to execute an intent. Given an instruction fragment set sk1 = {p1, p2, ..., pN1} with N1 elements and a general semantic ontology set sk2 = {q1, q2, ..., qN2} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, where each element can be matched only once. Table 1 gives an example of an instruction fragment set and a general semantic ontology set.
TABLE 1
The method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into the instruction understanding model one by one, and generating an instruction intent characterization vector for each element; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training dataset for making an exact-match dataset F'. The implementation details are as follows: B batches of sample data are extracted from all instruction fragments, with B set to 10; that is, only 10 instruction fragments are extracted for making the exact-match dataset. According to the candidate configuration matches in the common config tree (CCT), a general semantic ontology element matching each instruction fragment is found.
step S300, calculating a similarity matrix S based on the instruction intent characterization vectors;
step S400, calculating a mapping matrix F from the similarity matrix S;
step S500, calculating an objective function f1 based on the similarity matrix S and the mapping matrix F, the result being used in back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F', calculating an objective function f2, and using the result in back propagation to update the instruction understanding model;
step S700, cycling steps S200 to S600 repeatedly until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
The step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the general semantic ontology set sk2, inputting its descriptive text content into the intent description encoder and its instruction content into the instruction fragment encoder;
step S120, concatenating the outputs of the two encoders into one vector, which serves as the element's instruction intent characterization vector.
The step S200 specifically includes the following operation substeps:
step S210, calculating the information entropy E(p_i) of each element of the instruction fragment set sk1 according to the following formula:

E(p_i) = -Σ_j p_{i,j} log p_{i,j}

wherein p_{i,j} is the frequency of occurrence of word j of instruction fragment p_i in the whole instruction fragment set;
step S220, calculating the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:

R_{i,j} = max(0, cos(p_i, q_j))

in the above formula, cos(p_i, q_j) is the cosine similarity between the instruction intent characterization vectors of element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2;
step S230, using the ratio of the information entropy to the redundancy as a quantized sample-selection value index RE, calculated as:

RE(p_i) = E(p_i) / Σ_{j∈B} R_{i,j}

wherein B is the batch of data under consideration;
step S240, selecting partial data sk1' according to the quantized sample-selection value index RE and making an exact-match dataset F', which contains the exact match result for each instruction fragment in sk1'.
The step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent characterization vector of each element of the instruction fragment set sk1 and that of each element of the general semantic ontology set sk2;
step S320, assembling all results into a similarity matrix S, wherein element s_{i,j} ∈ S is the Euclidean distance between element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2.
TABLE 2
According to the instruction fragments and general semantic ontology in Table 1, the similarity matrix calculated by the above steps is shown in Table 2; the matrix has 10 rows, equal to the number of instruction fragments, and 10 columns, equal to the number of semantic ontology elements. The element s_{i,j} in row i and column j of Table 2 is the Euclidean distance between the i-th instruction fragment and the j-th semantic ontology element. The smallest distance in each row is shown in bold.
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with the classical Sinkhorn algorithm from optimal transport theory. The Sinkhorn algorithm defines the mapping matrix F = diag(u)·K·diag(v), where K := e^{-λS}; λ is a hyperparameter controlling the differences between the elements of the mapping matrix F (the larger λ is, the smaller the differences); the hyperparameter λ is set to 0.1 in this embodiment.
S is the similarity matrix; diag(u) and diag(v) are the square matrices with vector u and vector v on the diagonal and 0 elsewhere. Vectors u and v are computed iteratively, initialized as all-ones vectors, as follows:

u = a ⊘ (K v),  v = b ⊘ (Kᵀ u)

where ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed over rows and columns respectively. The Sinkhorn procedure iterates on F until convergence or a maximum number of steps is reached. For the mapping matrix F and element f_{i,j} ∈ F, f_{i,j} = 1 if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, and f_{i,j} = 0 otherwise.
The objective function f1 in step S500 is calculated as:

f1 = Σ_{i,j} [ f_{i,j} · s_{i,j} + λ(−f_{i,j} log f_{i,j}) ]

wherein the first term Σ f_{i,j} s_{i,j} minimizes the sum of the Euclidean distances between matched fragments; the second term λ(−f_{i,j} log f_{i,j}) is an additional entropy regularization target, so that even when f_{i,j} is not an integer, i.e. f_{i,j} ∈ [0,1], its value is still pushed as close to 0 or 1 as possible.
The step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact-match results in the exact-match dataset F': the corresponding f_{i,j} in the mapping matrix F is set to 1 and the other entries are set to 0;
step S620, constructing a triplet for each instruction fragment, i.e. for instruction fragment p_i constructing the triplet (p_i, q_pos, q_neg), wherein p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e. a fragment other than q_pos randomly extracted from sk2;
step S630, based on the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, which takes the triplet form

f2 = Σ_i max(0, dis(p_i, q_pos) − dis(p_i, q_neg) + m)

where dis(·) is a similarity measure between vector representations, which can be computed as the Euclidean distance, and m is a margin;
step S640, updating the instruction understanding model according to the calculation result.
The instruction intention dictionary in step S700 contains a one-to-one correspondence between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
Steps S200 to S600 are cycled repeatedly until the mapping matrix F exactly matches sk1'.
TABLE 3 Table 3
0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
The mapping matrix F of the invention shows the matching relation between instruction fragments and the general semantic ontology: element f_{i,j} = 1 in row i and column j indicates that the i-th instruction fragment matches the j-th general semantic ontology element, and f_{i,j} = 0 indicates that it does not. Since each element can be matched only once, each row contains exactly one 1. For the instruction fragments and general semantic ontology in Table 1, the finally computed mapping matrix F is shown in Table 3; each row of the 10×10 matrix corresponds to an instruction fragment and each column to a semantic ontology element.
In step S700, the resulting mapping matrix F will be used to generate an instruction intention dictionary. In this embodiment, the instruction intention dictionary in step S700 includes a one-to-one correspondence between general instructions in a general semantic ontology and specific instruction fragments of a single device.
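Generating the instruction intent dictionary from a 0/1 mapping matrix amounts to reading off the single 1 in each row; a sketch with placeholder names (the actual entries of Tables 1 and 4 are not reproduced in the text):

```python
# Illustrative fragment and ontology labels — placeholders, not Table 1's rows.
fragments = ["frag_a", "frag_b", "frag_c"]
ontology = ["intent_x", "intent_y", "intent_z"]

# A 3 x 3 mapping matrix F: exactly one 1 per row, since each element is
# matched only once.
F = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

# Each fragment maps to the ontology element whose column holds the 1.
intent_dict = {fragments[i]: ontology[row.index(1)] for i, row in enumerate(F)}
```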
TABLE 4 Table 4
The instruction intent dictionary obtained from Table 3 is shown in Table 4; each instruction fragment in Table 4 is matched with the general semantic ontology element to its right, and the accuracy of the matching result is 100%.
The intent description encoder is built from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer is used to automatically learn discriminative features that satisfy the training target. The input is a sequence of description words, with a placeholder "[cls]" added at the front of the sequence; for each input word, the output is its vectorized representation obtained by the intent description encoder, and the vector representation of "[cls]" is used as the description characterization. The intent description encoder uses the pre-trained BERT-small model (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding), with 4 Transformer layers and a hidden dimension of 512.
The instruction fragment encoder uses the same settings and operations as the intent description encoder. The only difference is that a pre-trained language model cannot be used for the instruction fragment content: instruction sentences contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model, so the model parameters are randomly initialized.
During training, an Adam optimizer with a learning rate of 10^-5 was used.
The inventors used instruction fragments from 689 configuration files from four vendors (Cisco, Huawei, H3C, Ruijie) as the instruction fragment set for training and testing: 304 from Cisco, 186 from Huawei, 151 from H3C, and 67 from Ruijie. All vendor configuration files come from data centers supporting the same service, and the devices perform the same network architecture roles.
Experimental results show that the present invention can achieve 100% alignment accuracy for different suppliers, which illustrates the robustness of the method of the present invention to a variety of different environments. Because the invention considers the correlation between the intent description and the instruction fragments, the invention can still realize matching the instruction fragments of various heterogeneous devices with the universal semantic ontology based on the intent description even though the instruction fragments of different suppliers can be greatly different.
The invention adopts a mechanism for automatically screening sample data in step S200, so the inventors studied the influence of different sample selection methods on matching accuracy. In addition to the sample selection method of the invention, the performance of two other sample selection methods (a random method and an entropy-only method) was evaluated on the four datasets. The random method selects samples randomly during learning; the entropy-only method computes the information entropy of all data in the batch according to the formula in step S210 and then selects the samples with higher information entropy. The experimental results are shown in FIG. 2.
The accuracy in fig. 2 is the ratio of correctly matched fragments to the total number of fragments, and the sample rate is the ratio of the number of samples used to make the exact-match dataset to the total number of samples. The experimental results show that, compared with the other methods, the accuracy of the method of the invention grows fastest as the number of samples increases, reaching 100% accuracy with only 10% of the labels on average. This shows that the sample selection of the invention achieves the best accuracy with the fewest sample labels.
Furthermore, the experimental results show that the growth trend of the accuracy is consistent across different vendors, further illustrating the robustness of the method of the invention to a variety of environments.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
Claims (9)
1. A heterogeneous device ontology matching method integrating semantic information, characterized by: matching instruction fragments of Internet of things devices with a general semantic ontology; the general semantic ontology refers to a general set of instruction meanings, and an instruction fragment is the specific instruction used by an Internet of things device to execute an intent; given an instruction fragment set sk1 = {p1, p2, ..., pN1} with N1 elements and a general semantic ontology set sk2 = {q1, q2, ..., qN2} with N2 elements, the method finds pairs of elements of sk1 and sk2 with the same intent, where each element can be matched only once;
the method specifically comprises the following operation steps:
step S100, inputting all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into the instruction understanding model one by one, and generating an instruction intent characterization vector for each element; the instruction understanding model comprises an intent description encoder and an instruction fragment encoder;
step S200, screening a small instruction fragment set sk1' from the whole training dataset for making an exact-match dataset F';
step S300, calculating a similarity matrix S based on the instruction intent characterization vectors;
step S400, calculating a mapping matrix F from the similarity matrix S;
step S500, calculating an objective function f1 based on the similarity matrix S and the mapping matrix F, the result being used in back propagation to update the mapping matrix F;
step S600, updating the corresponding part of the mapping matrix F with the exact-match dataset F', calculating an objective function f2, and using the result in back propagation to update the instruction understanding model;
step S700, cycling steps S200 to S600 repeatedly until the mapping matrix F exactly matches sk1'; the resulting mapping matrix F is used to generate an instruction intent dictionary.
2. The heterogeneous device ontology matching method of claim 1, wherein the step S100 includes the following sub-steps:
step S110, for each element of the instruction fragment set sk1 and the generic semantic ontology set sk2, inputting the descriptive text content and the instruction content of the element into the intent description encoder and the instruction fragment encoder, respectively;
step S120, concatenating the encoding results of the two encoders into one vector, which serves as the instruction intent characterization vector of the element.
3. The heterogeneous device ontology matching method of claim 1, wherein the step S200 specifically includes the following sub-steps:
step S210, calculating the information entropy E(p_i) of each instruction fragment p_i in the instruction fragment set sk1 according to the following formula:

E(p_i) = -Σ_j p_i,j · log p_i,j

wherein p_i,j is the frequency with which word j of instruction fragment p_i occurs in the whole instruction fragment set;
step S220, calculating the redundancy between every element of the instruction fragment set sk1 and every element of the generic semantic ontology set sk2 according to the following formula:

R_i,j = max(0, cos(p_i, q_j))

wherein cos(p_i, q_j) denotes the cosine similarity between the instruction intent characterization vectors of element p_i of the instruction fragment set sk1 and element q_j of the generic semantic ontology set sk2;
step S230, using the ratio of the information entropy to the redundancy as the quantized sample selection value index RE, calculated as:

RE_i = E(p_i) / Σ_{q_j ∈ B} R_i,j

wherein B is the batch of data over which the redundancy is summed;
step S240, selecting a subset of data sk1' according to the quantized sample selection value index RE and producing an exact match dataset F', which contains the exact match result for each instruction fragment in sk1'.
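The screening in steps S210 to S230 above can be sketched in numpy. This is a minimal illustration, not the patent's implementation; the function names and the small smoothing constants are assumptions introduced here, and `vocab_freq` stands in for the word frequencies computed over the whole fragment set.

```python
import numpy as np

def entropy(fragment_tokens, vocab_freq):
    # E(p_i) = -sum_j p_i,j * log p_i,j over the distinct words of fragment p_i;
    # vocab_freq maps each word to its frequency in the whole fragment set
    probs = np.array([vocab_freq[w] for w in set(fragment_tokens)])
    return float(-(probs * np.log(probs)).sum())

def redundancy(p_vec, q_vecs):
    # R_i,j = max(0, cos(p_i, q_j)) against every ontology characterization q_j
    cos = q_vecs @ p_vec / (np.linalg.norm(q_vecs, axis=1) * np.linalg.norm(p_vec) + 1e-12)
    return np.maximum(0.0, cos)

def selection_index(e, r):
    # RE: ratio of entropy to summed redundancy over the batch B
    return e / (r.sum() + 1e-12)
```

Fragments with a high RE (informative, weakly redundant) would then be chosen for sk1' and labeled exactly.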
4. The heterogeneous device ontology matching method of claim 1, wherein the step S300 includes the following sub-steps:
step S310, calculating the Euclidean distance between the instruction intent characterization vector of each element in the instruction fragment set sk1 and the instruction intent characterization vector of each element in the generic semantic ontology set sk2;
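A minimal numpy sketch of the pairwise-distance computation in step S310; the function name is illustrative and the rows of `Z1` and `Z2` stand for the characterization vectors of sk1 and sk2.

```python
import numpy as np

def similarity_matrix(Z1, Z2):
    # S[i, j] = Euclidean distance between characterization vector i of sk1
    # and characterization vector j of sk2, via broadcasting
    diff = Z1[:, None, :] - Z2[None, :, :]
    return np.linalg.norm(diff, axis=-1)
```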
5. The heterogeneous device ontology matching method of claim 1, wherein the specific content of step S400 is as follows: the optimal mapping matrix F is solved using the classical Sinkhorn algorithm from optimal transport theory; the Sinkhorn algorithm defines the mapping matrix F = diag(u) · K · diag(v), where K := e^(-λS), λ is a hyperparameter controlling the differences between the elements of the mapping matrix F (the larger λ is, the smaller the differences are), S is the similarity matrix, and diag(u) and diag(v) are square matrices with the vectors u and v on the diagonal and 0 elsewhere; the vectors u and v are computed iteratively, initialized as all-ones vectors, according to:

u = a ⊘ (K v),  v = b ⊘ (K^T u)

wherein ⊘ denotes element-wise division of vectors, and a and b are the normalized vectors of the similarity matrix S computed by rows and by columns, respectively; the Sinkhorn procedure iteratively solves for F until convergence or a maximum number of steps is reached; in the resulting mapping matrix F, for an element f_i,j ∈ F, if element p_i of the instruction fragment set sk1 and element q_j of the generic semantic ontology set sk2 should be aligned, then f_i,j = 1, otherwise f_i,j = 0.
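The Sinkhorn iteration above can be sketched in numpy as follows. This is a simplified illustration, not the patent's implementation: it uses uniform target marginals `a` and `b` (one match per element), whereas the claim derives a and b from row/column normalization of S, and the function name and defaults are assumptions introduced here.

```python
import numpy as np

def sinkhorn(S, lam=10.0, n_iters=200, tol=1e-9):
    """Approximate the mapping matrix F = diag(u) @ K @ diag(v), K = exp(-lam * S)."""
    n1, n2 = S.shape
    K = np.exp(-lam * S)
    a = np.full(n1, 1.0 / n1)   # uniform row marginals (each element matched once)
    b = np.full(n2, 1.0 / n2)   # uniform column marginals
    u = np.ones(n1)
    v = np.ones(n2)
    for _ in range(n_iters):
        u_new = a / (K @ v)       # u = a ⊘ (K v)
        v_new = b / (K.T @ u_new) # v = b ⊘ (Kᵀ u)
        converged = np.max(np.abs(u_new - u)) < tol
        u, v = u_new, v_new
        if converged:
            break
    return np.diag(u) @ K @ np.diag(v)
```

On a toy 2x2 distance matrix the mass concentrates on the cheapest assignment, approaching the binary alignment described in the claim as λ grows.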
6. The heterogeneous device ontology matching method of claim 1, wherein the calculation formula of the objective function f1 in step S500 is:

f1 = Σ_{i,j} f_i,j · S_i,j + λ Σ_{i,j} (-f_i,j · log f_i,j)

wherein the first term Σ_{i,j} f_i,j · S_i,j minimizes the sum of the Euclidean distances between matched fragments; the second term λ(-f_i,j · log f_i,j) is an additional entropy regularization target, so that even though f_i,j is not an integer, i.e. f_i,j ∈ [0, 1], its value can still be pushed as close to 0 or 1 as possible.
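A minimal numpy sketch of the objective f1 as reconstructed above (the function name, the default λ, and the log-smoothing constant are assumptions introduced here):

```python
import numpy as np

def objective_f1(F, S, lam=0.1, eps=1e-12):
    # first term: sum_{i,j} f_ij * S_ij, the total distance between matched pairs
    transport_cost = float((F * S).sum())
    # second term: lam * sum_{i,j} (-f_ij * log f_ij); this entropy penalty is
    # zero when every f_ij is exactly 0 or 1 and largest for fractional values
    entropy_reg = lam * float(-(F * np.log(F + eps)).sum())
    return transport_cost + entropy_reg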
7. The heterogeneous device ontology matching method of claim 1, wherein the step S600 includes the following sub-steps:
step S610, updating the mapping matrix F with the exact match results in the exact match dataset F', setting the corresponding f_i,j in the mapping matrix F to 1 and the other entries to 0;
step S620, constructing a triplet for each instruction fragment, i.e., for instruction fragment p_i, constructing the triplet (p_i, q_pos, q_neg), wherein p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the generic semantic ontology set sk2, i.e., a fragment other than q_pos randomly sampled from sk2;
step S630, on the basis of the triplets, training the intent description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, which is calculated as:

f2 = Σ_i max(0, dis(p_i, q_pos) - dis(p_i, q_neg) + α)

wherein dis(·) is a similarity measure between the vector characterizations, which can be calculated using the Euclidean distance, and α is a margin;
step S640, updating the instruction understanding model according to the calculation result.
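The per-triplet term of f2 can be sketched as a standard margin-based triplet loss. This is an assumption-laden illustration: the original formula image is lost, so the margin and the function name are introduced here, not taken from the claim.

```python
import numpy as np

def triplet_loss_f2(p, q_pos, q_neg, margin=1.0):
    # dis(.) realized as Euclidean distance between intent characterization vectors
    d_pos = np.linalg.norm(p - q_pos)   # distance to the alignable positive
    d_neg = np.linalg.norm(p - q_neg)   # distance to the sampled negative
    # hinge: zero once the negative is at least `margin` farther than the positive
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this over all triplets pulls each fragment toward its aligned ontology element and pushes it away from the sampled negatives, which is what steps S620 to S640 use to update both encoders.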
8. The heterogeneous device ontology matching method integrating semantic information according to claim 1, wherein the instruction intent dictionary in step S700 comprises a one-to-one correspondence between the generic instructions in the semantic ontology and the specific instruction fragments of a single Internet of Things device.
9. The heterogeneous device ontology matching method integrating semantic information according to claim 1, characterized in that:
the intent description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the attention mechanism of the self-attention layer is used to automatically learn discriminative features that meet the training target; the input is a sequence of description words, with a placeholder "[CLS]" added at the front of the sequence; for each input word, the output is its vectorized representation produced by the intent description encoder; the vector representation of "[CLS]" is used as the description characterization;
the instruction fragment encoder uses the same settings and operations as the intent description encoder, the only difference being that a pre-trained language model cannot be used for the instruction fragment content: instruction sentences contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
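The encoder described in claim 9 can be sketched in a few lines of numpy. This is a toy single-head, single-layer illustration with random weights, not the patent's model: layer normalization is omitted, and all names, dimensions, and initializations are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_layer(X, Wq, Wk, Wv, W1, W2):
    # single-head self-attention followed by a feed-forward layer;
    # residual connections kept, layer normalization omitted for brevity
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(X.shape[1]))   # attention weights
    H = X + A @ V
    return H + np.maximum(0.0, H @ W1) @ W2      # ReLU feed-forward

def encode(token_vecs, params):
    # prepend the "[CLS]" placeholder; its output vector serves as the
    # description characterization, as in claim 9
    cls = np.zeros((1, token_vecs.shape[1]))
    X = np.vstack([cls, token_vecs])
    X = transformer_layer(X, *params)
    return X[0]  # the "[CLS]" position
```

The instruction fragment encoder would share this structure but train its embeddings from scratch, since a pre-trained language model cannot tokenize the special characters in device instructions.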
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110530094.3A CN113515930B (en) | 2021-05-14 | 2021-05-14 | Heterogeneous device ontology matching method integrating semantic information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113515930A CN113515930A (en) | 2021-10-19 |
CN113515930B true CN113515930B (en) | 2023-05-30 |
Family
ID=78064277
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818700B (en) * | 2022-05-10 | 2022-09-23 | 东南大学 | Ontology concept matching method based on paired connectivity graph and graph neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032635A (en) * | 2019-04-22 | 2019-07-19 | 齐鲁工业大学 | One kind being based on the problem of depth characteristic fused neural network to matching process and device |
CN111325028A (en) * | 2020-02-20 | 2020-06-23 | 齐鲁工业大学 | Intelligent semantic matching method and device based on deep hierarchical coding |
CN112749566A (en) * | 2019-10-31 | 2021-05-04 | 兰雨晴 | English writing auxiliary oriented semantic matching method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8027948B2 (en) * | 2008-01-31 | 2011-09-27 | International Business Machines Corporation | Method and system for generating an ontology |
Non-Patent Citations (1)
Title |
---|
Speaker-dependent two-character Chinese word recognition using spectrograms fused with wavelet transform; Wei Ying; Wang Shuangwei; Pan Di; Zhang Ling; Xu Tingfa; Liang Shili; Computer Applications (S1); full text *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Two are better than one: Joint entity and relation extraction with table-sequence encoders | |
Banerjee et al. | Clustering with Bregman divergences. | |
WO2018051841A1 (en) | Model learning device, method therefor, and program | |
CN112818676A (en) | Medical entity relationship joint extraction method | |
CN114492363B (en) | Small sample fine adjustment method, system and related device | |
CN111291165B (en) | Method and device for embedding training word vector into model | |
Yang et al. | Quadratic nonnegative matrix factorization | |
CN113268609A (en) | Dialog content recommendation method, device, equipment and medium based on knowledge graph | |
CN111368542A (en) | Text language association extraction method and system based on recurrent neural network | |
CN113515930B (en) | Heterogeneous device ontology matching method integrating semantic information | |
Abernethy et al. | A mechanism for sample-efficient in-context learning for sparse retrieval tasks | |
CN113535897A (en) | Fine-grained emotion analysis method based on syntactic relation and opinion word distribution | |
Wu et al. | WTMED at MEDIQA 2019: A hybrid approach to biomedical natural language inference | |
WO2020100738A1 (en) | Processing device, processing method, and processing program | |
CN115271063A (en) | Inter-class similarity knowledge distillation method and model based on feature prototype projection | |
CN111611395B (en) | Entity relationship identification method and device | |
Frieder et al. | Large language models for mathematicians | |
Czasonis et al. | Relevance-Based Prediction: A Transparent and Adaptive Alternative to Machine Learning. | |
Jeon et al. | Pet: Parameter-efficient knowledge distillation on transformer | |
Katayama et al. | Robust and sparse Gaussian graphical modelling under cell‐wise contamination | |
CN112131363B (en) | Automatic question and answer method, device, equipment and storage medium | |
WO2020100739A1 (en) | Learning device, learning method and learning program | |
Iqbal et al. | Computational Technique for an Efficient Classification of Protein Sequences With Distance‐Based Sequence Encoding Algorithm | |
Yu | Latent Structure Estimation for Panel Data and Theoretical Guarantees for Stochastic Optimization | |
Chen et al. | Multi‐layer features ensemble soft sensor regression model based on stacked autoencoder and vine copula |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||