CN113515930A - Heterogeneous device ontology matching method fusing semantic information - Google Patents

Heterogeneous device ontology matching method fusing semantic information

Info

Publication number
CN113515930A
Authority
CN
China
Prior art keywords
instruction
instruction fragment
mapping matrix
fragment
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110530094.3A
Other languages
Chinese (zh)
Other versions
CN113515930B (en)
Inventor
孙海峰
庄子睿
成岱璇
赵津宇
戚琦
王敬宇
李炜
廖建新
王晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110530094.3A priority Critical patent/CN113515930B/en
Publication of CN113515930A publication Critical patent/CN113515930A/en
Application granted granted Critical
Publication of CN113515930B publication Critical patent/CN113515930B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A heterogeneous device ontology matching method fusing semantic information comprises the following steps: (1) input all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into an instruction understanding model one by one to generate instruction intention characterization vectors; (2) screen a small subset of instruction fragments sk1' from the training data set for making an exact-match data set F'; (3) calculate a similarity matrix S; (4) calculate a mapping matrix F; (5) calculate an objective function f1; (6) update the mapping matrix F and calculate an objective function f2; (7) repeat steps (2) to (6) until the mapping matrix F achieves an exact match on sk1'; the resulting mapping matrix F is used to generate the instruction intention dictionary.

Description

Heterogeneous device ontology matching method fusing semantic information
Technical Field
The invention relates to a heterogeneous device ontology matching method fusing semantic information, belongs to the technical field of the Internet of Things, and in particular to the technical field of heterogeneous device ontology matching.
Background
In the field of the Internet of Things, different devices use different instruction languages to express the same instruction intention, because suppliers intentionally design instruction grammars that differ sharply from those of competitors in order to increase customers' switching costs. In addition, a device's instruction syntax is often strictly protected by patents. As a result, there is no clear one-to-one correspondence between instruction statements from different vendors, and even the same term may have different representations. This makes it extremely difficult to manage a network that contains heterogeneous devices. Therefore, how to match the instruction fragments of different devices to a general instruction intention set (namely, a semantic ontology), so as to reduce the difficulty of managing heterogeneous device networks, has become a technical problem that urgently needs to be solved in the Internet of Things field.
Disclosure of Invention
In view of this, the invention aims to provide a method, based on a deep learning model, for matching the instruction fragments of Internet of Things equipment with a general semantic ontology. To achieve this purpose, the invention provides a heterogeneous device ontology matching method fusing semantic information, which matches the instruction fragments of Internet of Things equipment with a general semantic ontology; the semantic ontology refers to a general instruction intention set, and an instruction fragment is the specific instruction issued when the Internet of Things equipment executes an intention. Given an instruction fragment set with N1 elements,
sk1 = {p_1, p_2, ..., p_N1},
and a general semantic ontology set with N2 elements,
sk2 = {q_1, q_2, ..., q_N2},
the method finds the pairs of elements in sk1 and sk2 that express the same intention; each element can only be matched once;
the method specifically comprises the following operation steps:
Step S100: input all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into an instruction understanding model one by one to generate their respective instruction intention characterization vectors; the instruction understanding model comprises an intention description encoder and an instruction fragment encoder;
Step S200: screen a small subset of instruction fragments sk1' from the whole training data set for making an exact-match data set F';
Step S300: calculate a similarity matrix S ∈ R^(N1×N2) based on the instruction intention characterization vectors;
Step S400: calculate a mapping matrix F ∈ R^(N1×N2) based on the similarity matrix S;
Step S500: calculate an objective function f1 based on the similarity matrix S and the mapping matrix F; the calculation result is back-propagated to update the mapping matrix F;
Step S600: update the corresponding part of the mapping matrix F with the exact-match data set F', calculate an objective function f2, and back-propagate the calculation result to update the instruction understanding model;
Step S700: repeat steps S200 to S600 until the mapping matrix F achieves an exact match on sk1'; the resulting mapping matrix F is used to generate the instruction intention dictionary.
The step S100 includes the following operation sub-steps:
Step S110: for each element in the instruction fragment set sk1 and the general semantic ontology set sk2, input its description text content and its instruction content into the intention description encoder and the instruction fragment encoder, respectively;
Step S120: concatenate the encoding results of the two encoders into one vector, which serves as the instruction intention characterization vector of that element.
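As a minimal sketch of sub-steps S110 and S120 (assuming each element is a record with a description text and an instruction content, and that the two encoders are callables returning 1-D tensors; the field names are illustrative, not from the patent):

```python
import torch

def intention_characterization(element, desc_encoder, frag_encoder):
    """Sketch of steps S110-S120: encode the element's description text and its
    instruction content separately, then concatenate the two encoding results
    into one instruction intention characterization vector."""
    desc_vec = desc_encoder(element["description"])   # intention description encoder (S110)
    frag_vec = frag_encoder(element["instruction"])   # instruction fragment encoder (S110)
    return torch.cat([desc_vec, frag_vec], dim=-1)    # concatenation (S120)
```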
The step S200 specifically includes the following operation substeps:
Step S210: calculate the information entropy E(p_i) of each element in the instruction fragment set sk1 = {p_1, p_2, ..., p_N1} according to the following formula:
E(p_i) = -Σ_j p_{i,j} · log p_{i,j}
where p_{i,j} is the frequency with which word j of instruction fragment p_i occurs in the whole instruction fragment set;
Step S220: calculate the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:
R_{i,j} = max(0, cos(p_i, q_j))
where cos(p_i, q_j) is the cosine between the instruction intention characterization vectors of element p_i of sk1 and element q_j of sk2;
Step S230: take the ratio of the information entropy to the redundancy as the quantitative sample-selection value index RE, calculated as:
RE_i = E(p_i) / Σ_{j∈B} R_{i,j}
where B is a batch of data;
Step S240: select a subset of data sk1' according to the sample-selection value index RE and make the exact-match data set F', which contains the exact match result for each instruction fragment in sk1'.
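The following sketch illustrates one reading of sub-steps S210 to S240: information entropy per fragment, redundancy against the ontology, the value index RE, and selection of the top-ranked fragments for exact labeling. The exact scope of the batch B in the RE formula and the small epsilon guards are assumptions.

```python
import numpy as np

def information_entropy(word_freqs):
    """E(p_i) = -sum_j p_{i,j} * log p_{i,j}, where word_freqs holds the corpus
    frequency p_{i,j} of every word j of fragment p_i (step S210)."""
    f = np.asarray(word_freqs, dtype=float)
    f = f[f > 0]
    return float(-(f * np.log(f)).sum())

def redundancy(P, Q):
    """R_{i,j} = max(0, cos(p_i, q_j)) over the characterization vectors (step S220)."""
    Pn = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-12)
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-12)
    return np.maximum(0.0, Pn @ Qn.T)

def select_for_exact_match(word_freqs_per_fragment, P, Q, budget=10):
    """Rank fragments by RE_i = E(p_i) / sum_j R_{i,j} (the reading assumed here
    for step S230) and keep the top `budget` fragments for labeling (step S240)."""
    E = np.array([information_entropy(w) for w in word_freqs_per_fragment])
    R = redundancy(P, Q)
    RE = E / (R.sum(axis=1) + 1e-8)
    return np.argsort(-RE)[:budget]      # indices of the fragments to label exactly
```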
The step S300 includes the following operation sub-steps:
Step S310: calculate the Euclidean distance between the instruction intention characterization vector of each element in the instruction fragment set sk1 and that of each element in the general semantic ontology set sk2;
Step S320: all of the calculation results form a similarity matrix S ∈ R^(N1×N2), in which the element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
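A short sketch of step S300, computing the matrix of pairwise Euclidean distances between the characterization vectors of sk1 (rows of P) and sk2 (rows of Q):

```python
import numpy as np

def similarity_matrix(P, Q):
    """Step S300 sketch: S is N1 x N2, with s_{i,j} the Euclidean distance between
    the characterization vectors of p_i (row i of P) and q_j (row j of Q)."""
    # ||p - q||^2 = ||p||^2 + ||q||^2 - 2 p.q, clipped at 0 to absorb rounding error
    sq = (P ** 2).sum(axis=1)[:, None] + (Q ** 2).sum(axis=1)[None, :] - 2.0 * P @ Q.T
    return np.sqrt(np.maximum(sq, 0.0))
```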
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with Sinkhorn, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix as F = diag(u) · K · diag(v), where K = e^(-λS); λ is a hyper-parameter that controls the difference between the elements of the mapping matrix F, and the larger λ is, the smaller the difference is; S is the similarity matrix; diag(u) and diag(v) denote square matrices whose diagonals are the vectors u and v and whose remaining entries are 0. The vectors u and v are computed iteratively, both initialized as all-ones vectors, and updated as follows:
u = a ⊘ (K · v)
v = b ⊘ (K^T · u)
where ⊘ denotes element-wise division of vectors, and a and b are the normalized row-sum and column-sum vectors of the similarity matrix S, respectively. The Sinkhorn procedure solves for F iteratively until convergence or until the maximum number of steps is reached. For the mapping matrix F ∈ R^(N1×N2) and element f_{i,j} ∈ F: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
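A compact sketch of the Sinkhorn iteration as described above; lam = 0.1 follows the value used in the embodiment, while the iteration count and convergence tolerance are illustrative assumptions:

```python
import numpy as np

def sinkhorn_mapping(S, lam=0.1, n_iters=1000, tol=1e-9):
    """Step S400 sketch: F = diag(u) K diag(v) with K = exp(-lam * S); a and b are
    the normalized row-sum and column-sum vectors of S, and u, v start as all-ones
    vectors and are updated alternately until convergence or the step limit."""
    K = np.exp(-lam * S)
    a = S.sum(axis=1); a = a / a.sum()          # normalized row sums
    b = S.sum(axis=0); b = b / b.sum()          # normalized column sums
    u = np.ones(S.shape[0])
    v = np.ones(S.shape[1])
    for _ in range(n_iters):
        u_new = a / (K @ v)                     # u = a ⊘ (K v)
        v_new = b / (K.T @ u_new)               # v = b ⊘ (K^T u)
        converged = np.max(np.abs(u_new - u)) < tol and np.max(np.abs(v_new - v)) < tol
        u, v = u_new, v_new
        if converged:
            break
    return np.diag(u) @ K @ np.diag(v)          # mapping matrix F
```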
The calculation formula of the objective function f1 in step S500 is:
f1 = Σ_{i,j} ( f_{i,j} · s_{i,j} − λ · f_{i,j} · log f_{i,j} )
where the first term, Σ_{i,j} f_{i,j} · s_{i,j}, minimizes the sum of the Euclidean distances between matched fragments; the second term, λ · (−f_{i,j} · log f_{i,j}), is an additional entropy regularization objective, so that even when f_{i,j} is not an integer, i.e. f_{i,j} ∈ [0, 1], it can still be pushed as close to 0 or 1 as possible.
The step S600 includes the following operation sub-steps:
Step S610: update the mapping matrix F with the exact match results in the exact-match data set F', setting the corresponding f_{i,j} to 1 and the other labels to 0;
Step S620: construct a triplet for each instruction fragment, i.e. for instruction fragment p_i construct the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e. a fragment other than q_pos randomly drawn from sk2;
Step S630: on the basis of these triplets, train the intention description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as:
f2 = Σ_i ( dis(p_i, q_pos) − dis(p_i, q_neg) )
where dis(·) is a similarity measure between vector representations and can be computed with the Euclidean distance;
Step S640: update the instruction understanding model according to the calculation result.
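A sketch of sub-steps S620 and S630 using the form of f2 given above (itself a reconstruction of the formula image); model(x) is assumed to return the instruction intention characterization vector of x:

```python
import random
import torch

def triplet_objective_f2(model, fragments, ontology, F):
    """For each fragment p_i, take the positive q_pos from row i of the (exact-match
    corrected) mapping matrix F and a random negative q_neg != q_pos from the
    ontology, then average dis(p_i, q_pos) - dis(p_i, q_neg) over all fragments."""
    losses = []
    for i, p in enumerate(fragments):
        j_pos = int(torch.argmax(torch.as_tensor(F[i])))              # aligned positive
        j_neg = random.choice([j for j in range(len(ontology)) if j != j_pos])
        p_vec = model(p)
        d_pos = torch.dist(p_vec, model(ontology[j_pos]))             # pull together
        d_neg = torch.dist(p_vec, model(ontology[j_neg]))             # push apart
        losses.append(d_pos - d_neg)
    return torch.stack(losses).mean()
```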
The instruction intention dictionary in step S700 includes a one-to-one correspondence between the general instructions in the semantic ontology and the specific instruction fragments of a single internet of things device.
The intention description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the self-attention mechanism is used to automatically learn discriminative features that satisfy the training objective. The input is the sequence of description words, with a placeholder "[cls]" added at the very front; for each input word, the output is its vectorized representation produced by the intention description encoder, and the vector representation of "[cls]" is used as the characterization of the description.
The instruction fragment encoder uses the same settings and operations as the intention description encoder; the only difference is that the instruction fragment content cannot use a pre-trained language model, because instruction statements contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
The invention has the following beneficial effects: the instruction fragments of various heterogeneous devices are matched to the general semantic ontology on the basis of their intention descriptions, which reduces the difficulty of managing heterogeneous device networks; by training the instruction understanding model, the method increases the similarity of the vector representations of elements that can be matched and decreases the similarity of the vector representations of elements that cannot be matched; by screening only a small amount of data to make the exact-match data set, the method keeps labeling effort low while still ensuring the matching performance of the model.
Drawings
Fig. 1 is a flowchart of a heterogeneous device ontology matching method fusing semantic information according to the present invention;
FIG. 2 is a comparison graph of experimental results of the impact of different sample selection methods on matching accuracy in an embodiment of the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
Referring to FIG. 1, the invention provides a heterogeneous device ontology matching method fusing semantic information, which matches the instruction fragments of Internet of Things equipment with a general semantic ontology; the semantic ontology refers to a general instruction intention set, and an instruction fragment is the specific instruction issued when the Internet of Things equipment executes an intention. Given an instruction fragment set with N1 elements, sk1 = {p_1, p_2, ..., p_N1}, and a general semantic ontology set with N2 elements, sk2 = {q_1, q_2, ..., q_N2}, the method finds the pairs of elements in sk1 and sk2 that express the same intention; each element can only be matched once. Table 1 gives an example of an instruction fragment set and a general semantic ontology set.
TABLE 1 (table image in the original publication: example instruction fragments and the corresponding general semantic ontology entries)
The method specifically comprises the following operation steps:
Step S100: input all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into an instruction understanding model one by one to generate their respective instruction intention characterization vectors; the instruction understanding model comprises an intention description encoder and an instruction fragment encoder;
Step S200: screen a small subset of instruction fragments sk1' from the whole training data set for making an exact-match data set F'. The implementation details are as follows: a batch of B sample data is extracted from all instruction fragments, where B is set to 10, i.e. only 10 instruction fragments are extracted for making the exact-match data set; the general semantic ontology entry matched to each of these instruction fragments is found according to the candidate configuration matching in the Common Configuration Tree (CCT).
Step S300: calculate a similarity matrix S ∈ R^(N1×N2) based on the instruction intention characterization vectors;
Step S400: calculate a mapping matrix F ∈ R^(N1×N2) based on the similarity matrix S;
Step S500: calculate an objective function f1 based on the similarity matrix S and the mapping matrix F; the calculation result is back-propagated to update the mapping matrix F;
Step S600: update the corresponding part of the mapping matrix F with the exact-match data set F', calculate an objective function f2, and back-propagate the calculation result to update the instruction understanding model;
Step S700: repeat steps S200 to S600 until the mapping matrix F achieves an exact match on sk1'; the resulting mapping matrix F is used to generate the instruction intention dictionary.
The step S100 includes the following operation sub-steps:
Step S110: for each element in the instruction fragment set sk1 and the general semantic ontology set sk2, input its description text content and its instruction content into the intention description encoder and the instruction fragment encoder, respectively;
Step S120: concatenate the encoding results of the two encoders into one vector, which serves as the instruction intention characterization vector of that element.
The step S200 specifically includes the following operation substeps:
Step S210: calculate the information entropy E(p_i) of each element in the instruction fragment set sk1 = {p_1, p_2, ..., p_N1} according to the following formula:
E(p_i) = -Σ_j p_{i,j} · log p_{i,j}
where p_{i,j} is the frequency with which word j of instruction fragment p_i occurs in the whole instruction fragment set;
Step S220: calculate the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:
R_{i,j} = max(0, cos(p_i, q_j))
where cos(p_i, q_j) is the cosine between the instruction intention characterization vectors of element p_i of sk1 and element q_j of sk2;
Step S230: take the ratio of the information entropy to the redundancy as the quantitative sample-selection value index RE, calculated as:
RE_i = E(p_i) / Σ_{j∈B} R_{i,j}
where B is a batch of data;
Step S240: select a subset of data sk1' according to the sample-selection value index RE and make the exact-match data set F', which contains the exact match result for each instruction fragment in sk1'.
The step S300 includes the following operation sub-steps:
Step S310: calculate the Euclidean distance between the instruction intention characterization vector of each element in the instruction fragment set sk1 and that of each element in the general semantic ontology set sk2;
Step S320: all of the calculation results form a similarity matrix S ∈ R^(N1×N2), in which the element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
TABLE 2 (table image in the original publication: the 10×10 similarity matrix computed for the data in Table 1)
Based on the instruction fragments and the general semantic ontology entries in Table 1, the similarity matrix calculated according to the above steps is shown in Table 2; the matrix has 10 rows, equal to the number of instruction fragments, and 10 columns, equal to the number of semantic ontology entries. The element s_{i,j} in row i and column j of Table 2 is the Euclidean distance between the i-th instruction fragment and the j-th semantic ontology entry. The minimum distance in each row is shown in bold.
The specific content of step S400 is as follows: the optimal mapping matrix F is solved with Sinkhorn, a classical algorithm in optimal transport theory. The Sinkhorn algorithm defines the mapping matrix as F = diag(u) · K · diag(v), where K = e^(-λS); λ is a hyper-parameter that controls the difference between the elements of the mapping matrix F, and the larger λ is, the smaller the difference is; the hyper-parameter λ is set to 0.1 in the present embodiment.
S is the similarity matrix; diag(u) and diag(v) denote square matrices whose diagonals are the vectors u and v and whose remaining entries are 0. The vectors u and v are computed iteratively, both initialized as all-ones vectors, and updated as follows:
u = a ⊘ (K · v)
v = b ⊘ (K^T · u)
where ⊘ denotes element-wise division of vectors, and a and b are the normalized row-sum and column-sum vectors of the similarity matrix S, respectively. The Sinkhorn procedure solves for F iteratively until convergence or until the maximum number of steps is reached. For the mapping matrix F ∈ R^(N1×N2) and element f_{i,j} ∈ F: if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
The calculation formula of the objective function f1 in step S500 is:
f1 = Σ_{i,j} ( f_{i,j} · s_{i,j} − λ · f_{i,j} · log f_{i,j} )
where the first term, Σ_{i,j} f_{i,j} · s_{i,j}, minimizes the sum of the Euclidean distances between matched fragments; the second term, λ · (−f_{i,j} · log f_{i,j}), is an additional entropy regularization objective, so that even when f_{i,j} is not an integer, i.e. f_{i,j} ∈ [0, 1], it can still be pushed as close to 0 or 1 as possible.
The step S600 includes the following operation sub-steps:
Step S610: update the mapping matrix F with the exact match results in the exact-match data set F', setting the corresponding f_{i,j} to 1 and the other labels to 0;
Step S620: construct a triplet for each instruction fragment, i.e. for instruction fragment p_i construct the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e. a fragment other than q_pos randomly drawn from sk2;
Step S630: on the basis of these triplets, train the intention description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as:
f2 = Σ_i ( dis(p_i, q_pos) − dis(p_i, q_neg) )
where dis(·) is a similarity measure between vector representations and can be computed with the Euclidean distance;
Step S640: update the instruction understanding model according to the calculation result.
The instruction intention dictionary in step S700 includes a one-to-one correspondence between the general instructions in the semantic ontology and the specific instruction fragments of a single internet of things device.
Steps S200 to S600 are repeated until the mapping matrix F achieves an exact match on sk1'.
TABLE 3
0 0 0 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 1 0 0 0 0 0
The mapping matrix F ∈ R^(N1×N2) of the invention shows the matching relation between the instruction fragments and the general semantic ontology: the element f_{i,j} in row i and column j equals 1 if the i-th instruction fragment matches the j-th general semantic ontology entry, and f_{i,j} = 0 if it does not. Since each element can only be matched once, each row can contain only one value of 1. For the instruction fragments and the general semantic ontology entries in Table 1, the finally calculated mapping matrix F is shown in Table 3, where each row of the matrix corresponds to one instruction fragment and each column corresponds to one semantic ontology entry, with 10 rows and 10 columns in total.
In step S700, the mapping matrix F finally obtained is used to generate an instruction intent dictionary. In the present embodiment, the instruction intention dictionary in step S700 contains a one-to-one correspondence relationship between the general instructions in the general semantic ontology and the specific instruction fragments of a single device.
TABLE 4 (table image in the original publication: the instruction intention dictionary derived from Table 3, pairing each instruction fragment with its matched general semantic ontology entry)
The instruction intention dictionary obtained from Table 3 is shown in Table 4; the instruction fragment in each row of Table 4 is matched with the general semantic ontology entry to its right, and the accuracy of the matching result is 100%.
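A small sketch of how the instruction intention dictionary of step S700 can be read off the final mapping matrix F; fragments and ontology stand for the corresponding lists of entries (as in Table 1), and F is any row-wise one-hot matrix such as the one in Table 3:

```python
def build_intention_dictionary(F, fragments, ontology):
    """Each row of the final mapping matrix F holds exactly one 1; its column
    index gives the general semantic ontology entry matched to that fragment."""
    dictionary = {}
    for i, row in enumerate(F):
        j = max(range(len(row)), key=lambda k: row[k])   # column holding the single 1
        dictionary[fragments[i]] = ontology[j]
    return dictionary
```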
The intention description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the self-attention mechanism is used to automatically learn discriminative features that satisfy the training objective. The input is the sequence of description words, with a placeholder "[cls]" added at the very front; for each input word, the output is its vectorized representation produced by the intention description encoder, and the vector representation of "[cls]" is used as the characterization of the description. The intention description encoder uses a pre-trained BERT model (BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding), BERT-small, with 4 Transformer layers and a dimension of 512 per layer.
The instruction fragment encoder uses the same settings and operations as the intention description encoder. The only difference is that the instruction fragment content cannot use a pre-trained language model: instruction statements contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model, so the model variables are initialized randomly.
During training, an Adam optimizer with a learning rate of 10^-5 is used.
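A self-contained sketch of that training setup; only the Adam optimizer and the 10^-5 learning rate come from the text, while the module and data below are stand-ins for the two encoders and the objective f2:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)                       # stand-in for the two encoders
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

x = torch.randn(4, 16)
loss = model(x).pow(2).mean()                   # stand-in for the objective f2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```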
The inventors used instruction fragments from 689 configuration files from four vendors (Cisco, Huashi, Xinhua San, and Sharp) as the instruction fragment set for training and testing: 304 configuration files from Cisco, 186 from Huashi, 151 from Xinhua San, and 67 from Sharp. All of the vendor configuration files come from data centers that support the same service, and the devices perform the same network architecture role.
The experimental results show that the alignment accuracy of the present invention can be 100% for different suppliers, which demonstrates the robustness of the method of the present invention to a variety of different environments. Because the invention considers the correlation between the intention description and the instruction fragment, even though the instruction fragments of different suppliers may be greatly different, the invention can still realize the matching of the instruction fragments of various heterogeneous devices with the common semantic ontology by taking the intention description as the reference.
In step S200 the invention adopts a mechanism for automatically screening sample data, so the inventors also studied the influence of different sample selection methods on matching accuracy. In addition to the sample selection method of the present invention, two other sample selection methods (random selection and entropy-only selection) were evaluated on the four data sets. The random method selects samples randomly during learning; the entropy-only method calculates the information entropy of all data in the batch according to the formula in step S210 and then selects the samples with higher information entropy. The experimental results are shown in FIG. 2.
The accuracy in FIG. 2 refers to the ratio of the number of correctly matched fragments to the total number of fragments, and the sample rate refers to the ratio of the number of samples used to make the exact-match data set to the total number of samples. The experimental results show that, compared with the other methods, the accuracy of the method of the present invention increases fastest as the number of samples grows, and on average only 10% of the labels are required to reach 100% accuracy. This indicates that the sample selection method of the present invention achieves the best accuracy with the fewest sample labels.
Furthermore, it can be seen from the experimental results that the increasing trend of accuracy is consistent for different suppliers, further illustrating the robustness of the method of the present invention to various environments.
The terms "comprises," "comprising," or other similar terms are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process or method.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principle of the invention, a person skilled in the art can make equivalent changes or substitutions on the related technical features, and the technical solutions after the changes or substitutions will fall within the protection scope of the invention.

Claims (9)

1. A heterogeneous device ontology matching method fusing semantic information, characterized by comprising: matching the instruction fragments of Internet of Things equipment with a general semantic ontology; the general semantic ontology refers to a general instruction intention set, and an instruction fragment is the specific instruction issued when the Internet of Things equipment executes a certain intention; given an instruction fragment set with N1 elements, sk1 = {p_1, p_2, ..., p_N1}, and a general semantic ontology set with N2 elements, sk2 = {q_1, q_2, ..., q_N2}, the method finds the pairs of elements in sk1 and sk2 that express the same intention, and each element can only be matched once;
the method specifically comprises the following operation steps:
Step S100: input all elements of the instruction fragment set sk1 and the general semantic ontology set sk2 into an instruction understanding model one by one to generate their respective instruction intention characterization vectors; the instruction understanding model comprises an intention description encoder and an instruction fragment encoder;
Step S200: screen a small subset of instruction fragments sk1' from the whole training data set for making an exact-match data set F';
Step S300: calculate a similarity matrix S ∈ R^(N1×N2) based on the instruction intention characterization vectors;
Step S400: calculate a mapping matrix F ∈ R^(N1×N2) based on the similarity matrix S;
Step S500: calculate an objective function f1 based on the similarity matrix S and the mapping matrix F; the calculation result is back-propagated to update the mapping matrix F;
Step S600: update the corresponding part of the mapping matrix F with the exact-match data set F', calculate an objective function f2, and back-propagate the calculation result to update the instruction understanding model;
Step S700: repeat steps S200 to S600 until the mapping matrix F achieves an exact match on sk1'; the resulting mapping matrix F is used to generate the instruction intention dictionary.
2. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the step S100 comprises the following operation substeps:
Step S110: for each element in the instruction fragment set sk1 and the general semantic ontology set sk2, input its description text content and its instruction content into the intention description encoder and the instruction fragment encoder, respectively;
Step S120: concatenate the encoding results of the two encoders into one vector, which serves as the instruction intention characterization vector of that element.
3. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the step S200 specifically includes the following operation substeps:
Step S210: calculate the information entropy E(p_i) of each element in the instruction fragment set sk1 = {p_1, p_2, ..., p_N1} according to the following formula:
E(p_i) = -Σ_j p_{i,j} · log p_{i,j}
where p_{i,j} is the frequency with which word j of instruction fragment p_i occurs in the whole instruction fragment set;
Step S220: calculate the redundancy between all elements of the instruction fragment set sk1 and all elements of the general semantic ontology set sk2 according to the following formula:
R_{i,j} = max(0, cos(p_i, q_j))
where cos(p_i, q_j) is the cosine between the instruction intention characterization vectors of element p_i of sk1 and element q_j of sk2;
Step S230: take the ratio of the information entropy to the redundancy as the quantitative sample-selection value index RE, calculated as:
RE_i = E(p_i) / Σ_{j∈B} R_{i,j}
where B is a batch of data;
Step S240: select a subset of data sk1' according to the sample-selection value index RE and make the exact-match data set F', which contains the exact match result for each instruction fragment in sk1'.
4. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the step S300 comprises the following operation sub-steps:
Step S310: calculate the Euclidean distance between the instruction intention characterization vector of each element in the instruction fragment set sk1 and that of each element in the general semantic ontology set sk2;
Step S320: all of the calculation results form a similarity matrix S ∈ R^(N1×N2), in which the element s_{i,j} ∈ S is the Euclidean distance between element p_i of sk1 and element q_j of sk2.
5. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the specific content of step S400 is as follows: the optimal mapping matrix F is solved with Sinkhorn, a classical algorithm in optimal transport theory; the Sinkhorn algorithm defines the mapping matrix as F = diag(u) · K · diag(v), where K = e^(-λS); λ is a hyper-parameter that controls the difference between the elements of the mapping matrix F, and the larger λ is, the smaller the difference is; S is the similarity matrix; diag(u) and diag(v) denote square matrices whose diagonals are the vectors u and v and whose remaining entries are 0; the vectors u and v are computed iteratively, both initialized as all-ones vectors, and updated as follows:
u = a ⊘ (K · v)
v = b ⊘ (K^T · u)
where ⊘ denotes element-wise division of vectors, and a and b are the normalized row-sum and column-sum vectors of the similarity matrix S, respectively; the Sinkhorn procedure solves for F iteratively until convergence or until the maximum number of steps is reached; for the mapping matrix F ∈ R^(N1×N2) and element f_{i,j} ∈ F, if element p_i of the instruction fragment set sk1 and element q_j of the general semantic ontology set sk2 should be aligned, then f_{i,j} = 1; otherwise f_{i,j} = 0.
6. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the objective function f1 in step S500 is calculated as:
f1 = Σ_{i,j} ( f_{i,j} · s_{i,j} − λ · f_{i,j} · log f_{i,j} )
where the first term, Σ_{i,j} f_{i,j} · s_{i,j}, minimizes the sum of the Euclidean distances between matched fragments; the second term, λ · (−f_{i,j} · log f_{i,j}), is an additional entropy regularization objective, so that even when f_{i,j} is not an integer, i.e. f_{i,j} ∈ [0, 1], it can still be pushed as close to 0 or 1 as possible.
7. The method for matching the heterogeneous device ontology with fused semantic information according to claim 1, wherein the step S600 comprises the following operation substeps:
Step S610: update the mapping matrix F with the exact match results in the exact-match data set F', setting the corresponding f_{i,j} to 1 and the other labels to 0;
Step S620: construct a triplet for each instruction fragment, i.e. for instruction fragment p_i construct the triplet (p_i, q_pos, q_neg), where p_i and q_pos are an alignable positive pair obtained from the mapping matrix F, and q_neg is a negative sample drawn from the general semantic ontology set sk2, i.e. a fragment other than q_pos randomly drawn from sk2;
Step S630: on the basis of these triplets, train the intention description encoder and the instruction fragment encoder simultaneously by minimizing the objective f2, calculated as:
f2 = Σ_i ( dis(p_i, q_pos) − dis(p_i, q_neg) )
where dis(·) is a similarity measure between vector representations and can be computed with the Euclidean distance;
Step S640: update the instruction understanding model according to the calculation result.
8. The method as claimed in claim 1, wherein the instruction intent dictionary in step S700 includes a one-to-one correspondence relationship between general instructions in the semantic ontology and specific instruction fragments of a single internet of things device.
9. The heterogeneous device ontology matching method fusing semantic information according to claim 1, wherein:
the intention description encoder is constructed from Transformer layers; each Transformer layer comprises a self-attention layer and a feed-forward neural network layer, and the self-attention mechanism is used to automatically learn discriminative features that satisfy the training objective; the input is the sequence of description words, with a placeholder "[cls]" added at the very front; for each input word, the output is its vectorized representation produced by the intention description encoder, and the vector representation of "[cls]" is used as the characterization of the description;
the instruction fragment encoder uses the same settings and operations as the intention description encoder; the only difference is that the instruction fragment content cannot use a pre-trained language model, because instruction statements contain many special characters and abbreviations and therefore cannot be parsed directly by a pre-trained language model.
CN202110530094.3A 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information Active CN113515930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110530094.3A CN113515930B (en) 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110530094.3A CN113515930B (en) 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information

Publications (2)

Publication Number Publication Date
CN113515930A true CN113515930A (en) 2021-10-19
CN113515930B CN113515930B (en) 2023-05-30

Family

ID=78064277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110530094.3A Active CN113515930B (en) 2021-05-14 2021-05-14 Heterogeneous device ontology matching method integrating semantic information

Country Status (1)

Country Link
CN (1) CN113515930B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818700A (en) * 2022-05-10 2022-07-29 东南大学 Ontology concept matching method based on paired connectivity graph and graph neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090198642A1 (en) * 2008-01-31 2009-08-06 International Business Machines Corporation Method and system for generating an ontology
CN110032635A (en) * 2019-04-22 2019-07-19 齐鲁工业大学 One kind being based on the problem of depth characteristic fused neural network to matching process and device
CN111325028A (en) * 2020-02-20 2020-06-23 齐鲁工业大学 Intelligent semantic matching method and device based on deep hierarchical coding
CN112749566A (en) * 2019-10-31 2021-05-04 兰雨晴 English writing auxiliary oriented semantic matching method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090198642A1 (en) * 2008-01-31 2009-08-06 International Business Machines Corporation Method and system for generating an ontology
CN110032635A (en) * 2019-04-22 2019-07-19 齐鲁工业大学 One kind being based on the problem of depth characteristic fused neural network to matching process and device
CN112749566A (en) * 2019-10-31 2021-05-04 兰雨晴 English writing auxiliary oriented semantic matching method and device
CN111325028A (en) * 2020-02-20 2020-06-23 齐鲁工业大学 Intelligent semantic matching method and device based on deep hierarchical coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏莹; 王双维; 潘迪; 张玲; 许廷发; 梁士利: "Speaker-specific two-character Chinese word recognition using spectrograms fused with wavelet transform" (用语谱图融合小波变换进行特定人二字汉语词汇识别), Computer Applications (计算机应用)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818700A (en) * 2022-05-10 2022-07-29 东南大学 Ontology concept matching method based on paired connectivity graph and graph neural network

Also Published As

Publication number Publication date
CN113515930B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN109829299B (en) Unknown attack identification method based on depth self-encoder
CN108268444B (en) Chinese word segmentation method based on bidirectional LSTM, CNN and CRF
Ayache et al. Explaining black boxes on sequential data using weighted automata
US20180144234A1 (en) Sentence Embedding for Sequence-To-Sequence Matching in a Question-Answer System
JP7259650B2 (en) Translation device, translation method and program
CN111310475A (en) Training method and device of word sense disambiguation model
CN114492363B (en) Small sample fine adjustment method, system and related device
CN113268609A (en) Dialog content recommendation method, device, equipment and medium based on knowledge graph
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN112687328B (en) Method, apparatus and medium for determining phenotypic information of clinical descriptive information
CN111368542A (en) Text language association extraction method and system based on recurrent neural network
CN113987174A (en) Core statement extraction method, system, equipment and storage medium for classification label
CN111125316A (en) Knowledge base question-answering method integrating multiple loss functions and attention mechanism
CN113515930A (en) Heterogeneous equipment body matching method fusing semantic information
CN115271063A (en) Inter-class similarity knowledge distillation method and model based on feature prototype projection
CN110737837A (en) Scientific research collaborator recommendation method based on multi-dimensional features under research gate platform
Sonkamble et al. Speech recognition using vector quantization through modified K-MeansLBG Algorithm
CN114330372A (en) Model training method, related device and equipment
JP6586026B2 (en) Word vector learning device, natural language processing device, method, and program
Jeon et al. Pet: Parameter-efficient knowledge distillation on transformer
CN116955644A (en) Knowledge fusion method, system and storage medium based on knowledge graph
Zorrilla et al. Audio embeddings help to learn better dialogue policies
CN112131363B (en) Automatic question and answer method, device, equipment and storage medium
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN114997395A (en) Training method of text generation model, method for generating text and respective devices

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant