CN115422376B

CN115422376B - Network security event source tracing script generation method based on knowledge graph composite embedding

Info

Publication number: CN115422376B
Application number: CN202211382679.6A
Authority: CN
Inventors: 车洵; 孙捷; 胡牧; 程佳; 孙瀚墨
Original assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Current assignee: Nanjing Zhongzhiwei Information Technology Co ltd
Priority date: 2022-11-07
Filing date: 2022-11-07
Publication date: 2023-03-24
Anticipated expiration: 2042-11-07
Also published as: CN115422376A

Abstract

The invention discloses a network security incident source tracing script generating method based on knowledge graph composite embedding, which comprises the following steps: s1: the entity relation is expanded by introducing a text corpus and is used for enriching the entity relation and expanding a knowledge graph; s2: extracting public features in the knowledge graph, and extracting all input public features by using a public extraction layer; s3: extracting relation features in the knowledge graph, and using corresponding relation extraction layers for different embedding relations; s4: projecting the public features and the relation features to an embedding space, and completing the knowledge graph; s5: sequencing the knowledge graphs obtained in the step (S4), acquiring POS token embedding and semantic context scores through corresponding modules, and generating a network security traceability script through the acquired POS token embedding and semantic context scores by a word replication probability prediction module; the network security event tracing script constructed by the method has extremely high applicability and accuracy.

Description

Network security event source tracing script generation method based on knowledge graph composite embedding

Technical Field

The invention relates to the field of knowledge graphs of network security event traceability scripts, in particular to a network security event traceability script generation method based on knowledge graph composite embedding.

Background

In recent years, with the rapid development of the internet, the network threat problem is becoming more serious and frequent. In contrast to the brand new network threat which is distinguished by the characteristics of high attack speed, long latency, wide attack area and the like, the traditional network security tracing method is time-consuming and labor-consuming. This is because even ordinary event tracing can reach multiple systems, and multiple systems involve multiple team personnel, which requires very high labor cost, and such tracing can be performed up to several times a day. The automation and response of SOAR security arrangements have been developed in response to various problems exposed by conventional methods.

Compared with the traditional method, the SOAR has the advantages of high tracing speed, low labor cost and the like. The SOAR has three core technology capabilities, namely a threat information platform, a safety event corresponding platform, safety arrangement and automation. Among these three technologies, security arrangements and automation are undoubtedly important and central functions. The safe arrangement and automation refer to arranging the scripts (playbooks) needing manual tracing in the traditional method in an automatic mode. In the field of automatically constructing the network security event tracing scenario, the network security event tracing scenario automatically constructed based on the knowledge graph is gradually developed, so how to quickly construct the network security event tracing scenario becomes a serious concern.

Disclosure of Invention

In order to achieve the purpose, the inventor provides a method for generating a network security event source-tracing scenario based on knowledge graph composite embedding, which comprises the following steps:

s1: the entity relation is expanded by introducing a text corpus and is used for enriching the entity relation and expanding a knowledge graph;

s2: extracting public features in the knowledge graph, and extracting all input public features by using a public extraction layer;

s3: extracting relation features in the knowledge graph, and using corresponding relation extraction layers for different embedding relations;

s4: projecting the public characteristic and the relation characteristic to an embedding space, and completing the knowledge graph;

s5: and sequencing the knowledge graph obtained in the step S4, acquiring POS token embedding and semantic context scores through a corresponding module, and generating a network security source tracing script through the acquired POS token embedding and semantic context scores by a word replication probability prediction module.

As a preferred mode of the invention, an entity pair (h, t) which is not mentioned is given, LDP with the mentioned entity pair extracted from a text corpus is ranked, an encoder f of the entity pair parameterized by theta is learned for a subject vector h and an object vector t, and the entity pair (h, t) is encoded into f (h, t; theta) through the encoder f;

where the input to encoder f is:

wherein the content of the first and second substances,

denotes concatenation of vectors, @ denotes multiplication of two vectors by element, (h-t) denotes host vector h minus guest vector t; />

LDP set S for connections h and t _(h,t) LDP L is represented by a vector L using a pre-trained sentence encoder SBERT;

such that LDP co-occurring with entity pair (h, t) is similar to f (h, t; θ), LDP associated with both h and t is used as a positive training instance S = { (h, l, t) }, and LDP associated with either h or t alone is used as a negative training instance S' _(h,t) The expression is:

wherein t ' and h ' respectively represent object vectors and subject vectors which are not equal to t and h, l ' represents the relation of a negative training example, and D represents a subject vector and a set of object vectors;

by minimizing S _(h,t) And S' _(h,t) Learning the parameters of f (h, t, θ) with the margin loss of:

wherein gamma ≧ 0 denotes the margin, f (h, t; theta) is calculated using theta obtained by minimizing the above formula, and then the inner product f (h, t; theta) is used ^T L scores each LDP L, and the first u LDPs with the highest inner product score of f (h, t; theta) are selected to expand the knowledge graph, wherein u is a hyperparameter.

As a preferred mode of the present invention, the S2 step includes the steps of:

after the knowledge map is expanded, the knowledge map is set

Wherein e _s Representing the subject vector, e _o Represents a guest vector, <' > is selected>

Representing a relation vector, connecting the main body vector and the relation vector, wherein the expression is as follows:

[e _s ；e _r ] _1d ∈R ^d

wherein, d = d _e +d _r And [ a; b] _1d A vector concatenation representing vectors a and b, the concatenated embedded vector representing the input of all subsequent layers;

extracting common features of vectors through a common dense layer, wherein the width of the common dense layer is the number of filters of the dense layer, and the size of a kernel contained in each filter is equal to the size of an input embedded kernel;

in the common dense layer, an affine function Ω (·) is applied to a given input embedding, and the expression of the common dense layer is:

Ω(x)＝W _h x+b _h

wherein the content of the first and second substances,

width of public dense layer is nd _h Given as, nd _h Denotes d _h Wherein n is a hyperparameter;

by applying a non-linear activation function f (-) to Ω ([ e ] _s ；e _r ]1d) An output of the common feature extraction is obtained.

As a preferred mode of the present invention, the S3 step includes the steps of:

for the relation r, the coding function is represented by Ω _r Expressing, using a relation dense layer to extract relation perception characteristics, and coding a function omega _r Is an affine function, and the expression is:

Ω _r (x)＝W _r x+b _r

wherein the content of the first and second substances,

and d _z Represents omega _r The output length of (d);

will omega _r Application to input embedding [ e ] _s ；e _r ] _1d ∈R ^d Then, a non-linear activation function f (-) is applied, and the relationship-dense layer will have different encoders facing different relationships for extracting the relationship features.

As a preferred mode of the present invention, the S4 step includes the steps of:

after potential vectors are obtained from the relational dense layer and the public dense layer, the vectors are connected, and the connected vectors are projected to an embedding space through a projection matrix, wherein the expression is as follows:

then apply the nonlinear activation f (-) and will

Is defined as follows:

wherein h is _sr For the predicted result, the link prediction score ψ (e) _s ,e _r ,e _o ) Is defined as h _sr And e _o Inner product of (2)

Calculating the scores of all the triples, and calculating the loss by using a binary cross entropy function;

training the strategy using 1:B, let B denote the number of all entities in the knowledge-graph, the binary cross-entropy function loss

The expression of (c) is:

wherein the content of the first and second substances,

representing the i-th object entity, y _i E {0,1} is a label, σ denotes a sigmoid function;

as a preferable mode of the present invention, the S4 step further includes the steps of:

extracting original embedded features U, then carrying out random disturbance transformation on the original embedding, extracting the features U' through an extraction layer, wherein the expression of a loss function is as follows:

L _MC ＝KL(U′||U)

wherein the KL function represents a KL divergence;

a composite loss function of

As a preferable mode of the present invention, the S5 step further includes the steps of: POS token embedding is generated through a POS generator, and semantic context scores are obtained through a semantic context scoring module.

As a preferable mode of the present invention, the S5 step further includes the steps of:

when the knowledge graph is sequenced, the generated triple embedding characteristics are input into a sequencing network, a main body-relation-object triple structure is given, a placeholder is introduced to be filled into a fixed length N, and the main body-relation-object triple structure characteristics F are input into a sequencing network _stru And fill F _pad Connected and get S through the full connection layer with softmax _matrix And predicting the sorting order S _order The expression is:

S _order ＝argmax _row (FC _s ([F _stru ；F _pad ]))

therein, FC _s Denotes a fully connected layer with softmax, argmax _r o _w Indicating that the argmax operation is performed on the row,

considering the sequence prediction task as a classification problem, where N represents the number of classes, the true sequence Go is calculated _rder And the ordering sequence So _rder Cross entropy loss between, the expression:

wherein L is _sort Which is indicative of a loss of ordering,

where N represents the fourth category, N ranging from 0 to N;

the knowledge graph generates an optimal description sequence through a sequencing network, the sequence is further decoded into sentences through a word decoder, and then syntactic supervision is applied through a POS generator, namely: in knowledge map order G _order Conditioned by first marking<Main body>，<Relationships between>，<Object>The corresponding position added to each triplet linearizes the knowledge-graph and obtains G _linear Then word encoder and POS generator with G _linear As input, the word codes WI = { w are output, respectively _i I ∈ 1 … M } and POS tag code PI = { p = { p } _i ,i∈1…M}；

Encoding a token w in a fusion module _i And POS tag code p _i Fusing to obtain the updated token code w _i The expression is:

w _i ＝LN(FC([w _i ；p _i ])+w _i )

wherein LN represents layer normalization, and the updated token code w after fusion _i Decoded in the word decoder as the statement WI '= { w' _i ,i∈1…K}；

The POS generator monitors through POS labels pre-extracted from sentences, and the loss function expression is as follows:

wherein, P _gen Representing the predicted probability from the POS generator;

the loss function expression of the word encoder and decoder is:

wherein, W _gen Representing the prediction probability of each word token.

generating a sliding window for each word to provide local context, filling in the beginning words of the sliding window, and obtaining context information F through the word features in the sliding window _context And input into FC layer to obtain semantic context score X _semantic The expression is:

X _semantic ＝σ(FC(F _context ))

where σ denotes a sigmoid function.

word replication probability prediction module using captured POS token embedding

And semantic context score X _semantic Probability of copying a word from a knowledge-graph->

For selecting whether to use predicted words from a word decoder or words in a knowledge graph when generating a sentence, the expression is: />

Wherein, W ₁ ,W ₂ ,W ₃ And b _copy Is a parameter that can be learned by the user,

indicating token embedding, s _k The last hidden state of the word decoder at each time step is represented as a balance coefficient which is set to be 0.3, the semantic context scoring module and the word replication probability prediction module are optimized in a combined mode, and the replication or prediction loss function expression is as follows:

wherein, y ^k A group-route 0-1 tag representing a word to be copied or predicted at the kth time step, which is generated from the knowledge-graph and the group-route sentence;

total training loss L _total Composed of four parts, including ordering penalty L _sort POS generates a loss L _pos Word generation loss L _token And replication or prediction loss L _copy The expression for the total training loss is:

L _total ＝L _token +λ ₁ L _pos +λ ₂ L _sort +λ ₃ L _copy

wherein λ is ₁ ,λ ₂ And λ ₃ Is a trade-off factor.

Different from the prior art, the technical scheme has the following beneficial effects:

(1) The method effectively realizes that the creation of the network security incident source-tracing script is completed by constructing the high-performance knowledge map, the method generates the high-performance knowledge map to construct the network security incident source-tracing script, various defects of the traditional method are overcome, and the network security incident source-tracing script constructed by the method has extremely high applicability and accuracy;

(2) The invention introduces an additional network security text corpus to enable the constructed knowledge graph to have more entity relationships, simultaneously uses a new embedding method, and divides the extraction process into public feature extraction and relationship perception extraction, wherein the public feature extraction is used for extracting all inputs, and the relationship feature extraction is used for respectively extracting different relationships, so that the link prediction success rate is greatly increased, and the knowledge graph complemented by the method has extremely high applicability and accuracy in the aspect of generating the network security incident source tracing script.

Drawings

FIG. 1 is a block diagram of a method according to an embodiment;

FIG. 2 is a flowchart of the augmented knowledge graph according to an embodiment;

fig. 3 is a flowchart for generating a network security trace scenario according to the embodiment.

Detailed Description

To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.

As shown in fig. 1 to fig. 3, the embodiment provides a method for generating a traceable scenario of a network security event based on knowledge graph composite embedding, including the following steps:

s4: projecting the public features and the relation features to an embedding space, and completing the knowledge graph;

The step S1 in the above embodiment specifically includes the following steps:

given an entity pair (h, t), ranking the LDP syntactic dependency paths extracted from the text corpus with the mentioned entity pair, learning an encoder f of the entity pair parameterized by theta for a subject vector h and a subject vector t, and encoding the entity pair (h, t) into f (h, t; theta) through the encoder f;

wherein the encoder f is embodied as a non-linearly activated multilayer perceptron, the inputs of which are:

wherein the content of the first and second substances,

denotes concatenation of vectors, @ denotes multiplication of two vectors by element, (h-t) denotes host vector h minus guest vector t; the above equation independently considers the information in the head and tail entity embedding and the interaction between their corresponding dimensions.

LDP set S for connections h and t _(h,t) Because LDP is a sequence of text labels, we can represent LDP by vector using a Sentence coder, in this embodiment LDP L by vector L using a pre-trained Sentence coder SBERT (Sennce-BERT: sentence embedding over twin networks);

such that LDP co-occurring with entity pair (h, t) is similar to f (h, t; θ), LDP associated with both h and t, i.e., both occur simultaneously, is used as a positive training instance S = { (h, l, t) }, and LDP associated with either h or t alone, i.e., not both, is used as a negative training instance S' _(h,t) The expression is:

by minimizing S _(h,t) And S' _(h,t) Learning the parameters of f (h, t, θ), the expression is:

where γ ≧ 0 represents a margin, which was set to 1 in the experiment of this example; to determine which LDPs to borrow for a particular pair of entities not mentioned, in this embodiment f (h, t; θ) is calculated using θ obtained by minimizing the above equation, and then the inner product f (h, t; θ) is used ^T And L scores each LDP L, wherein the L is obtained by a sentence encoder model, and then the first u LDPs with the highest inner product score of f (h, t; theta) are selected to expand the knowledge graph, wherein u is a hyper-parameter.

In the step S2 in the above embodiment, the method further includes the following steps:

after the knowledge map is expanded, the knowledge map is set

Representing a relation vector, connecting a main body vector and the relation vector in order to calculate a score function embedded in the method, wherein the expression is as follows:

[e _s ；e _r ] _1d ∈R ^d

then extracting common features of the vectors through a common dense layer, wherein the width of the common dense layer is the number of filters of the dense layer, and the size of a kernel contained in each filter is equal to the size of the kernel embedded in the input;

Ω(x)＝W _h x+b _h

wherein the content of the first and second substances,

the width of the public dense layer is nd _h Given as, nd _h Denotes d _h Wherein n is a hyperparameter;

In step S3 in the above embodiment, the method further includes the following steps:

in order to extract relation specific features from cascade embedding, a relation perception coding function is considered, and for relation r, a coding function is formed by omega _r Expressing, using a relation dense layer to extract relation perception characteristics, and coding a function omega _r Is an affine function, and the expression is:

Ω _r (x)＝W _r x+b _r

wherein the content of the first and second substances,

and d _z Represents omega _r The output length of (d);

In step S4 in the above embodiment, the following steps are further included:

then apply the non-linear activation f (-) and will

Is defined as follows:

using 1:B training strategy to let B represent the number of all entities in the knowledge-graph, the binary cross-entropy function loss

The expression of (c) is:

wherein the content of the first and second substances,

in step S4 in the above embodiment, the method further includes the following steps:

in order to improve robustness of an extraction layer, an autocorrelation strategy is introduced, original embedded features U are extracted, then random disturbance transformation is carried out on the original embedding, features U' are extracted through the extraction layer, then the features extracted by the extraction layer are expected to be the same as far as possible, and an expression of a loss function is as follows:

L _MC ＝KL(U′||U)

wherein the KL function represents a KL divergence;

a composite loss function of

As shown in fig. 2, after the knowledge graph is generated and completed by the above method, the knowledge graph needs to be sorted, specifically: in step S5 in the above embodiment, the method further includes the following steps: POS token embedding is generated through a POS generator, and semantic context scores are obtained through a semantic context scoring module.

When the knowledge graph is sorted, the generated triple embedding features are input into a sorting network, and the sorting network is supervised by using sequences extracted from real sentences. Given a triple structure of subject-relationship-object, since the length of the knowledge graph is variable, in this embodiment, a placeholder is introduced to fill the triple structure to a fixed length N, which also represents the number of possible location classes, and the triple structure of subject-relationship-object is characterized by F _stru And fill F _pad Connected and S is obtained by a full connection layer FC with softmax _matrix And predicting the sorting order S _order The expression is:

S _order ＝argmax _row (FC _s ([F _stru ；F _pad ]))

wherein, FC _s Denotes a fully connected layer with softmax, argmax _row Indicating that the argmax operation is performed on the row;

the sequence prediction task is treated as a classification problem, where N represents the number of classes, the maximum of the triplets in the knowledge-graph, and thus the true sequence G is computed _order And a sorting sequence S _order Cross entropy loss between, the expression is:

wherein L is _sort Which is indicative of a loss of ordering,

where N represents the fourth category, N ranging from 0 to N;

the knowledge graph generates an optimal description sequence through the sequencing network, the sequence is further decoded into sentences through a word decoder, the word decoder uses a decoder in a Transfomer, and then additional syntactic supervision is applied through a POS generator, and the implementation of the POS generator is as follows: in knowledge map order G _order Conditioned by first marking<Main body>，<Relationships between>，<Object>The corresponding position added to each triplet linearizes the knowledge-graph and obtains G _linear Then word encoder and POS generator with G _linear As input, the word codes WI = { w) are output separately _i I ∈ 1 … M } and POS tag code PI = { p = { (p) } _i ,i∈1…M}；

The token is then encoded w in the fusion module _i And POS tag code p _i Fusing to obtain the updated token code w _i The expression is:

w _i ＝LN(FC([w _i ；p _i ])+w _i )

wherein, P _gen Representing a predicted probability from a POS generator;

the loss function expression of the word encoder and decoder is:

wherein the word encoder uses a pre-trained bert model, W _gen Representing the prediction probability of each word token.

In addition to the syntactic constraints of the copied words, the present embodiment also designs a semantic context scoring component to evaluate the semantic consistency of the copied or predicted words in the sliding window. Generating a sliding window for each word to provide local context, filling in the beginning words of the sliding window, and obtaining context information F through the word features in the sliding window _c o _ntext And input into FC layer to obtain semantic context score X _semantic The expression is:

X _semantic ＝σ(FC(F _context ))

where σ denotes a sigmoid function.

Word replication probability prediction module using obtained POS token embedding v _pk And semantic context score X _semantic Probability of copying words from a knowledge graph

The method is used for selecting a predicted word from a word decoder or a word machine in a knowledge graph at each time step when a sentence is generated, and the expression is as follows:

wherein, y ^k A group-truth true 0-1 tag representing a word copied or predicted at the kth time step, generated from the knowledge-graph and the group-truth sentence;

finally, the total training loss L in the model _total Composed of four parts, including ordering penalty L _s ort, POS generate loss L _pos And a loss L of word generation _token And replication or prediction loss L _copy The expression for the total training loss is:

L _total ＝L _token +λ ₁ L _pos +λ ₂ L _sort +λ ₃ L _copy

wherein λ is ₁ ,λ ₂ And λ ₃ Is a trade-off factor.

The data set used by the method is Comprehensive, multi-Source Cyber-Security Events (Comprehensive Multi-Source network Security Activity) data set and an ADFA (intrusion detection data set) data set. The Comprehensive, multi-Source Cyber-Security Events data set is obtained from various websites and various vulnerability databases on the network, wherein the data set comprises network Security and vulnerability information and network text data. The ADFA dataset contains data for various intrusions. A large number of experiments show that the method is superior to the most advanced method.

The table shows that the optimal link prediction performance is realized in MRR average reciprocal ranking, both HIT @1 (previous accuracy rate) and HIT @10 (previous ten accuracy rate) obtain excellent performances, and the method obtains excellent performances in Comprehensive, multi-Source Cyber-Security Events and ADFA data centralization.

Where MRR performance is improved by 0.6%,0.2% HIT @10 and 1.1%; performance enhancement in MRR 2.4%,2.5% hit @10 and 2.6% hit @1 this simple approach is superior to shared layer models such as Conv, SACN, interact and relationship-specific models such as RGCN. The result of the embodiment shows that the framework combining two different coding functions can effectively improve the performance of knowledge graph embedding. This example analyzes the performance of the method on different relationship types of the Comprehensive, multi-Source Cyber-Security Events dataset because the Comprehensive, multi-Source Cyber-Security Events contain more different relationships than ADFA. Four types of relationships based on the number of tails connected to the head and the number of heads connected to the tail: one-to-one (1:1), one-to-many (1:N), many-to-one (N: 1), and many-to-many (N: N). Using this data set, this example compares three models: convE, interactE, and ComDense perform under four types of relationships. As a result, as shown in the following table, comDenSE was found to be effective in both complex relationship types (i.e., 1:N, N: N, N: 1) and simple relationships (i.e., 1:1). Notably, the performance gain of 1:1 is higher, indicating that ComDensE is particularly effective at capturing simple relationships.

The embodiment shows that the method effectively realizes that the creation of the network security incident tracing script is completed by constructing the high-performance knowledge map, the method generates the high-performance knowledge map to construct the network security incident tracing script, various defects of the traditional method are overcome, and the network security incident tracing script constructed by the method has extremely high applicability and accuracy;

in addition, the invention introduces an additional network security text corpus to enable the constructed knowledge graph to have more entity relationships, and simultaneously uses a new embedding method, the extraction process is divided into public feature extraction and relationship perception extraction, wherein the public feature extraction is used for extracting all inputs, and the relationship feature extraction is used for respectively extracting different relationships, so that the link prediction success rate is greatly increased, and the knowledge graph complemented by the method has extremely high applicability and accuracy in the aspect of generating the network security incident source tracing script.

In the aspect of generating the network security event tracing script, the method optimizes knowledge sequence prediction under sequence supervision, and further enhances the consistency of the generated sentences and the knowledge map through syntax and semantic regularization.

Locations where words are copied from the knowledge-graph are restricted in conjunction with part-of-speech (POS) syntactic tokens, and a semantic context scoring function is used to generate a suitability of each word in the sentence in its context. The constructed network security event source tracing script has extremely high availability and accuracy.

It should be noted that, although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, the technical solutions of the present invention can be directly or indirectly applied to other related technical fields by making changes and modifications to the embodiments described herein, or by using equivalent structures or equivalent processes performed in the content of the present specification and the attached drawings, which are included in the scope of the present invention.

Claims

1. A network security event source-tracing script generation method based on knowledge graph composite embedding is characterized by comprising the following steps:

s5: sequencing the knowledge graph obtained in the step S4, acquiring POS token embedding and semantic context scores through a corresponding module, and generating a network security source-tracing script through the acquired POS token embedding and semantic context scores by a word replication probability prediction module;

the step S2 comprises the following steps:

after the knowledge map is expanded, set e _s ,

Wherein e _s Representing the subject vector, e _o The object vector is represented by a vector of objects,

[e _s ；e _r ] _1d ∈R ^d

in the common dense layer, an affine function Ω (-) is applied to a given input embedding, the expression of the common dense layer being:

Ω(x)＝W _h x+b _h

wherein the content of the first and second substances,

by applying a non-linear activation function f (-) to Ω ([ e ] _s ；e _r ] _1d ) Obtaining an output of the common feature extraction;

the step S3 comprises the following steps:

Ω _r (x)＝W _r x+b _r

wherein, the first and the second end of the pipe are connected with each other,

and d _z Represents omega _r The output length of (d);

will omega _r Application to input embedding [ e ] _s ；e _r ] _1d ∈R ^d Then, applying a nonlinear activation function f (·), wherein the relationship dense layer has different encoders facing different relationships and is used for extracting relationship features;

the step S4 comprises the following steps:

then apply the nonlinear activation f (-) and will

Is defined as:

wherein h is _sr The link prediction score psi (e) is the result of the prediction _s ,e _r ,e _o ) Is defined as h _sr And e _o Inner product of (2)

The expression of (c) is:

wherein the content of the first and second substances,

the step S5 further includes the steps of: POS token embedding is generated through a POS generator, and semantic context scores are obtained through a semantic context scoring module;

the step S5 further includes the steps of:

when the knowledge graph is sequenced, the generated triple embedding characteristics are input into a sequencing network, a main body-relation-object triple structure is given, a placeholder is introduced to be filled into a fixed length S, and the main body is sequencedBody-relation-object triad structural feature F _stru And fill F _pad Connected and get S through the full connection layer with softmax _matrix And predicting the sorting order S _order The expression is:

S _order ＝argmax _row (FC _s ([F _stru ；F _pad ]))

therein, FC _s Denotes a fully connected layer with softmax, argmax _row Indicating that the argmax operation is performed on the row,

considering the sequence prediction task as a classification problem, where N represents the number of classes, the true sequence G is computed _order And a sorting sequence S _order Cross entropy loss between, the expression:

wherein L is _sort Which is indicative of a loss of ordering,

where N represents the fourth category, N ranging from 0 to N;

the knowledge graph generates an optimal description sequence through a sequencing network, the sequence is further decoded into sentences through a word decoder, and then syntactic supervision is applied through a POS generator, namely: in knowledge map order G _order Conditional, first, by marking<Main body>，<Relationships between>，<Object>The corresponding position added to each triplet linearizes the knowledge-graph and obtains G _linear Then word encoder and POS generator with G _linear As input, the word codes WI = { w are output, respectively _i I ∈ 1 … M } and POS tag code PI = { p = { p } _i ,i∈1…M}；

Encoding a token w in a fusion module _i And POS tag code p _i Fusing to obtain updated token code w _i The expression is:

w _i ＝LN(FC([w _i ；p _i ])+w _i )

wherein, P _gen Representing a predicted probability from a POS generator;

the loss function expression of the word encoder and decoder is:

wherein, W _gen Representing the prediction probability of each word token.

2. The method according to claim 1, wherein the S1 step comprises the steps of:

given a pair of entities (h) not mentioned ^* ,t ^* ) Ranking LDPs extracted from a text corpus and provided with the entity pairs, learning an encoder f of the entity pair parameterized by theta for a subject vector h and an object vector t, and encoding the entity pair (h, t) into f (h, t; θ);

where the input to encoder f is:

wherein the content of the first and second substances,

denotes concatenation of vectors, @ denotes multiplication of two vectors by elements, (h-t) denotes principal directionSubtracting the object vector t from the amount h;

wherein t ' and h ' respectively represent an object vector and a subject vector which are not equal to t and h, l ' represents the relationship of a negative training example, and D represents a subject vector and an object vector set;

wherein γ ≧ 0 denotes the margin, f (h, t; θ) is calculated using θ obtained by minimizing the above formula, and then the inner product f (h, t; θ) is used ^T L scores each LDP L, and the first u LDPs with the highest inner product score of f (h, t; theta) are selected to expand the knowledge graph, wherein u is a hyperparameter.

3. The method of claim 2, wherein the step S4 further comprises the steps of:

extracting original embedded characteristics U, then carrying out random disturbance transformation on the original embedding, extracting characteristics U' through an extraction layer, wherein the expression of a loss function is as follows:

L _MC ＝KL(U′||U)

wherein the KL function represents a KL divergence;

a composite loss function of