CN110705260A - Text vector generation method based on unsupervised graph neural network structure - Google Patents
- Publication number
- CN110705260A CN110705260A CN201910905090.1A CN201910905090A CN110705260A CN 110705260 A CN110705260 A CN 110705260A CN 201910905090 A CN201910905090 A CN 201910905090A CN 110705260 A CN110705260 A CN 110705260A
- Authority
- CN
- China
- Prior art keywords
- word
- document
- node
- text
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a text vector generation method based on an unsupervised graph neural network structure. First, stop-word processing is performed on all collected text corpora using a stop-word corpus, keywords are selected from the processed corpora, the text-keyword weights and the weights between keywords are calculated and stored, and a text-keyword network adjacency matrix is constructed. Secondly, trained word vectors are used as word node features, and the initial node features of each text are calculated from the keywords appearing in the document, yielding a text-keyword network feature matrix. Finally, a negative-sample adjacency matrix and feature matrix corresponding to the positive sample are constructed, the loss is made to converge by gradient descent using the loss function and the constructed network model, and after convergence the text node feature vectors are taken as the unsupervised-GNN-based text representation vectors. The invention fully considers the non-continuous global word co-occurrence and long-distance semantics in the corpus and the overall relevance of a single document to all document-keyword sets.
Description
Technical Field
The invention relates to the technical field of data mining and natural language processing, in particular to a text vector generation method based on an unsupervised graph neural network, which can be applied to extracting document vectors and also can be applied to downstream tasks such as text classification, clustering and text similarity calculation.
Background
Text has become a research hotspot on many platforms today, and since most text is unstructured or semi-structured data, text mining has long been one of the important angles of data mining in multiple fields. Meanwhile, with the gradual popularization of the internet, the volume of web text grows ever larger and the amount of information grows ever faster, making it more and more difficult to extract the information a user needs from massive data.
Conventional methods represent a document by averaging all the word vectors it contains, or adopt the doc2vec model. Recently, deep learning models, including Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), have been widely used to learn text representations. Because CNNs and RNNs prioritize locality and sequentiality, these models capture semantic and syntactic information in locally continuous word sequences well, but ignore non-continuous global word co-occurrence and long-distance semantics in the corpus, as well as the overall relevance of a single document to all document-keyword sets. To address this problem, a novel unsupervised graph-neural-network-based text vector generation method is provided.
Disclosure of Invention
The invention aims to provide a text vector generation method based on an unsupervised graph neural network structure, which represents text vectors via an unsupervised graph neural network so that the document representation vectors can then be used for downstream tasks such as classification and clustering, thereby solving the problems of the prior art.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a text vector generation method based on an unsupervised graph neural network structure, which comprises the following steps:
step one, obtaining keywords: performing word segmentation and stop-word removal on all texts in a corpus to obtain a document set, then calculating and storing the frequency of each word in the document set, and taking words whose frequency is greater than or equal to n as keywords, where n > 1;
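A minimal sketch of step one under illustrative data; the documents, the threshold n = 2 and the helper name are ours, not the patent's:

```python
from collections import Counter

# Hypothetical mini document set (already word-segmented and stop-word filtered).
docs = [
    ["fly", "sky", "bird"],
    ["fly", "cloud", "sky"],
    ["fly", "sky", "cloud", "fly"],
]

def select_keywords(docs, n=2):
    """Count each word's frequency over the whole document set and
    keep words whose frequency is >= n (n > 1), as in step one."""
    freq = Counter(w for doc in docs for w in doc)
    return {w for w, c in freq.items() if c >= n}

keywords = select_keywords(docs, n=2)
```

With these toy documents, 'fly', 'sky' and 'cloud' pass the threshold while 'bird' does not.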
step two, judging the semantic relevance of words in the corpus: defining m continuous words of an unprocessed single document as a word co-occurrence window, where m > 1 and an "unprocessed" document is one that has been word-segmented but not stop-word filtered; letting #W(i) be the number of word co-occurrence windows over all documents in the corpus in which word i occurs, #W(i, j) the number of windows in which words i and j occur in the same co-occurrence window simultaneously, and #W the total number of word co-occurrence windows over all documents in the corpus, the inter-keyword pointwise mutual information PMI(i, j) is:

PMI(i, j) = log( p(i, j) / ( p(i) × p(j) ) )

where p(i) = #W(i)/#W is the proportion of windows containing word i among all co-occurrence windows, and p(i, j) = #W(i, j)/#W is the proportion of windows containing both words i and j; PMI(i, j) > 0 indicates high semantic relevance of the words in the corpus, and PMI(i, j) < 0 indicates little or no semantic relevance;
step three, calculating and storing the document-word weight TF-IDF (term frequency-inverse document frequency):

TF-IDF = tf(t, D_i) × idf(t),  idf(t) = log( M / n_t + 0.01 )

where tf(t, D_i) is the frequency of keyword t in the i-th document D_i, M is the total number of documents, n_t is the number of documents in the document set in which keyword t appears, and idf(t) is the inverse document frequency;
step four, constructing an adjacency matrix of the document-word complex network;
step five, training all keywords in the text set, and expressing the keywords by word vectors; setting the vector of the initial document i to be equal to the sum of word vectors of all keywords in the document i divided by the number of the keywords in the document i;
constructing the node feature matrix X of the document-word complex network, in which rows are nodes and columns are features: a keyword node's feature is the keyword's word vector, and a document node's feature is the initial document vector; defining the node feature matrix X as the positive-sample feature matrix and A as the positive-sample adjacency matrix; the negative sample uses the row-shuffled feature matrix X̃ as its feature matrix and the same adjacency matrix as the positive sample, i.e. Ã = A;
step six, defining the loss function L:

L = −(1/(N + M)) [ Σ_{i=1..N} log D(h_i, s) + Σ_{j=1..M} log(1 − D(h̃_j, s)) ]

where N is the number of positive-sample nodes and M the number of negative-sample nodes; (X, A) denotes the positive-sample network and (X̃, Ã) the negative-sample network; h_i is the representation vector of node i after local feature extraction on the positive sample, and h̃_j the representation vector of node j after local feature extraction on the negative sample. The local features of the positive and negative samples are processed with the same convolution method ε, and ε(X, A) is the node feature matrix after positive-sample processing. s = R(ε(X, A)) is the global feature of the positive sample, where R denotes the processing (readout) applied to the processed node feature matrix. D(h, s) is a discriminator judging whether h and s are similar: D approaching 1 indicates similarity, D approaching 0 indicates dissimilarity;
step seven, constructing a graph neural network model
processing the positive-sample node feature matrix X and adjacency matrix A to obtain the negative-sample feature matrix X̃ and adjacency matrix Ã; extracting the local features of each node of the positive and negative samples; obtaining the global feature of the positive sample; calculating the loss function through the discriminator D; and updating ε, R and D by gradient descent until the loss converges;
and taking the converged text node feature vector to generate an unsupervised GNN-based text representation vector.
Preferably, the process of step four specifically comprises: firstly, taking each document as a node in the network and each keyword as a node; then constructing edges among the nodes, with the edge weight between node i and node j defined as A_ij:

A_ij = PMI(i, j) if i and j are both keywords and PMI(i, j) > 0; A_ij = the TF-IDF weight if i is a document and j is a keyword appearing in it; A_ij = 1 if i = j; A_ij = 0 otherwise.
preferably, the specific process of the step six is as follows:
local features are extracted by applying a single-layer GCN structure; the node feature matrix after positive-sample processing is ε(X, A) = ReLU( D̂^(−1/2) Â D̂^(−1/2) X θ ), where Â = A + I_N, D̂ is the degree matrix of Â, and θ is a learnable parameter matrix;
for the global feature, an averaging method is applied to all node features of the positive sample: s = σ( (1/N) Σ_{i=1..N} h_i )
wherein sigma is a nonlinear Sigmoid function, and N represents the number of nodes;
for the discriminator, a simple bilinear scoring function is applied: D(h_i, s) = σ( h_i^T W s )
where W is a learnable scoring matrix and σ is a nonlinear Sigmoid function.
The invention discloses the following technical effects: in compressing and converting text content, constructing a text-keyword network is an effective method: after the keywords of each text are found, the large collection of texts and keywords is converted into one large complex network. This greatly compresses the text scale while losing as little of the basic information in the text as possible. The constructed text-keyword network is then learned with a graph neural network to obtain new text representation vectors that contain not only the keyword information in each document but also the keyword weights and the structural information of the graph. In the process of the invention, the data is easy to obtain, the keyword processing is simple and effective, the text-keyword complex network adjacency matrix and node feature matrix are convenient to obtain, the negative sample and the graph neural network model are easy to compute, and the non-continuous global word co-occurrence and long-distance semantics in the corpus and the overall relevance of a single document to all document-keyword sets are fully considered.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is an exemplary diagram of text-keywords in step four according to the embodiment of the present invention;
fig. 3 is an exemplary diagram of a network model in step seven of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1 to 3, the present invention provides a method for generating a text vector based on an unsupervised graph neural network, which specifically includes the following steps:
step one, acquiring a large amount of text corpora as a corpus, taking the 20Newsgroups (20NG) data set as an example, downloadable from:
http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz, which includes 18846 documents. The present invention borrows 3 documents from this data to build a complex network example.
Obtaining the stop-word corpus, taking the stop-word list compiled by a Sina user as an example, download address: http://blog.sina.com.cn/s/blog_a19a3770102wjau.html, which includes 891 stop words such as 'about', 'above', 'also', 'I', 'wait', 'to', 'the', etc. The invention uses this data to remove stop words: all texts are processed against the stop-word list, and whenever any of the 891 stop words appears in a text it is deleted, finally yielding the document set after stop-word removal. For example, document D_1 is: "I wait to fly in the sky". Following the order of the stop-word list, 'about' is first looked up in the document and deleted if present; then 'above' is deleted; and so on, until the last word in the stop-word list has been processed. Since 'I', 'wait', 'to', 'in', 'the' are stop words, document D_1 after stop-word removal is: "fly sky". The word frequency (TF) of each word in the stop-word-processed document set is then calculated and stored, where word frequency is the number of times a word occurs in an article, and words with word frequency greater than or equal to 5 are taken as keywords. Then document D_1 contains keywords {w_1, w_2}, document D_2 contains keywords {w_1, w_3}, and document D_3 contains keywords {w_3, w_4}.
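The stop-word removal in the worked example can be sketched as follows; the tiny stop-word set here is an illustrative stand-in for the 891-entry list cited above:

```python
# Illustrative stop-word removal for the worked example D_1 = "I wait to fly in the sky".
stop_words = {"i", "wait", "to", "in", "the", "about", "above", "also"}

def remove_stop_words(text, stop_words):
    """Keep only the tokens that are not in the stop-word list."""
    return [w for w in text.lower().split() if w not in stop_words]

d1 = remove_stop_words("I wait to fly in the sky", stop_words)  # -> ['fly', 'sky']
```

Filtering by set membership in one pass is equivalent to the list-ordered deletion described above, since deleting stop words in any order leaves the same residue.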
Step two, calculating and storing the inter-keyword pointwise mutual information:

PMI(i, j) = log( p(i, j) / ( p(i) × p(j) ) ),  p(i) = #W(i)/#W,  p(i, j) = #W(i, j)/#W

The word co-occurrence window size is set to 5, i.e. the invention defines 5 continuous words of an unprocessed single document as one word co-occurrence window, where #W(i) is the number of word co-occurrence windows over all documents in the corpus in which word i occurs, #W(i, j) is the number of windows in which words i and j occur simultaneously, and #W is the total number of word co-occurrence windows over all documents in the corpus. A positive PMI value implies high semantic relevance of words in the corpus, while a negative PMI value indicates little or no semantic relevance. For example, for the original document D_1 "I wait to fly in the sky", the first word co-occurrence window is "I wait to fly in", the second is "wait to fly in the", the third is "to fly in the sky", the fourth is "fly in the sky _", the fifth is "in the sky _ _", the sixth is "the sky _ _ _", and the seventh is "sky _ _ _ _", where '_' represents an automatically filled placeholder once the sentence has ended. If 'sky' and 'fly' are keywords of document D_1 and do not appear together in any other co-occurrence window, then #W(sky, fly) = 2. Assuming #W = 100000, #W(sky) = 5 and #W(fly) = 40, then PMI(sky, fly) = log( (2/100000) / ( (5/100000) × (40/100000) ) ) = log 1000 = 3 (logarithm to base 10). Similarly, PMI(w_1, w_2) = 3, PMI(w_1, w_3) = 2, PMI(w_1, w_4) = −1, PMI(w_2, w_3) = 1, PMI(w_2, w_4) = −2, PMI(w_3, w_4) = 2.
Step three, calculating and storing the document-word weight TF-IDF (term frequency-inverse document frequency):

TF-IDF = tf(t, D_i) × idf(t),  idf(t) = log( M / n_t + 0.01 )

where tf(t, D_i) is the frequency of keyword t in the i-th document, M is the total number of documents, and n_t is the number of documents in the document set in which keyword t appears; idf(t) is the inverse document frequency, and the purpose of adding 0.01 is to prevent the document-word weight from being 0 when n_t = M. (Document frequency is the number of articles in the whole corpus in which a keyword appears; the inverse document frequency is based on its reciprocal and mainly serves to down-weight words that are common across all documents but have little influence on any single document.)

For the 3 selected documents, the document number M = 3. For 'sky', if its word frequency in document D_2 is tf(sky, D_2) = 30 and it is a keyword in n_sky = 2 documents, then TF-IDF(sky, D_2) = 30 × log(3/2 + 0.01) = 30 × log 1.51 ≈ 5.37.
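The TF-IDF weight for the worked example can be sketched as follows; the base-10 logarithm is our assumption, chosen to match the PMI example's base:

```python
import math

def tf_idf(tf, M, n_t):
    """TF-IDF = tf(t, D_i) * idf(t), with idf(t) = log(M / n_t + 0.01);
    base-10 log is an assumption here. The +0.01 keeps the weight
    non-zero when n_t = M."""
    return tf * math.log10(M / n_t + 0.01)

# Worked example: tf(sky, D_2) = 30, M = 3 documents, n_sky = 2.
w_sky_d2 = tf_idf(30, 3, 2)  # approximately 5.37
```

Note that without the +0.01 correction, a keyword appearing in every document (n_t = M) would get weight exactly 0.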
Step four, constructing a document-word complex network adjacency matrix:
each document is taken as a node in the network and each keyword is also taken as a node. Constructing edges between nodes, and defining the edge weight between the node i and the node j as AijThe formula is as follows:
suppose that 3 of the documents are selected to build a document-word complex network, document D1Containing a keyword { w1,w2Document D2Containing a keyword { w1,w3Document D3Containing a keyword { w3,w4},PMI(w1,w2)=3,PMI(w1,w3)=2,PMI(w1,w4)=-1,PMI(w2,w3)=1,PMI(w2,w4)=-2,PMI(w3,w4)=2; The complex network adjacency matrix is a:
since there are 3 documents and 4 keywords, the adjacency matrix A is a 7 × 7 matrix with the order { D }1、D2、D3、W1、w2、w3、w4}。
Step five, acquiring pre-trained word vectors (such as word2vec), here taking the GloVe word vectors trained on Wikipedia as an example, download address: http://nlp.stanford.edu/data/glove.6B.zip. For the example the word vector dimension is 3. All keywords in the text set are represented by the trained word vectors. The initial vector of document i equals the sum of the word vectors of all keywords in document i divided by the number of keywords in document i. For example, document D_1 contains keywords {w_1, w_2}; if the node representation vector of w_1 is (1, 2, 3) and that of w_2 is (3, 1, 2), then the vector of D_1 is ((1, 2, 3) + (3, 1, 2)) / 2 = (2, 1.5, 2.5). Similarly, the node feature matrix X of the document-word complex network is constructed, with rows as nodes and columns as features: a keyword node's feature is the keyword's word vector, and a document node's feature is the initial document vector. A is defined as the positive-sample adjacency matrix, and the node feature matrix X as the positive-sample feature matrix.
the negative samples use the feature matrix mixed by lines as the feature matrix, and the adjacent matrix uses the same adjacent matrix as the positive samples, i.e.For w1,w2The nodes are characterized by (1, 2, 3), (3, 1, 2), w after confusion1,w2The node features may be (3, 1, 2) or (1, 2, 3), that is, for all node features, the dimensions and the included values thereof do not change, and only the original wiThe characteristics of the node may no longer represent wi. Namely, it isCan be as follows:
Step six: defining the loss function L:

L = −(1/(N + M)) [ Σ_{i=1..N} log D(h_i, s) + Σ_{j=1..M} log(1 − D(h̃_j, s)) ]

where N is the number of positive-sample nodes and M the number of negative-sample nodes; (X, A) denotes the positive-sample network and (X̃, Ã) the negative-sample network; h_i is the representation vector of node i after local feature extraction on the positive sample, and h̃_j the representation vector of node j after local feature extraction on the negative sample. The local features of the positive and negative samples are processed with the same convolution method ε, and ε(X, A) is the node feature matrix after positive-sample processing. s = R(ε(X, A)) is the global feature of the positive sample, where R denotes the processing of the node feature matrix after positive-sample processing. D(h, s) is a discriminator judging whether h and s are similar: D approaching 1 indicates similarity, D approaching 0 indicates dissimilarity.
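The loss just defined can be sketched numerically; the discriminator scores fed in below are illustrative values, not numbers from the embodiment:

```python
import math

def loss_fn(pos_scores, neg_scores):
    """Binary cross-entropy form of L over discriminator scores
    D(h_i, s) for positives and D(h~_j, s) for negatives:
    L = -(1/(N+M)) * ( sum log D(h_i, s) + sum log(1 - D(h~_j, s)) )."""
    n, m = len(pos_scores), len(neg_scores)
    total = sum(math.log(p) for p in pos_scores)
    total += sum(math.log(1.0 - q) for q in neg_scores)
    return -total / (n + m)

# Positives scored near 1 and negatives near 0 give a small loss.
loss = loss_fn([0.9, 0.8], [0.2, 0.1])
```

Driving positive scores toward 1 and negative scores toward 0 minimizes L, which is exactly the behaviour gradient descent enforces in step seven.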
The local features are extracted by applying a single-layer GCN structure; the node feature matrix is:

ε(X, A) = σ( D̂^(−1/2) Â D̂^(−1/2) X θ )

where Â = A + I_N, I_N is the identity matrix, D̂ is the degree matrix of Â, i.e. D̂_ii = Σ_j Â_ij, and θ is a learnable parameter matrix. For the nonlinearity σ, the ReLU function is used, i.e. for an input a, ReLU(a) = max(0, a).
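A toy sketch of this single-layer propagation rule with plain-list arithmetic; the 2-node graph, features and θ are illustrative:

```python
import math

def gcn_layer(A, X, theta):
    """Single-layer GCN sketch: ReLU( D^-1/2 (A+I) D^-1/2 X theta )."""
    n = len(A)
    # Add self-loops: A_hat = A + I.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_hat]  # degrees of A_hat
    # Symmetric normalisation: A_norm[i][j] = A_hat[i][j] / sqrt(d_i * d_j).
    A_norm = [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

    def matmul(P, Q):
        return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
                 for j in range(len(Q[0]))] for i in range(len(P))]

    Z = matmul(matmul(A_norm, X), theta)
    return [[max(0.0, v) for v in row] for row in Z]  # ReLU

A = [[0, 1], [1, 0]]                 # two connected nodes
X = [[1.0, 0.0], [0.0, 1.0]]         # one-hot features
theta = [[1.0, -1.0], [1.0, -1.0]]   # toy parameter matrix
H = gcn_layer(A, X, theta)           # -> [[1.0, 0.0], [1.0, 0.0]]
```

The negative column is zeroed by ReLU, and the symmetric normalisation averages each node with its neighbour, which is why both output rows coincide here.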
For the global feature, all node features of the positive sample are averaged with a simple averaging method:

s = σ( (1/N) Σ_{i=1..N} h_i )
where σ is a nonlinear Sigmoid function and N represents the number of nodes.
For the discriminator, a simple bilinear scoring function is applied:

D(h_i, s) = σ( h_i^T W s )
where W is a learnable scoring matrix and σ is a nonlinear Sigmoid function.
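The readout and the bilinear discriminator can be sketched together; the matrices below are illustrative, and the identity W is just a convenient initial value:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def readout(H):
    """Global feature: Sigmoid of the node-wise mean of H."""
    n = len(H)
    return [sigmoid(sum(row[j] for row in H) / n) for j in range(len(H[0]))]

def discriminator(h, s, W):
    """Bilinear score D(h, s) = sigmoid(h^T W s)."""
    Ws = [sum(W[i][j] * s[j] for j in range(len(s))) for i in range(len(W))]
    return sigmoid(sum(h[i] * Ws[i] for i in range(len(h))))

H = [[1.0, 0.0], [0.0, 1.0]]          # toy node feature matrix
s = readout(H)                         # sigmoid([0.5, 0.5])
score = discriminator([1.0, 0.0], s, [[1.0, 0.0], [0.0, 1.0]])
```

Because the score passes through a Sigmoid, it always lies in (0, 1) and can be compared directly against the similar/dissimilar targets 1 and 0 of the loss.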
Step seven: step of constructing graph neural network model
Mixing the positive sample node feature matrix X line by line to obtain a negative sample feature matrixAdjacency matrix
Pass discriminatorAnd (4) calculating the Loss, and if the Loss does not converge, repeating the steps 1,2, 3,4 and 5 until the Loss converges.
To obtain
In the second step, the node features of the positive sample are extracted, first calculating ε(X, A) = ReLU( D̂^(−1/2) Â D̂^(−1/2) X θ ). The learnable hidden dimension is set to 6, and θ is initialized to [[-0.5692, -0.6487, 0.3339], [-0.1275, -0.4908, -0.1815], [-0.7875, -0.4873, -0.3584], [0.7263, 0.3383, 0.4262], [0.2747, 0.5978, -0.5178], [0.6429, -0.4805, -0.2120]], whose dimension is ([6,3]).
Calculation yields:
H = [[[-0.7087, -0.5292, -1.1873, 4.3132, 1.2835, 0.4477], [-0.9484, -0.5760, -1.3368, 4.7530, 2.1082, 0.9065], [-1.2537, -0.6958, -1.6096, 5.6373, 3.0459, 1.2284], [-0.5677, -0.5290, -1.1280, 4.1638, 0.7651, 0.0254], [-0.5199, -0.4663, -0.9613, 3.5127, 0.8292, -0.0187], [-1.0183, -0.6611, -1.5236, 5.4620, 2.1234, 0.9007], [-1.1076, -0.6147, -1.3856, 4.8227, 2.7644, 0.9474]]], its dimension being ([1,7,6]).
In the third step, H̃ = ε(X̃, Ã) is calculated in the same way; its dimension is ([1,7,6]).
In the fourth step, the global feature s = σ( (1/N) Σ_{i=1..N} h_i ) is calculated; its dimension is (1, 6).
In the fifth step, the discriminator scoring matrix is initialized:
W = [[[-0.2551, 0.1770, -0.2642, -0.1486, 0.2632, -0.3471], [0.3415, 0.0906, -0.0688, 0.3749, -0.0906, -0.0022], [0.0060, -0.3362, -0.1600, -0.2831, 0.0946, -0.3743], [-0.2553, -0.0699, -0.1703, 0.2189, 0.2910, -0.2692], [0.2505, 0.2865, 0.1543, -0.1312, 0.2121, -0.1092], [0.2638, -0.3451, 0.1210, 0.2282, 0.3422, -0.0979]]], its dimension being ([1,6,6]).
The discriminator score is calculated as:
[1.7562,2.3045,2.9415,1.3836,1.1912,2.5069,2.5062,2.3998,1.1779,1.3023,2.6828,1.5789,2.8675,2.0953]. The Loss is calculated to be 1.483.
Second training pass: in the first step, A, X, X̃ and Ã are the same as before; in the second step, θ becomes:
[[-0.5682, -0.6477, 0.3349], [-0.1285, -0.4918, -0.1825], [-0.7865, -0.4863, -0.3574], [0.7253, 0.3373, 0.4252], [0.2737, 0.5968, -0.5188], [0.6419, -0.4815, -0.2130]], its dimension being ([6,3]); the resulting H = [[[-0.7035, -0.5294, -1.1803, 4.3039, 1.2742, 0.4384], [-0.9422, -0.5762, -1.3290, 4.7432, 2.0984, 0.8967], [-1.2459, -0.6958, -1.6004, 5.6259, 3.0346, 1.2170], [-0.5631, -0.5292, -1.1212, 4.1545, 0.7558, 0.0162], [-0.5158, -0.4664, -0.9555, 3.5047, 0.8211, -0.0206], [-1.0114, -0.6612, -1.5147, 5.4508, 2.1122, 0.8895], [-1.1007, -0.6147, -1.3776, 4.8128, 2.7545, 0.9375]]], its dimension being ([1,7,6]). The third step H̃ and the fourth step s are likewise obtained by calculation.
In the fifth step, the learned weight matrix W becomes:
[[[-0.2541, 0.1780, -0.2632, -0.1476, 0.2642, -0.3461], [0.3425, 0.0916, -0.0678, 0.3759, -0.0896, -0.0012], [0.0070, -0.3352, -0.1590, -0.2821, 0.0956, -0.3733], [-0.2563, -0.0709, -0.1713, 0.2179, 0.2900, -0.2702], [0.2495, 0.2855, 0.1533, -0.1322, 0.2111, -0.1102], [0.2628, -0.3461, 0.1200, 0.2272, 0.3412, -0.0989]]], its dimension being ([1,6,6]). A new discriminator score [1.7128, 2.2531, 2.8782, 1.3445, 1.1597, 2.4496, 2.4513, 1.3324, 2.7238, 2.8783, 2.3160, 1.1949, 2.5691, 1.1294] is calculated, giving an updated Loss of 1.1571.
The 3rd training pass gives Loss = 1.1187, then 1.1330, …, 0.6888 at the 54th pass, 0.6988 at the 55th, 0.6938 at the 56th, 0.6944 at the 57th, 0.6992 at the 58th, and 0.6973 at the 59th. With the patience set to 5 — that is, passes 55 to 59 all failed to improve on pass 54 — the Loss is considered to have converged.
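The convergence rule used above (stop once the loss has not improved for 5 consecutive passes) can be sketched as a training-loop skeleton; the stand-in loss sequence is illustrative, not the embodiment's numbers:

```python
def train(one_pass, max_passes=200, patience=5):
    """Run `one_pass` (any callable returning that pass's loss) and stop
    when the loss has not improved for `patience` consecutive passes."""
    best, since_best, history = float("inf"), 0, []
    for _ in range(max_passes):
        loss = one_pass()
        history.append(loss)
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # converged: no improvement for `patience` passes
    return best, history

# Toy decreasing-then-flat loss sequence standing in for the real model.
fake_losses = iter([1.483, 1.157, 1.119, 0.69, 0.70, 0.70, 0.70, 0.70, 0.70, 0.5])
best, history = train(lambda: next(fake_losses))
```

Note that the final 0.5 is never reached: the loop stops at the fifth non-improving pass, just as passes 55-59 end training in the embodiment.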
The converged text node feature vectors (the first three row vectors obtained at the 54th pass), [-0.0229, 0.0709, 0.1803, 0.1003, -0.0238, 0.3006], [-0.0212, 0.0884, 0.1985, 0.1419, -0.0237, 0.3395] and [-0.0267, -0.0027, 0.0808, -0.0097, -0.0214, 0.1106], are taken to generate the unsupervised-GNN-based text representation vectors.
The principle is as follows: firstly, a large amount of text corpora are collected, and stop word corpora are downloaded. And performing stop word processing on all the collected text corpora by using the stop word corpora. And then calculating and storing the word frequency of the words in each document, taking the words with the word frequency larger than n as keywords, calculating and storing a text keyword weight TF-IDF and a weight PMI among the keywords, and defining network nodes and node edge weights to obtain a text-keyword network adjacency matrix. Secondly, the trained word vectors are used as word node characteristics, and initial node characteristics of the text are calculated by using keywords appearing in the document to obtain a text-keyword network characteristic matrix. And then constructing a negative sample adjacency matrix and a characteristic matrix corresponding to the positive sample, utilizing the defined loss function and the constructed network model step, utilizing gradient descent to make loss convergence, and taking the converged text node characteristic vector to obtain a text expression vector based on the unsupervised GNN. By adopting the method, the data is easy to obtain, the keyword processing process is simple and effective, the text-keyword complex network adjacency matrix node characteristic matrix is convenient to obtain, the negative sample is easy to construct, the graph neural network model is easy to calculate, the discontinuous global word co-occurrence and long-distance semantics in the corpus and the total correlation of a single document to all document-keyword sets are fully considered, and the user can conveniently extract the required information from mass data.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.
Claims (3)
1. A text vector generation method based on an unsupervised graph neural network structure is characterized by comprising the following steps:
step one, obtaining keywords: performing word segmentation processing and stop-word removal on all texts in a corpus to obtain a document set, then calculating and storing the word frequency of each word in the document set, and selecting the words whose word frequency is greater than or equal to n as keywords, wherein n is greater than 1;
step two, judging the semantic relevance of words in the corpus: defining m continuous words of an unprocessed single document as a word co-occurrence window, wherein m is greater than 1 and an unprocessed document is one that has undergone word segmentation processing but not stop-word removal; setting #W(i) as the number of word co-occurrence windows, over all documents in the corpus, in which word i appears, #W(i, j) as the number of windows in which words i and j appear together in the same co-occurrence window, and #W as the total number of word co-occurrence windows in all documents of the corpus; the formula of the inter-keyword pointwise mutual information PMI(i, j) is as follows:

PMI(i, j) = log( p(i, j) / ( p(i) × p(j) ) ), with p(i) = #W(i) / #W and p(i, j) = #W(i, j) / #W

wherein p(i) represents the proportion of windows containing word i among all co-occurrence windows, and p(i, j) represents the proportion of windows containing both words i and j; PMI(i, j) > 0 indicates high semantic relevance of the words in the corpus, and PMI(i, j) < 0 indicates little or no semantic relevance in the corpus;
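As a minimal sketch of the window counting and PMI calculation in step two (the sliding-window scheme, function names, and window size here are illustrative assumptions):

```python
import math
from itertools import combinations

def pmi_counts(documents, window=3):
    """Count co-occurrence windows: #W (total), #W(i), and #W(i, j)."""
    total, single, pair = 0, {}, {}
    for doc in documents:
        for start in range(max(1, len(doc) - window + 1)):
            win = set(doc[start:start + window])
            total += 1
            for w in win:
                single[w] = single.get(w, 0) + 1
            for i, j in combinations(sorted(win), 2):
                pair[(i, j)] = pair.get((i, j), 0) + 1
    return total, single, pair

def pmi(i, j, total, single, pair):
    """PMI(i, j) = log( p(i, j) / (p(i) * p(j)) )."""
    key = (i, j) if i <= j else (j, i)
    if key not in pair:
        return float("-inf")  # never co-occur: no semantic relevance
    p_ij = pair[key] / total
    return math.log(p_ij / ((single[i] / total) * (single[j] / total)))

# one 4-word document with window size 3 yields two windows
total, single, pair = pmi_counts([["a", "b", "c", "d"]], window=3)
```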
step three, calculating and storing the document-word weight TF-IDF (term frequency-inverse document frequency), wherein the formula is as follows:

TF-IDF = tf(t, D_i) × idf(t), with idf(t) = log( M / n_t )

wherein tf(t, D_i) represents the frequency of keyword t in the i-th document D_i, M represents the total number of documents, n_t represents the number of documents in the document set that contain keyword t, and idf(t) represents the inverse document frequency;
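A short sketch of the TF-IDF weight in step three; whether tf is normalized by document length is not fixed by the text, so the normalization below is an illustrative assumption.

```python
import math

def tf_idf(term, doc, documents):
    """TF-IDF = tf(t, D_i) * idf(t), with idf(t) = log(M / n_t).
    tf is normalized by document length here (an illustrative choice)."""
    tf = doc.count(term) / len(doc)
    m = len(documents)                              # M: total number of documents
    n_t = sum(1 for d in documents if term in d)    # documents containing the term
    return tf * math.log(m / n_t)

docs = [["graph", "net"], ["graph", "graph"], ["text", "net"]]
w = tf_idf("graph", docs[1], docs)   # tf = 1.0, idf = log(3/2)
```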
step four, constructing an adjacency matrix of the document-word complex network;
step five, training word vectors for all keywords in the text set, representing each keyword by its word vector, and setting the initial vector of document i equal to the sum of the word vectors of all keywords in document i divided by the number of keywords in document i;
constructing a node feature matrix X of the document-word complex network, wherein each row corresponds to a node and each column to a feature: the feature of a keyword node is the word vector of the keyword, and the feature of a document node is the initial document vector; defining the node feature matrix X as the positive-sample feature matrix and A as the positive-sample adjacency matrix; the negative sample uses a row-shuffled copy of X, denoted X̃, as its feature matrix, and uses the same adjacency matrix as the positive sample, i.e. Ã = A;
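The negative-sample construction above (shuffle the rows of the feature matrix, reuse the adjacency matrix) can be sketched as follows; the function name and the fixed random seed are assumptions for reproducibility.

```python
import numpy as np

def corrupt(x, seed=0):
    """Negative-sample feature matrix X~: the rows of X in shuffled order.
    The adjacency matrix A is reused unchanged (A~ = A)."""
    rng = np.random.default_rng(seed)
    return x[rng.permutation(x.shape[0])]

x = np.arange(8.0).reshape(4, 2)   # toy positive-sample feature matrix
x_neg = corrupt(x)                 # same rows, different node assignment
```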
step six, defining a loss function L:

L = −(1/(N+M)) [ Σ_{i=1..N} log D(h_i, s) + Σ_{j=1..M} log( 1 − D(h̃_j, s) ) ]

wherein N represents the number of positive-sample nodes and M the number of negative-sample nodes; (X, A) represents the positive-sample network and (X̃, Ã) the negative-sample network; h_i is the representation vector of node i after local features are extracted from the positive sample, and h̃_j is the representation vector of node j after local features are extracted from the negative sample; the local features of the positive and negative samples are extracted by the same convolution ε, with ε(X, A) representing the node feature matrix after the positive sample is processed; s = R(ε(X, A)) represents the global feature of the positive sample, wherein R is a readout applied to the processed node feature matrix; D(h_i, s) is a discriminator judging whether h_i and s are similar: a value approaching 1 indicates similarity, and a value approaching 0 indicates dissimilarity;
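The loss of step six is a binary cross-entropy over discriminator scores; a minimal numeric sketch (function name and the epsilon guard are assumptions):

```python
import numpy as np

def loss_fn(pos_scores, neg_scores, eps=1e-9):
    """Push positive-sample scores D(h_i, s) toward 1 and
    negative-sample scores D(h~_j, s) toward 0."""
    total = np.log(pos_scores + eps).sum() + np.log(1.0 - neg_scores + eps).sum()
    return -total / (len(pos_scores) + len(neg_scores))

good = loss_fn(np.array([0.99, 0.98]), np.array([0.01, 0.02]))  # near-perfect discriminator
bad = loss_fn(np.array([0.5, 0.5]), np.array([0.5, 0.5]))       # uninformative discriminator
```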
step seven, constructing a graph neural network model
Processing the positive sample node feature matrix X and the adjacency matrix A to obtain a negative sample feature matrixAdjacency matrixExtracting local features of each node of the positive sample and the negative sample to obtain the global features of the positive sample, and determining the global features of the positive sample by a discriminatorCalculating a loss function, and updating epsilon and R, D by using the loss function with gradient reduction until loss is converged;
and taking the converged text node feature vector to generate an unsupervised GNN-based text representation vector.
2. The unsupervised graph neural network structure-based text vector generation method of claim 1, wherein the process of step four is specifically as follows: firstly, taking each document as a node in the network and each keyword as a node; then constructing edges between the nodes, the edge weight between node i and node j being defined as A_ij, with the formula:

A_ij = PMI(i, j)   when i and j are both keywords and PMI(i, j) > 0;
A_ij = TF-IDF      when i is a document and j is a keyword appearing in it;
A_ij = 1           when i = j;
A_ij = 0           otherwise.
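The adjacency construction of claim 2 can be sketched as below; the piecewise weighting follows the TextGCN-style convention consistent with the description (TF-IDF on document-word edges, positive PMI on word-word edges, self-loops on the diagonal), which is an assumption about the patent's exact formula.

```python
import numpy as np

def build_adjacency(n_docs, n_words, tfidf, pmi):
    """Document-word adjacency matrix: nodes 0..n_docs-1 are documents,
    nodes n_docs..n_docs+n_words-1 are keywords."""
    n = n_docs + n_words
    a = np.eye(n)                              # A_ij = 1 when i = j
    for (d, w), v in tfidf.items():            # document-word edges
        a[d, n_docs + w] = a[n_docs + w, d] = v
    for (i, j), v in pmi.items():              # word-word edges, keep PMI > 0 only
        if v > 0:
            a[n_docs + i, n_docs + j] = a[n_docs + j, n_docs + i] = v
    return a

# 2 documents, 2 keywords; one TF-IDF edge, one positive and one negative PMI pair
a = build_adjacency(2, 2, tfidf={(0, 0): 0.5}, pmi={(0, 1): 0.3, (1, 0): -0.2})
```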
3. The unsupervised graph neural network structure-based text vector generation method of claim 1, wherein the concrete process of step six is as follows:
local features are extracted by applying a single-layer GCN structure, the node feature matrix ε(X, A) after the positive sample is processed being:

ε(X, A) = σ( D^(−1/2) A D^(−1/2) X Θ )

wherein D is the degree matrix of A (D_ii = Σ_j A_ij), Θ is a learnable weight matrix, and σ is a nonlinear activation function;
for the global feature, all node features of the positive sample are averaged by an averaging readout:

s = R(ε(X, A)) = σ( (1/N) Σ_{i=1..N} h_i )

wherein σ is a nonlinear Sigmoid function and N represents the number of nodes;
for the discriminator, a simple bilinear scoring function is applied:

D(h_i, s) = σ( h_iᵀ W s )

wherein W is a learnable scoring matrix and σ is a nonlinear Sigmoid function.
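The three components of claim 3 (single-layer GCN, averaging readout, bilinear discriminator) can be sketched together in one forward pass; the symmetric normalization and the random toy inputs are illustrative assumptions, not the patent's trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn_layer(x, a, theta):
    """Single-layer GCN: sigma( D^-1/2 A D^-1/2 X Theta )."""
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    return sigmoid(d_inv_sqrt @ a @ d_inv_sqrt @ x @ theta)

def readout(h):
    """Global feature s: sigmoid of the mean of all node features."""
    return sigmoid(h.mean(axis=0))

def discriminator(h_i, s, w):
    """Bilinear score sigma(h_i^T W s), a similarity value in (0, 1)."""
    return sigmoid(h_i @ w @ s)

rng = np.random.default_rng(0)
a = np.eye(4) + 0.5 * (np.ones((4, 4)) - np.eye(4))  # toy graph with self-loops
x = rng.standard_normal((4, 3))                       # node features
theta = rng.standard_normal((3, 2))                   # GCN weight matrix
h = gcn_layer(x, a, theta)                            # local node representations
s = readout(h)                                        # global feature
score = discriminator(h[0], s, rng.standard_normal((2, 2)))
```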
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910905090.1A CN110705260B (en) | 2019-09-24 | 2019-09-24 | Text vector generation method based on unsupervised graph neural network structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110705260A true CN110705260A (en) | 2020-01-17 |
CN110705260B CN110705260B (en) | 2023-04-18 |
Family
ID=69196022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910905090.1A Expired - Fee Related CN110705260B (en) | 2019-09-24 | 2019-09-24 | Text vector generation method based on unsupervised graph neural network structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110705260B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108334605A (en) * | 2018-02-01 | 2018-07-27 | 腾讯科技(深圳)有限公司 | File classification method, device, computer equipment and storage medium |
CN108572961A (en) * | 2017-03-08 | 2018-09-25 | 北京嘀嘀无限科技发展有限公司 | A kind of the vectorization method and device of text |
US20180357531A1 (en) * | 2015-11-27 | 2018-12-13 | Devanathan GIRIDHARI | Method for Text Classification and Feature Selection Using Class Vectors and the System Thereof |
CN109299270A (en) * | 2018-10-30 | 2019-02-01 | 云南电网有限责任公司信息中心 | A kind of text data unsupervised clustering based on convolutional neural networks |
CN110134786A (en) * | 2019-05-14 | 2019-08-16 | 南京大学 | A kind of short text classification method based on theme term vector and convolutional neural networks |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11403488B2 (en) | 2020-03-19 | 2022-08-02 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for recognizing image-based content presented in a structured layout |
CN111492370A (en) * | 2020-03-19 | 2020-08-04 | 香港应用科技研究院有限公司 | Device and method for recognizing text images of a structured layout |
WO2021184396A1 (en) * | 2020-03-19 | 2021-09-23 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for recognizing image-based content presented in a structured layout |
CN111492370B (en) * | 2020-03-19 | 2023-05-26 | 香港应用科技研究院有限公司 | Apparatus and method for recognizing text image of structured layout |
CN111461301A (en) * | 2020-03-30 | 2020-07-28 | 北京沃东天骏信息技术有限公司 | Serialized data processing method and device, and text processing method and device |
CN111552803B (en) * | 2020-04-08 | 2023-03-24 | 西安工程大学 | Text classification method based on graph wavelet network model |
CN111552803A (en) * | 2020-04-08 | 2020-08-18 | 西安工程大学 | Text classification method based on graph wavelet network model |
CN111694957B (en) * | 2020-05-29 | 2024-03-12 | 新华三大数据技术有限公司 | Method, equipment and storage medium for classifying problem sheets based on graph neural network |
CN111694957A (en) * | 2020-05-29 | 2020-09-22 | 新华三大数据技术有限公司 | Question list classification method and device based on graph neural network and storage medium |
CN112000788A (en) * | 2020-08-19 | 2020-11-27 | 腾讯云计算(长沙)有限责任公司 | Data processing method and device and computer readable storage medium |
CN112000788B (en) * | 2020-08-19 | 2024-02-09 | 腾讯云计算(长沙)有限责任公司 | Data processing method, device and computer readable storage medium |
CN112016438A (en) * | 2020-08-26 | 2020-12-01 | 北京嘀嘀无限科技发展有限公司 | Method and system for identifying certificate based on graph neural network |
CN112016438B (en) * | 2020-08-26 | 2021-08-10 | 北京嘀嘀无限科技发展有限公司 | Method and system for identifying certificate based on graph neural network |
CN114119191A (en) * | 2020-08-28 | 2022-03-01 | 马上消费金融股份有限公司 | Wind control method, overdue prediction method, model training method and related equipment |
CN112214993A (en) * | 2020-09-03 | 2021-01-12 | 拓尔思信息技术股份有限公司 | Graph neural network-based document processing method and device and storage medium |
CN112214993B (en) * | 2020-09-03 | 2024-02-06 | 拓尔思信息技术股份有限公司 | File processing method, device and storage medium based on graphic neural network |
CN112364141A (en) * | 2020-11-05 | 2021-02-12 | 天津大学 | Scientific literature key content potential association mining method based on graph neural network |
CN112465006A (en) * | 2020-11-24 | 2021-03-09 | 中国人民解放军海军航空大学 | Graph neural network target tracking method and device |
CN112860897A (en) * | 2021-03-12 | 2021-05-28 | 广西师范大学 | Text classification method based on improved ClusterGCN |
CN113220884B (en) * | 2021-05-19 | 2023-01-31 | 西北工业大学 | Graph neural network text emotion classification method based on double sliding windows |
CN113220884A (en) * | 2021-05-19 | 2021-08-06 | 西北工业大学 | Graph neural network text emotion classification method based on double sliding windows |
CN114357271A (en) * | 2021-12-03 | 2022-04-15 | 天津大学 | Microblog popularity prediction method based on text and time sequence information fused by graph neural network |
CN114357271B (en) * | 2021-12-03 | 2024-09-03 | 天津大学 | Microblog heat prediction method based on fusion text of graphic neural network and time sequence information |
CN114818737A (en) * | 2022-06-29 | 2022-07-29 | 北京邮电大学 | Method, system and storage medium for extracting semantic features of scientific and technological paper data text |
CN114818737B (en) * | 2022-06-29 | 2022-11-18 | 北京邮电大学 | Method, system and storage medium for extracting semantic features of scientific and technological paper data text |
CN115759183A (en) * | 2023-01-06 | 2023-03-07 | 浪潮电子信息产业股份有限公司 | Related method and related device for multi-structure text graph neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110705260B (en) | Text vector generation method based on unsupervised graph neural network structure | |
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN108052588B (en) | Method for constructing automatic document question-answering system based on convolutional neural network | |
CN104615767B (en) | Training method, search processing method and the device of searching order model | |
CN107168954B (en) | Text keyword generation method and device, electronic equipment and readable storage medium | |
CN104102626B (en) | A kind of method for short text Semantic Similarity Measurement | |
CN102622338B (en) | Computer-assisted computing method of semantic distance between short texts | |
WO2017090051A1 (en) | A method for text classification and feature selection using class vectors and the system thereof | |
CN111125367B (en) | Multi-character relation extraction method based on multi-level attention mechanism | |
CN110717042A (en) | Method for constructing document-keyword heterogeneous network model | |
CN107577671A (en) | A kind of key phrases extraction method based on multi-feature fusion | |
CN111552803A (en) | Text classification method based on graph wavelet network model | |
CN108073571B (en) | Multi-language text quality evaluation method and system and intelligent text processing system | |
CN109815400A (en) | Personage's interest extracting method based on long text | |
CN109710916A (en) | A kind of tag extraction method, apparatus, electronic equipment and storage medium | |
Mahmoud et al. | A text semantic similarity approach for Arabic paraphrase detection | |
CN112818113A (en) | Automatic text summarization method based on heteromorphic graph network | |
CN112784602B (en) | News emotion entity extraction method based on remote supervision | |
CN112966523A (en) | Word vector correction method based on semantic relation constraint and computing system | |
Qun et al. | End-to-end neural text classification for tibetan | |
Wint et al. | Deep learning based sentiment classification in social network services datasets | |
CN111353032B (en) | Community question and answer oriented question classification method and system | |
WO2022228127A1 (en) | Element text processing method and apparatus, electronic device, and storage medium | |
Han et al. | CNN-BiLSTM-CRF model for term extraction in Chinese corpus | |
Jia et al. | Attention in character-based BiLSTM-CRF for Chinese named entity recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20230418 |