CN114996483B - Event graph data processing method based on a variational autoencoder - Google Patents


Info

Publication number
CN114996483B
CN114996483B (application CN202210929367A)
Authority
CN
China
Prior art keywords
dimension information
node
data
event
encoder
Prior art date
Legal status
Active
Application number
CN202210929367.6A
Other languages
Chinese (zh)
Other versions
CN114996483A (en)
Inventor
蒋炜
魏晓菁
王红凯
冯珺
赵帅
王艺丹
张烨华
徐弢
陈文健
Current Assignee
Lenovo Beijing Ltd
State Grid Zhejiang Electric Power Co Ltd
NARI Group Corp
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Lenovo Beijing Ltd
State Grid Zhejiang Electric Power Co Ltd
NARI Group Corp
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd, State Grid Zhejiang Electric Power Co Ltd, NARI Group Corp, Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202210929367.6A priority Critical patent/CN114996483B/en
Publication of CN114996483A publication Critical patent/CN114996483A/en
Application granted granted Critical
Publication of CN114996483B publication Critical patent/CN114996483B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Abstract

The invention discloses a data processing method for an event graph based on a variational autoencoder, comprising the following steps: acquiring the data dimension of each event node in a first event graph, and classifying all event nodes according to data dimension to obtain a plurality of first event node sets; determining a corresponding variational autoencoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the set to obtain a plurality of encoded set data; decoding the encoded set data of the corresponding data dimension with the decoding unit of the variational autoencoder to obtain a plurality of second event node sets; and extracting the event data in each second event node set and reassembling the event graph according to the event data and the node label of each event node to obtain a second event graph. The invention reduces the loss of data information and efficiently migrates the data in the event graph.

Description

Event graph data processing method based on a variational autoencoder
Technical Field
The invention relates to the technical field of data processing, and in particular to an event graph data processing method based on a variational autoencoder.
Background
A variational autoencoder is a structure consisting of an encoder and a decoder, trained so as to minimize the reconstruction error between the encoded-then-decoded data and the original data. Compared with an ordinary encoder used to compress data, a variational autoencoder loses less information and introduces smaller errors.
The amount of data in an event graph is large. When the data in the event graph is migrated, migration can be performed through a variational autoencoder in order to reduce the data volume during migration and thereby reduce the data error of the event graph.
However, in practical application scenarios the data dimensions of the event nodes in an event graph may differ. When event nodes with different data dimensions are compressed, an adapted variational autoencoder must be selected according to the data dimension in order to preserve the integrity of the data corresponding to each node. In the prior art, event nodes cannot be classified by data dimension and encoded in different ways, nor can the event nodes be reassembled into a migrated event graph after decoding. Current event graph migration therefore suffers from low efficiency and large loss of data information.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides an event graph data processing method based on a variational autoencoder, which reduces the loss of data information and efficiently migrates the data in an event graph.
In order to solve the above technical problems, the technical scheme of the invention is as follows:
the invention provides an event graph data processing method based on a variational autoencoder, comprising the following steps:
s1, obtaining corresponding node connection information for a node label corresponding to each event node in a first event map according to the connection relation of each event node;
s2, acquiring the data dimension of each event node in the first event map, and classifying all event nodes according to the data dimension to obtain a plurality of first event node sets;
s3, determining a corresponding variational self-encoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the first event node set according to the encoding units of the variational self-encoder to obtain a plurality of encoded set data;
s4, after the plurality of pieces of coding set data are judged to be migrated to the target space, decoding the coding set data of the corresponding data dimensionality based on a decoding unit of the variational self-encoder to obtain a plurality of second event node sets;
and S5, extracting the matter data in each second matter node set, and recombining the matter map according to the matter data and the node labels corresponding to the matter nodes to obtain the second matter map.
Further, S1 includes:
randomly selecting an event node in the first event graph as a starting point and, starting from it, adding a corresponding node label to each event node;
when a node label is added to each event node, determining the connection relation of each event node to obtain its node connection information;
and collecting the node labels and node connection information of all event nodes to generate a node correspondence table.
Further, S2 includes:
acquiring the data type of the data corresponding to each event node, wherein the data type is at least one of text data, image data, audio data and video data;
determining the data dimension of each event node according to the data type of its data, wherein the data dimension comprises at least one piece of category dimension information, the category dimension information being at least one of text dimension information, image dimension information, audio dimension information and video dimension information;
and classifying event nodes having the same number and types of category dimension information to obtain the first event node sets.
Further, S3 includes:
extracting all data dimensions of each first event node set to obtain coding dimension information;
comparing the coding dimension information with a preset encoder selection table to obtain the corresponding variational autoencoder, wherein the encoder selection table records the correspondence between each piece of coding dimension information and a variational autoencoder;
and encoding the first event node set of the corresponding coding dimension information with the encoding unit of the variational autoencoder to obtain the corresponding encoded set data.
Further, the comparison of the coding dimension information with the preset encoder selection table includes:
if no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, taking the compared coding dimension information as difference dimension information;
comparing the difference dimension information once with each piece of coding dimension information preset in the encoder selection table;
and if the difference dimension information is completely contained by one piece of preset coding dimension information, taking the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
Further, the comparison also includes:
if the difference dimension information is not completely contained by any piece of preset coding dimension information, calculating the similarity between the difference dimension information and each piece of preset coding dimension information to obtain similarity coefficients;
and taking the variational autoencoder corresponding to the preset coding dimension information with the highest similarity coefficient as the variational autoencoder for the compared coding dimension information.
Further, the calculation of the similarity includes:
determining the number of pieces of category dimension information in the difference dimension information to obtain a first dimension number;
determining the number of pieces of category dimension information in each piece of preset coding dimension information to obtain a second dimension number;
determining the number of identical pieces of category dimension information shared by the difference dimension information and the preset coding dimension information to obtain a same-dimension number;
calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, obtaining a similarity coefficient according to the similarity, calculating the similarity coefficient by the following formula,
Figure 627150DEST_PATH_IMAGE001
wherein the content of the first and second substances,
Figure 991135DEST_PATH_IMAGE003
in order to be the coefficient of the degree of similarity,
Figure 554972DEST_PATH_IMAGE004
the number of the dimensions being the same as the number of the dimensions,
Figure 269987DEST_PATH_IMAGE005
for the number of the first dimension, the number of dimensions,
Figure 203570DEST_PATH_IMAGE006
for the first calculation of the weight, the weight is calculated,
Figure 422062DEST_PATH_IMAGE007
for the number of the second dimension, the number of the first dimension,
Figure 281433DEST_PATH_IMAGE008
a weight is calculated for the second.
Further, S4 includes:
after judging that the plurality of encoded set data have been migrated to the target space, sequentially determining the variational autoencoder corresponding to each encoded set data according to its coding dimension information;
and sequentially decoding the corresponding encoded set data with the decoding unit of each variational autoencoder to obtain the plurality of second event node sets.
Further, S5 includes:
after judging that all encoded set data have been decoded into the corresponding second event node sets, sequentially acquiring the node label and node connection information of each event node;
establishing a plurality of storage units in the target space, and marking each storage unit with the node label of the corresponding event node;
storing the event data of each event node in the storage unit bearing the same node label, and establishing a data calling path between the storage unit and the corresponding event node in the graph, so that after an event node is triggered its data is called and displayed through the calling path;
and determining the position of each event node in the event graph and its connection relation with the other event nodes according to the node labels and node connection information in the node correspondence table, to obtain the second event graph in which all event nodes are reassembled.
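The storage and reassembly step can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the node correspondence table, the decoded event data and the `storage/<label>` calling-path convention are all assumed for the example.

```python
# Hypothetical sketch of step S5: store each decoded node's event data in a
# storage unit keyed by its node label, then reassemble the second event graph
# from the node correspondence table. Labels, data and paths are illustrative.
node_table = {  # node correspondence table produced in step S1
    "1": ["2"], "2": ["1", "3", "4"], "3": ["2"], "4": ["2", "5"], "5": ["4"],
}
decoded = {"1": "text-a", "2": "img-b", "3": "audio-c", "4": "video-d", "5": "text-e"}

# One storage unit per node label, holding that node's event data.
storage = {label: data for label, data in decoded.items()}

# Second event graph: each node keeps its data calling path and connections.
second_graph = {label: {"data_path": f"storage/{label}",
                        "connected_to": node_table[label]}
                for label in storage}
```

Triggering a node then amounts to following its `data_path` into the storage unit, while `connected_to` restores the original topology.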
Further, the method also comprises:
traversing the event data in each storage unit at preset time intervals, and acquiring the category dimension information of each event data at the current moment;
if the category dimension information at the current moment is the same as the historical category dimension information of the corresponding storage unit, leaving the category dimension information of the storage unit unchanged;
and if the category dimension information at the current moment differs from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit by replacing the historical category dimension information with the category dimension information at the current moment.
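The periodic update can be sketched in a few lines. The unit labels and dimension sets below are assumed for illustration only:

```python
# Hypothetical sketch of the periodic dimension update: at each preset
# interval, a storage unit's category dimension information is rewritten only
# when the current value differs from the recorded historical value.
history = {"1": {"text"}, "2": {"image"}}            # historical dimensions
current = {"1": {"text"}, "2": {"image", "video"}}   # dimensions found now

updated = []
for unit, dims in current.items():
    if dims != history[unit]:
        history[unit] = dims   # replace historical info with current info
        updated.append(unit)
```

After the pass, `history` already holds the dimensions needed for the next migration, so no re-acquisition is required.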
The invention has the following beneficial effects:
(1) The method classifies the event nodes of an event graph into first event node sets according to their different data dimensions, then determines the corresponding variational autoencoder from the data dimension of each set. Different variational autoencoders realize different encoding modes, and each encodes its own event nodes, so that when event nodes of different data dimensions are compressed, the integrity of the corresponding data is improved and the loss of data information is reduced. Decoding is analogous: the adapted variational autoencoder decodes the corresponding event nodes. To reassemble the decoded event data, the scheme combines the node labels and node connection information in the node correspondence table to accurately reconstruct a second event graph, so that the data in the event graph is migrated efficiently.
(2) When no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, two methods match the optimal variational autoencoder according to the containment relation between the difference dimension information and the preset coding dimension information. The first handles complete containment and directly matches the optimal variational autoencoder for the difference dimension information. The second handles incomplete containment: the scheme calculates the similarity between the difference dimension information and each piece of preset coding dimension information to obtain similarity coefficients, and finds the optimal variational autoencoder from them, minimizing the data loss.
(3) The invention updates the data dimension of the event nodes in real time; at the next data migration the data dimension need not be acquired again and the event nodes are classified directly, reducing the amount of data processing.
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an event graph provided by the invention.
Detailed Description
So that the manner in which the invention is attained can be more readily understood, a more particular description of the invention, briefly summarized above, is given by reference to the embodiments illustrated in the appended drawings.
First, the variational autoencoder is explained:
a variational self-encoder is a structure consisting of an encoder and a decoder, trained to minimize the reconstruction error between the encoded and decoded data and the original data. Compared with a common encoder for compressing data, the variational self-encoder has the advantages of small information loss and small error. In machine learning, dimensionality reduction is the process of reducing the number of features that describe data. Dimensionality reduction can be done by selection (keeping only some existing features) or by extraction (generating a smaller number of new features based on old feature combinations). Dimensionality reduction is useful in many scenarios where low-dimensional data (data visualization, data storage, heavy computation.) is required. Although there are many different dimension reduction methods, we can build an overall framework that is applicable to most methods. First, we call the encoder the process of generating a "new feature" representation (by selection or extraction) from an "old feature" representation, and then call its inverse the decoding. Dimension reduction may be understood as data compression, where an encoder compresses data (from an initial space to an encoded space, also called a latent space), while a decoder is used for decompression. Of course, depending on the initial data distribution, the implicit spatial size, and the choice of encoder, compression may be lossy, i.e., some information may be lost during encoding and not recovered at decoding.
Therefore, for a variational autoencoder, the amount of data lost during encoding and decoding differs for different data types. For example, variational autoencoder A may lose little data when processing text data but much more when processing video data. The scheme selects the corresponding variational autoencoder for each data type, so that the overall data loss during processing is low.
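The encode/sample/decode round trip described above can be sketched with NumPy. This is a toy linear illustration of the general mechanism (including the reparameterization trick), not the patent's trained encoders; all weights and sizes are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder": maps an 8-dim input to a 2-dim latent mean/log-variance.
W_mu, W_logvar = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))
# Toy linear "decoder": maps the 2-dim latent back to 8 dimensions.
W_dec = rng.normal(size=(8, 2))

def encode(x):
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    # z = mu + sigma * eps  -- sampling in the latent (encoded) space
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    return W_dec @ z

x = rng.normal(size=8)
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_hat = decode(z)
# Reconstruction error that training would minimize:
loss = float(np.sum((x - x_hat) ** 2))
```

The latent space here is smaller than the input space, so the compression is lossy, which is exactly why matching the autoencoder to the data type matters.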
Referring to Fig. 1, a schematic diagram of an event graph according to an embodiment of the invention, there are 7 event nodes, each storing corresponding data: event node 1 stores text data; event node 2 stores image data; event node 3 stores audio data; event node 4 stores video data; event node 5 stores text data and video data; event node 6 stores text data, video data and image data; and event node 7 stores text data, image data, audio data and video data. These event nodes are only examples; in practical applications an event graph contains many event nodes, some of which store the same data types.
The invention provides an event graph data processing method based on a variational autoencoder, comprising the following steps S1-S5:
S1, adding a corresponding node label to each event node in the first event graph, and obtaining corresponding node connection information according to the connection relation of each event node.
It can be understood that when the data of the event graph is migrated, the data of each event node is processed separately, so the processed data become dispersed. In order that the processed nodes conform to the original connection relations, the method (referring to Fig. 1) adds a corresponding node label to each event node and simultaneously obtains the corresponding node connection information; after data processing is completed, the original connection relations of the nodes in the event graph can be recovered from the node connection information.
Illustratively, the node labels may be "1" through "7" in Fig. 1, and the node connection information may be, for example, "2 is connected to 1, 3 and 4" and "4 is connected to 5 and 2".
In some embodiments, S1 comprises:
randomly selecting an event node in the first event graph as a starting point and, starting from it, adding a corresponding node label to each event node. Referring to Fig. 1, the event node labeled "1" may be the starting point.
When a node label is added to each event node, the connection relation of each event node is determined to obtain its node connection information; referring to Fig. 1, this is, for example, "2 is connected to 1, 3 and 4" and "4 is connected to 5 and 2".
The node labels and node connection information of all event nodes are collected to generate a node correspondence table. After subsequent data processing is completed, the event nodes are reconnected according to this table.
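Step S1 can be sketched as follows. The edge structure partly follows the examples given for Fig. 1 ("2 is connected to 1, 3 and 4", "4 is connected to 5 and 2"); the remaining edges are assumed for illustration:

```python
# Hypothetical sketch of step S1: label each event node and record its
# connection relations in a node correspondence table.
edges = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

node_table = {node: {"label": str(node), "connected_to": sorted(nbrs)}
              for node, nbrs in edges.items()}
```

After migration, `node_table` is all that is needed to restore the original topology of the graph.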
S2, acquiring the data dimension of each event node in the first event graph, and classifying all event nodes according to data dimension to obtain a plurality of first event node sets.
The scheme acquires the data dimension of each event node in the first event graph, the data dimension being at least one of text dimension information, image dimension information, audio dimension information and video dimension information, and then classifies all event nodes to obtain the first event node sets.
In some embodiments, S2 comprises:
acquiring the data type of the data corresponding to each event node, the data type being at least one of text data, image data, audio data and video data;
determining the data dimension of each event node from the data type of its data, the data dimension having at least one piece of category dimension information, which is at least one of text dimension information, image dimension information, audio dimension information and video dimension information. It can be understood that if the data types of an event node are text data and image data, there are 2 corresponding data dimensions: text dimension information and image dimension information;
and classifying event nodes with the same number and types of category dimension information to obtain a first event node set.
Illustratively, the category dimension information of event node 1 is text dimension information, and that of event node 2 is also text dimension information, so the scheme classifies event node 1 and event node 2 into the same first event node set.
As another example, the category dimension information of event node 3 is text dimension information and image dimension information, and that of event node 4 is also text dimension information and image dimension information, so event node 3 and event node 4 are classified into the same first event node set.
It can be understood that there are a plurality of first event node sets, and within each set the number and types of category dimension information of the nodes are identical.
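The classification rule (same number and same types of category dimension information) can be sketched by grouping on the set of dimensions. The node-to-type mapping below follows the Fig. 1 example:

```python
from collections import defaultdict

# Data types of the 7 event nodes as described for Fig. 1.
node_types = {
    1: {"text"}, 2: {"image"}, 3: {"audio"}, 4: {"video"},
    5: {"text", "video"}, 6: {"text", "video", "image"},
    7: {"text", "image", "audio", "video"},
}

# Nodes sharing an identical dimension set (same count AND same types) land
# in the same first event node set.
first_sets = defaultdict(list)
for node, dims in node_types.items():
    first_sets[frozenset(dims)].append(node)
```

Using `frozenset(dims)` as the key enforces both conditions at once: two nodes are grouped only when their category dimension information matches exactly.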
S3, determining the corresponding variational autoencoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the set with the encoding unit of the variational autoencoder to obtain a plurality of encoded set data.
After the first event node sets are determined, the corresponding variational autoencoder is determined from the data dimension of each set. For example, if the data dimension comprises one piece of category dimension information and that information is text dimension information, a variational autoencoder for processing text-type data must be matched; that autoencoder then sequentially encodes the event data of each event node in the set to obtain the encoded set data.
It can be understood that with this embodiment each first event node set is matched to a corresponding variational autoencoder, keeping the loss during data processing low.
In some embodiments, S3 includes S31-S33:
S31, extracting all data dimensions of each first event node set to obtain coding dimension information.
For example, if the data dimension comprises one piece of category dimension information and that information is text dimension information, the corresponding coding dimension information corresponds to text.
S32, comparing the coding dimension information with a preset encoder selection table to obtain the corresponding variational autoencoder, the encoder selection table recording the correspondence between each piece of coding dimension information and a variational autoencoder.
The encoder selection table may be preset manually; it records the correspondence between coding dimension information and variational autoencoders. For example, if the coding dimension information corresponds to text, the corresponding variational autoencoder is the optimal variational autoencoder for processing text data.
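The encoder selection table lookup can be sketched as a dictionary keyed by dimension sets. The encoder identifiers below are illustrative, not from the patent:

```python
# Hypothetical sketch of step S32: the preset encoder selection table maps each
# piece of coding dimension information to a variational autoencoder.
encoder_table = {
    frozenset({"text"}): "vae_text",
    frozenset({"image"}): "vae_image",
    frozenset({"text", "video", "image"}): "vae_text_video_image",
}

def select_encoder(coding_dims):
    # Direct lookup; None signals that the fallback matching (A1-A3) is needed.
    return encoder_table.get(frozenset(coding_dims))
```

A `None` result corresponds exactly to the case where the compared coding dimension information becomes difference dimension information.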
In some embodiments, S32 (comparing the coding dimension information with the preset encoder selection table to obtain the corresponding variational autoencoder) includes A1 to A3:
A1, if no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, taking the compared coding dimension information as difference dimension information.
It can be understood that, since there are many event nodes and corresponding data types, the encoder selection table may contain no variational autoencoder corresponding to the compared coding dimension information.
Illustratively, the coding dimension information corresponds to text and video, but the encoder selection table contains no variational autoencoder for text and video. In this case, the coding dimension information corresponding to text and video becomes the difference dimension information.
A2, comparing the difference dimension information once with the coding dimension information preset in the encoder selection table.
A3, if the difference dimension information is completely contained by one piece of preset coding dimension information, taking the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
Illustratively, the difference dimension information corresponds to "text + video" and one piece of preset coding dimension information corresponds to "text + video + image"; since the preset coding dimension information completely contains the difference dimension information, the scheme directly uses the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
It can be understood that with this embodiment, when the encoder selection table contains no variational autoencoder corresponding to the compared coding dimension information, the scheme matches the difference dimension information to the optimal variational autoencoder, minimizing the data loss.
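The complete-containment fallback of A1-A3 reduces to a subset test. The table entries are illustrative:

```python
# Hypothetical sketch of steps A1-A3: when no encoder is registered for the
# difference dimension information, fall back to an encoder whose preset coding
# dimension information completely contains it.
encoder_table = {
    frozenset({"audio"}): "vae_audio",
    frozenset({"text", "video", "image"}): "vae_text_video_image",
}

def match_by_containment(diff_dims, table):
    for preset_dims, encoder in table.items():
        if frozenset(diff_dims) <= preset_dims:  # complete containment
            return encoder
    return None  # incomplete containment -> similarity path (B1-B2)
```

For "text + video" this picks the "text + video + image" encoder, mirroring the example in the text; a `None` result hands over to the similarity-coefficient branch.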
In other embodiments, S32 (the encoding dimension information is compared with a preset encoder selection table to obtain a corresponding variational self-encoder, where the encoder selection table has a corresponding relationship between each encoding dimension information and the variational self-encoder) includes B1 to B2:
and B1, if the difference dimension information is judged not to be completely contained by any one preset encoding dimension information, calculating the similarity of the difference dimension information and each preset encoding dimension information to obtain a similarity coefficient.
According to the scheme, when the difference dimension information is judged not to be completely contained by any one preset coding dimension information, the similarity of the difference dimension information and each preset coding dimension information is calculated to obtain a similarity coefficient, and the optimal variational self-encoder is found according to the similarity coefficient.
Wherein B1 (if it is determined that the difference dimension information is not completely contained in any one of the preset encoding dimension information, calculating a similarity between the difference dimension information and each of the preset encoding dimension information to obtain a similarity coefficient) includes:
and determining the number of the category dimension information in the difference dimension information to obtain a first dimension number. For example, the difference dimension information corresponds to "text + video", then the first dimension number is 2.
And determining the number of the category dimension information in each preset encoding dimension information to obtain a second dimension number. For example, the second number of dimensions may be 1, 2, 3, 4, etc.
And determining the number of pieces of category dimension information that the difference dimension information and the preset encoding dimension information have in common, to obtain the same dimension number. For example, the same dimension number may be 0 or 1.
Calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, and obtaining a similarity coefficient according to the similarity. The similarity coefficient is calculated by the following formula:

S = w1 · (m / n1) + w2 · (m / n2)

wherein S is the similarity coefficient, m is the same dimension number, n1 is the first dimension number, w1 is the first calculation weight, n2 is the second dimension number, and w2 is the second calculation weight.

In the above formula, n1 − m represents the difference between the first dimension number and the same dimension number; it can be understood that the larger this difference, the lower the similarity and the smaller the term m / n1. Likewise, n2 − m represents the difference between the second dimension number and the same dimension number; the larger this difference, the lower the corresponding similarity and the smaller the term m / n2. Making the first calculation weight w1 greater than the second calculation weight w2 gives the term w1 · (m / n1) a larger proportion of the calculation result.
And B2, taking the variational self-encoder corresponding to the preset encoding dimension information having the highest similarity coefficient with the difference dimension information as the variational self-encoder for the compared encoding dimension information.
After the similarity coefficients are obtained, the scheme selects the variational self-encoder corresponding to the preset encoding dimension information with the highest similarity coefficient and uses it for the compared encoding dimension information, which reduces data loss to the greatest extent.
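The fallback matching of B1 and B2 can be sketched as follows, assuming the similarity coefficient takes the form S = w1·(m/n1) + w2·(m/n2) described above; the weights and table entries are illustrative assumptions.

```python
# Sketch of B1-B2: score every preset entry with the similarity coefficient
# S = w1*(m/n1) + w2*(m/n2), where m is the same dimension number, n1 the
# first dimension number and n2 the second dimension number, then pick the
# highest-scoring variational self-encoder.

W1, W2 = 0.6, 0.4  # first calculation weight > second calculation weight

def similarity_coefficient(difference_dims, preset_dims):
    m = len(difference_dims & preset_dims)  # same dimension number
    n1 = len(difference_dims)               # first dimension number
    n2 = len(preset_dims)                   # second dimension number
    return W1 * (m / n1) + W2 * (m / n2)

def select_by_similarity(difference_dims, table):
    """Pick the VAE whose preset encoding dimensions score highest."""
    best = max(table, key=lambda p: similarity_coefficient(difference_dims, p))
    return table[best]

table = {
    frozenset({"text", "picture"}): "vae_text_picture",
    frozenset({"video", "audio", "picture"}): "vae_video_audio_picture",
}
# "text + video" is contained by neither preset entry, so the
# highest-scoring variational self-encoder is selected instead.
chosen = select_by_similarity({"text", "video"}, table)  # -> "vae_text_picture"
```

With w1 > w2, overlap relative to the difference dimension information (m/n1) dominates the score, matching the weighting rationale in the description.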
And S33, based on the encoding unit of the variational self-encoder, encoding the first event node set having the corresponding encoding dimension information to obtain the corresponding encoded set data.
It can be understood that, after the variational self-encoder is obtained, its encoding unit is used to encode, that is, compress, the first event node set having the corresponding encoding dimension information, so as to obtain the corresponding encoded set data.
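For intuition only, the pairing of an encoding unit (S33) and a decoding unit (S4) can be sketched with an untrained toy variational self-encoder; the shapes, random weights and the class name `TinyVAE` are assumptions for illustration, not the patent's model.

```python
# numpy-only sketch: the encoding unit maps each event node's feature vector
# to a compressed latent code (the "encoded set data"); the decoding unit
# maps the latent code back to node features after migration.
import numpy as np

rng = np.random.default_rng(0)

class TinyVAE:
    def __init__(self, in_dim, latent_dim):
        self.w_mu = rng.normal(size=(in_dim, latent_dim)) * 0.1
        self.w_logvar = rng.normal(size=(in_dim, latent_dim)) * 0.1
        self.w_dec = rng.normal(size=(latent_dim, in_dim)) * 0.1

    def encode(self, x):
        """Encoding unit: features -> sampled latent code (reparameterized)."""
        mu, logvar = x @ self.w_mu, x @ self.w_logvar
        eps = rng.normal(size=mu.shape)          # reparameterization trick
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        """Decoding unit: latent code -> reconstructed event-node features."""
        return z @ self.w_dec

vae = TinyVAE(in_dim=16, latent_dim=4)
node_features = rng.normal(size=(10, 16))  # 10 event nodes in one set
encoded_set = vae.encode(node_features)    # compressed: shape (10, 4)
decoded_set = vae.decode(encoded_set)      # recovered:  shape (10, 16)
```

In practice the encoder and decoder would be trained networks chosen per encoding dimension information, as the encoder selection table describes; here both are random stand-ins that only demonstrate the data flow.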
And S4, after determining that the plurality of encoded set data have been migrated to the target space, decoding the encoded set data of the corresponding data dimension based on the decoding unit of the variational self-encoder to obtain a plurality of second event node sets.
It can be understood that step S3 encodes the data; once the data migration is completed, the data must be decoded. After decoding, the scheme obtains a plurality of second event node sets corresponding to the plurality of first event node sets.
In some embodiments, said S4 comprises S41-S42:
And S41, after determining that the plurality of encoded set data have been migrated to the target space, sequentially determining the variational self-encoder corresponding to each piece of encoded set data according to its encoding dimension information.
In this embodiment, each first event node set has already been associated with the variational self-encoder used for its encoding; step S41 identifies that same variational self-encoder, which is then used to decode the corresponding encoded set data.
And S42, sequentially decoding the corresponding coding set data according to the decoding unit of each variational self-encoder to obtain a plurality of second event node sets.
After the variational self-encoder is determined, the scheme can utilize a decoding unit of the variational self-encoder to decode the migrated data to obtain a plurality of second event node sets.
And S5, extracting the event data in each second event node set, and recombining the event graph according to the event data and the node labels corresponding to the event nodes to obtain a second event graph.
After the data are decoded, the plurality of second event node sets must be recombined into the event graph; during recombination, the scheme uses the node label corresponding to each event node.
In some embodiments, said S5 comprises S51-S54:
and S51, after judging that all the coding set data are decoded respectively to obtain a plurality of corresponding second event node sets, sequentially acquiring node labels and node connection information corresponding to each event node.
According to the scheme, after all data are decoded, a plurality of corresponding second event node sets are obtained, and then node labels and node connection information corresponding to each event node are sequentially obtained.
S52, a plurality of storage units are established in the target space, and each storage unit is marked according to the node label corresponding to each event node.
It can be understood that, in order to store data, the scheme may establish a plurality of storage units in a target space, one event node may correspond to one storage unit, and the scheme may mark each storage unit according to a node tag corresponding to each event node, so as to store corresponding data into the corresponding storage unit.
And S53, respectively storing the event data of each event node into the storage unit having the same node label, and establishing a data retrieval path between the storage unit and the corresponding event node in the graph, so that after being triggered, the corresponding event node is retrieved based on the data retrieval path for display.
In the scheme, to make it convenient for a user to view the event data corresponding to an event node, a data retrieval path between the storage unit and the corresponding event node in the graph is also established, so that after the node is triggered, the corresponding event data can be retrieved over this path and displayed to the user.
And S54, determining the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes according to the node labels and the node connection information of all the event nodes in the node correspondence table, to obtain a second event graph after all the event nodes are recombined.
It can be understood that the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes can be obtained from the node labels and the node connection information in the node correspondence table; the second event graph after all the event nodes are recombined is thereby obtained, completing the data migration.
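The storage-and-recombination flow of S51 to S54 might look like the following sketch, where the node correspondence table, the storage units and the retrieval paths are plain dictionaries; all field names are hypothetical.

```python
# Illustrative sketch of S51-S54: one storage unit per event node (marked by
# its node label), a retrieval path from graph node to storage unit, and the
# second event graph rebuilt from the node correspondence table.

node_table = {          # node label -> labels of connected event nodes
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2"],
}
decoded_data = {"E1": "text...", "E2": "video...", "E3": "audio..."}

storage_units = {}      # S52: each unit marked with its node label
retrieval_path = {}     # S53: event node -> its storage unit
for label, data in decoded_data.items():
    storage_units[label] = {"label": label, "event_data": data}
    retrieval_path[label] = storage_units[label]

# S54: positions/edges of the second event graph follow node_table
second_graph = {label: sorted(neigh) for label, neigh in node_table.items()}

# Triggering node "E2" follows its retrieval path to the stored event data
assert retrieval_path["E2"]["event_data"] == "video..."
```

The sketch keeps only adjacency; a real implementation would also carry node positions and any per-edge attributes recorded in the node connection information.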
In practical application, the scheme further comprises C1-C3:
And C1, traversing the event data in each storage unit at intervals of a preset time period, and acquiring the category dimension information corresponding to each piece of event data at the current time.
It can be understood that the event data in the event node is updated after a period of time, and the corresponding category dimension information is also updated.
And C2, if the category dimension information corresponding to the current time is the same as the historical category dimension information of the corresponding storage unit, not modifying the category dimension information of the storage unit.
For example, if the category dimension information corresponding to the current time is character dimension information and the historical category dimension information of the storage unit is also character dimension information, the two are the same and the category dimension information of the storage unit does not need to be modified.
And C3, if the category dimension information corresponding to the current time is different from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit, that is, replacing the historical category dimension information with the category dimension information corresponding to the current time.
For example, suppose there is initially no character dimension information, and after a period of time the event data in the corresponding storage unit comes to include character data; the corresponding category dimension information is then updated, that is, character dimension information is added. At this time the category dimension information of the storage unit needs to be modified, and the historical category dimension information is replaced with the category dimension information corresponding to the current time.
This embodiment updates the data dimensions of the event nodes in real time; at the next data migration, the event nodes can be classified directly, without re-acquiring the data dimensions as in step S2, which reduces the amount of data processing.
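A compact sketch of the C1 to C3 refresh, assuming event data is stored as a dict keyed by data type and that each storage unit records its category dimension information in a `dims` field (both are assumptions for illustration):

```python
# Periodically traverse every storage unit (C1) and reconcile its recorded
# category dimension information with the data it currently holds (C2/C3).

def infer_dims(event_data):
    """Category dimension information derived from the data types present."""
    return frozenset(k for k in ("character", "image", "audio", "video")
                     if k in event_data)

def refresh_dims(storage_units):
    for unit in storage_units.values():
        current = infer_dims(unit["event_data"])
        if current != unit["dims"]:   # C3: dimensions changed
            unit["dims"] = current    # replace historical info with current
        # C2: identical -> leave the stored dimension info unmodified

units = {"E1": {"event_data": {"video": b"..."}, "dims": frozenset({"video"})}}
units["E1"]["event_data"]["character"] = "newly added text"
refresh_dims(units)
# dims now also include the character dimension added since the last traversal
```

Running the refresh on a timer (the "preset time period") keeps the stored dimension information current, so the next migration can classify nodes straight from `dims`.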
In addition to the above embodiments, the present invention may have other embodiments; all technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the present invention.

Claims (10)

1. A variational self-encoder-based event graph data processing method, characterized by comprising the following steps:
s1, adding a corresponding node label to each event node in a first event graph, and obtaining corresponding node connection information according to the connection relation of each event node;
s2, acquiring the data dimension of each event node in the first event map, and classifying all event nodes according to the data dimension to obtain a plurality of first event node sets;
S3, determining a corresponding variational self-encoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the first event node set according to the encoding unit of the variational self-encoder to obtain a plurality of encoded set data;
S4, after determining that the plurality of encoded set data have been migrated to the target space, decoding the encoded set data of the corresponding data dimension based on the decoding unit of the variational self-encoder to obtain a plurality of second event node sets;
and S5, extracting the event data in each second event node set, and recombining the event graph according to the event data and the node labels corresponding to the event nodes to obtain a second event graph.
2. The variational self-encoder-based event graph data processing method according to claim 1, wherein
the S1 comprises:
randomly selecting an event node in the first event graph as a starting point, and, starting from the starting point, adding a corresponding node label to each event node;
when a node label is added to each event node, determining the connection relationship corresponding to each event node to obtain the node connection information corresponding to each event node;
and counting the node labels and the node connection information of each event node to generate a node correspondence table.
3. The variational self-encoder-based event graph data processing method according to claim 2, wherein
the S2 comprises:
acquiring a data type of data corresponding to each event node, wherein the data type is at least one of character data, image data, audio data and video data;
determining the data dimension of the corresponding event node according to the data type of the data corresponding to each event node, wherein the data dimension comprises at least one piece of category dimension information, and the category dimension information is at least one of character dimension information, image dimension information, audio dimension information and video dimension information;
and classifying the event nodes with the same category dimension information quantity and dimension type to obtain a first event node set.
4. The variational self-encoder-based event graph data processing method according to claim 3, wherein
the S3 comprises the following steps:
extracting all data dimensions of each first event node set to obtain coding dimension information;
comparing the encoding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relationship between each piece of encoding dimension information and a variational self-encoder;
and based on the encoding unit of the variational self-encoder, encoding the first event node set having the corresponding encoding dimension information to obtain the corresponding encoded set data.
5. The variational self-encoder-based event graph data processing method according to claim 4, wherein
comparing the coding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relation between each coding dimension information and the variational self-encoder, and the method comprises the following steps:
if the encoder selection table contains no variational self-encoder corresponding to the compared encoding dimension information, taking the compared encoding dimension information as difference dimension information;
comparing the difference dimension information with each preset encoding dimension information in the encoder selection table;
and if it is determined that the difference dimension information is completely contained by one preset encoding dimension information, taking the variational self-encoder corresponding to that preset encoding dimension information as the variational self-encoder for the compared encoding dimension information.
6. The variational self-encoder-based event graph data processing method according to claim 5, wherein
comparing the coding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relation between each coding dimension information and the variational self-encoder, and the method comprises the following steps:
if the difference dimension information is judged not to be completely contained by any one preset encoding dimension information, calculating the similarity of the difference dimension information and each preset encoding dimension information to obtain a similarity coefficient;
and taking the variational self-encoder corresponding to the preset encoding dimension information having the highest similarity coefficient with the difference dimension information as the variational self-encoder for the compared encoding dimension information.
7. The variational self-encoder-based event graph data processing method according to claim 6, wherein
if it is determined that the difference dimension information is not completely contained by any one of the preset encoding dimension information, calculating the similarity between the difference dimension information and each of the preset encoding dimension information to obtain a similarity coefficient, including:
determining the number of the category dimension information in the difference dimension information to obtain a first dimension number;
determining the number of category dimension information in each preset encoding dimension information to obtain a second dimension number;
determining the number of the same kind of dimension information in the difference dimension information and the preset coding dimension information to obtain the same dimension number;
calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, and obtaining a similarity coefficient according to the similarity, the similarity coefficient being calculated by the following formula:

S = w1 · (m / n1) + w2 · (m / n2)

wherein S is the similarity coefficient, m is the same dimension number, n1 is the first dimension number, w1 is the first calculation weight, n2 is the second dimension number, and w2 is the second calculation weight.
8. The variational self-encoder-based event graph data processing method according to claim 4, wherein
the S4 comprises the following steps:
after the plurality of coding set data are judged to be migrated to a target space, sequentially determining a variational self-encoder corresponding to each coding set data according to the coding dimension information of each coding set data;
and sequentially decoding the corresponding encoded set data according to the decoding unit of each variational self-encoder to obtain a plurality of second event node sets.
9. The variational self-encoder-based event graph data processing method according to claim 8, wherein
the S5 comprises the following steps:
after all the coded set data are judged to be decoded respectively to obtain a plurality of corresponding second event node sets, node labels and node connection information corresponding to each event node are obtained in sequence;
establishing a plurality of storage units in a target space, and marking each storage unit according to a node label corresponding to each event node;
respectively storing the event data of each event node into the storage unit having the same node label, and establishing a data retrieval path between the storage unit and the corresponding event node in the graph, so that after being triggered, the corresponding event node is retrieved based on the data retrieval path for display;
and determining the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes according to the node labels and the node connection information of all the event nodes in the node correspondence table, to obtain a second event graph after all the event nodes are recombined.
10. The variational self-encoder-based event graph data processing method according to claim 9, further comprising:
traversing the event data in each storage unit at intervals of a preset time period, and acquiring the category dimension information corresponding to each piece of event data at the current time;
if the category dimension information corresponding to the current moment is the same as the historical category dimension information of the corresponding storage unit, not modifying the category dimension information of the storage unit;
and if the category dimension information corresponding to the current time is different from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit, and modifying the historical category dimension information into the category dimension information corresponding to the current time.
CN202210929367.6A 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder Active CN114996483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210929367.6A CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210929367.6A CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Publications (2)

Publication Number Publication Date
CN114996483A CN114996483A (en) 2022-09-02
CN114996483B true CN114996483B (en) 2022-10-21

Family

ID=83021097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210929367.6A Active CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Country Status (1)

Country Link
CN (1) CN114996483B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821636A (en) * 2021-08-27 2021-12-21 上海快确信息科技有限公司 Financial text joint extraction and classification scheme based on knowledge graph
CN114612071A (en) * 2022-03-16 2022-06-10 上海正策咨仕信息科技有限公司 Data management method based on knowledge graph
CN114817575A (en) * 2022-06-24 2022-07-29 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8861356B2 (en) * 2007-03-13 2014-10-14 Ntt Docomo, Inc. Method and apparatus for prioritized information delivery with network coding over time-varying network topologies

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN113821636A (en) * 2021-08-27 2021-12-21 上海快确信息科技有限公司 Financial text joint extraction and classification scheme based on knowledge graph
CN114612071A (en) * 2022-03-16 2022-06-10 上海正策咨仕信息科技有限公司 Data management method based on knowledge graph
CN114817575A (en) * 2022-06-24 2022-07-29 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model

Non-Patent Citations (1)

Title
"Research on Text-Based Entity Relation Extraction and Representation and Reasoning Methods for Knowledge Graphs"; Li Zhongkun; CNKI; 2019-05-22; pp. 1-5 *

Also Published As

Publication number Publication date
CN114996483A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
JP4264492B2 (en) Image decoding device
US9349072B2 (en) Local feature based image compression
EP0934662B1 (en) Vector quantisation codebook generation method
US20240121447A1 (en) Systems, apparatus, and methods for bit level representation for data processing and analytics
CN108259911A (en) A kind of OLED screen Demura lossless date-compress, decompression method
CN103188494A (en) Apparatus and method for encoding depth image by skipping discrete cosine transform (DCT), and apparatus and method for decoding depth image by skipping DCT
CN112506879A (en) Data processing method and related equipment
CN113381768B (en) Huffman correction coding method, system and related components
Lin et al. Multistage spatial context models for learned image compression
CN114218223A (en) Protobuf protocol-based Redis data model construction and access method
CN114996483B (en) Event map data processing method based on variational self-encoder
CN108182712B (en) Image processing method, device and system
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
TW201921844A (en) Dictionary-based data compression
US20230019767A1 (en) Point cloud encoding method and decoding method, encoder and decoder, and storage medium
CN103139566A (en) Method for efficient decoding of variable length codes
JPH03188768A (en) Picture compression system
CN112887722B (en) Lossless image compression method
Nobuhara et al. Fuzzy relation equations for compression/decompression processes of colour images in the RGB and YUV colour spaces
CN113099269A (en) String matching prediction method, encoding and decoding method, related equipment and device
Hu et al. Improved color image coding schemes based on single bit map block truncation coding
Kamal et al. Iteration free fractal compression using genetic algorithm for still colour images
CN111008276A (en) Complete entity relationship extraction method and device
CN110933413A (en) Video frame processing method and device
Pinho et al. A context adaptation model for the compression of images with a reduced number of colors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant