CN114996483B - Event graph data processing method based on a variational autoencoder - Google Patents


Info

Publication number
CN114996483B
CN114996483B (application CN202210929367A)
Authority
CN
China
Prior art keywords
dimension information
node
data
event
encoder
Prior art date
Legal status
Active
Application number
CN202210929367.6A
Other languages
Chinese (zh)
Other versions
CN114996483A (en)
Inventor
蒋炜
魏晓菁
王红凯
冯珺
赵帅
王艺丹
张烨华
徐弢
陈文健
Current Assignee
Lenovo Beijing Ltd
State Grid Zhejiang Electric Power Co Ltd
NARI Group Corp
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Lenovo Beijing Ltd
State Grid Zhejiang Electric Power Co Ltd
NARI Group Corp
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd, State Grid Zhejiang Electric Power Co Ltd, NARI Group Corp, Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202210929367.6A priority Critical patent/CN114996483B/en
Publication of CN114996483A publication Critical patent/CN114996483A/en
Application granted granted Critical
Publication of CN114996483B publication Critical patent/CN114996483B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Abstract

The invention discloses a data processing method for an event graph based on a variational autoencoder, comprising the following steps: acquiring the data dimension of each event node in a first event graph, and classifying all event nodes according to data dimension to obtain a plurality of first event node sets; determining a corresponding variational autoencoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the set to obtain a plurality of encoded set data; decoding the encoded set data of the corresponding data dimension with the decoding unit of the variational autoencoder to obtain a plurality of second event node sets; and extracting the event data in each second event node set and reassembling the event graph according to the event data and the node label of each event node to obtain a second event graph. The invention reduces the loss of data information and efficiently migrates the data in the event graph.

Description

Event graph data processing method based on a variational autoencoder
Technical Field
The invention relates to the technical field of data processing, and in particular to an event graph data processing method based on a variational autoencoder.
Background
A variational autoencoder is a structure consisting of an encoder and a decoder, trained so as to minimize the reconstruction error between the encoded-then-decoded data and the original data. Compared with an ordinary encoder used to compress data, a variational autoencoder loses less information and introduces smaller errors.
The amount of data in an event graph is large. When the data in the event graph is migrated, migration can be performed through a variational autoencoder in order to reduce the data volume during migration and thereby reduce the data error of the event graph.
However, in practical application scenarios the data dimensions of the event nodes in an event graph may differ. When event nodes with different data dimensions are compressed, an adapted variational autoencoder must be selected according to the data dimension in order to preserve the integrity of the data corresponding to each node. In the prior art, event nodes cannot be classified by data dimension and encoded in different ways, nor can the event nodes be reassembled into a migrated event graph after decoding. Current event graph migration therefore suffers from low efficiency and large loss of data information.
Disclosure of Invention
The invention overcomes the defects of the prior art and provides an event graph data processing method based on a variational autoencoder, which reduces the loss of data information and efficiently migrates the data in an event graph.
In order to solve the above technical problems, the technical scheme of the invention is as follows:
the invention provides an event graph data processing method based on a variational autoencoder, comprising the following steps:
s1, obtaining corresponding node connection information for a node label corresponding to each event node in a first event map according to the connection relation of each event node;
s2, acquiring the data dimension of each event node in the first event map, and classifying all event nodes according to the data dimension to obtain a plurality of first event node sets;
s3, determining a corresponding variational self-encoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the first event node set according to the encoding units of the variational self-encoder to obtain a plurality of encoded set data;
s4, after the plurality of pieces of coding set data are judged to be migrated to the target space, decoding the coding set data of the corresponding data dimensionality based on a decoding unit of the variational self-encoder to obtain a plurality of second event node sets;
and S5, extracting the matter data in each second matter node set, and recombining the matter map according to the matter data and the node labels corresponding to the matter nodes to obtain the second matter map.
Further, S1 includes:
randomly selecting an event node in the first event graph as a starting point and, starting from it, adding a corresponding node label to each event node;
when a node label is added to each event node, determining the connection relation of each event node to obtain its node connection information;
and collecting the node labels and node connection information of all event nodes to generate a node correspondence table.
Further, S2 includes:
acquiring the data type of the data corresponding to each event node, wherein the data type is at least one of text data, image data, audio data and video data;
determining the data dimension of each event node according to the data type of its data, wherein the data dimension comprises at least one piece of category dimension information, the category dimension information being at least one of text dimension information, image dimension information, audio dimension information and video dimension information;
and classifying event nodes having the same number and types of category dimension information to obtain the first event node sets.
Further, S3 includes:
extracting all data dimensions of each first event node set to obtain coding dimension information;
comparing the coding dimension information with a preset encoder selection table to obtain the corresponding variational autoencoder, wherein the encoder selection table records the correspondence between each piece of coding dimension information and a variational autoencoder;
and encoding the first event node set of the corresponding coding dimension information with the encoding unit of the variational autoencoder to obtain the corresponding encoded set data.
Further, the comparison of the coding dimension information with the preset encoder selection table includes:
if no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, taking the compared coding dimension information as difference dimension information;
comparing the difference dimension information once with each piece of coding dimension information preset in the encoder selection table;
and if the difference dimension information is completely contained by one piece of preset coding dimension information, taking the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
Further, the comparison also includes:
if the difference dimension information is not completely contained by any piece of preset coding dimension information, calculating the similarity between the difference dimension information and each piece of preset coding dimension information to obtain similarity coefficients;
and taking the variational autoencoder corresponding to the preset coding dimension information with the highest similarity coefficient as the variational autoencoder for the compared coding dimension information.
Further, the calculation of the similarity includes:
determining the number of pieces of category dimension information in the difference dimension information to obtain a first dimension number;
determining the number of pieces of category dimension information in each piece of preset coding dimension information to obtain a second dimension number;
determining the number of identical pieces of category dimension information shared by the difference dimension information and the preset coding dimension information to obtain a same-dimension number;
calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, obtaining a similarity coefficient according to the similarity, calculating the similarity coefficient by the following formula,
Figure 627150DEST_PATH_IMAGE001
wherein the content of the first and second substances,
Figure 991135DEST_PATH_IMAGE003
in order to be the coefficient of the degree of similarity,
Figure 554972DEST_PATH_IMAGE004
the number of the dimensions being the same as the number of the dimensions,
Figure 269987DEST_PATH_IMAGE005
for the number of the first dimension, the number of dimensions,
Figure 203570DEST_PATH_IMAGE006
for the first calculation of the weight, the weight is calculated,
Figure 422062DEST_PATH_IMAGE007
for the number of the second dimension, the number of the first dimension,
Figure 281433DEST_PATH_IMAGE008
a weight is calculated for the second.
Further, S4 includes:
after judging that the plurality of encoded set data have been migrated to the target space, sequentially determining the variational autoencoder corresponding to each encoded set data according to its coding dimension information;
and sequentially decoding the corresponding encoded set data with the decoding unit of each variational autoencoder to obtain the plurality of second event node sets.
Further, S5 includes:
after judging that all encoded set data have been decoded into the corresponding second event node sets, sequentially acquiring the node label and node connection information of each event node;
establishing a plurality of storage units in the target space, and marking each storage unit with the node label of the corresponding event node;
storing the event data of each event node in the storage unit bearing the same node label, and establishing a data calling path between the storage unit and the corresponding event node in the graph, so that after an event node is triggered its data is called and displayed through the calling path;
and determining the position of each event node in the event graph and its connection relation with the other event nodes according to the node labels and node connection information in the node correspondence table, to obtain the second event graph in which all event nodes are reassembled.
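The storage and reassembly step can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the node correspondence table, the decoded event data and the `storage/<label>` calling-path convention are all assumed for the example.

```python
# Hypothetical sketch of step S5: store each decoded node's event data in a
# storage unit keyed by its node label, then reassemble the second event graph
# from the node correspondence table. Labels, data and paths are illustrative.
node_table = {  # node correspondence table produced in step S1
    "1": ["2"], "2": ["1", "3", "4"], "3": ["2"], "4": ["2", "5"], "5": ["4"],
}
decoded = {"1": "text-a", "2": "img-b", "3": "audio-c", "4": "video-d", "5": "text-e"}

# One storage unit per node label, holding that node's event data.
storage = {label: data for label, data in decoded.items()}

# Second event graph: each node keeps its data calling path and connections.
second_graph = {label: {"data_path": f"storage/{label}",
                        "connected_to": node_table[label]}
                for label in storage}
```

Triggering a node then amounts to following its `data_path` into the storage unit, while `connected_to` restores the original topology.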
Further, the method also comprises:
traversing the event data in each storage unit at preset time intervals, and acquiring the category dimension information of each event data at the current moment;
if the category dimension information at the current moment is the same as the historical category dimension information of the corresponding storage unit, leaving the category dimension information of the storage unit unchanged;
and if the category dimension information at the current moment differs from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit by replacing the historical category dimension information with the category dimension information at the current moment.
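The periodic update can be sketched in a few lines. The unit labels and dimension sets below are assumed for illustration only:

```python
# Hypothetical sketch of the periodic dimension update: at each preset
# interval, a storage unit's category dimension information is rewritten only
# when the current value differs from the recorded historical value.
history = {"1": {"text"}, "2": {"image"}}            # historical dimensions
current = {"1": {"text"}, "2": {"image", "video"}}   # dimensions found now

updated = []
for unit, dims in current.items():
    if dims != history[unit]:
        history[unit] = dims   # replace historical info with current info
        updated.append(unit)
```

After the pass, `history` already holds the dimensions needed for the next migration, so no re-acquisition is required.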
The invention has the following beneficial effects:
(1) The method classifies the event nodes of an event graph into first event node sets according to their different data dimensions, then determines the corresponding variational autoencoder from the data dimension of each set. Different variational autoencoders realize different encoding modes, and each encodes its own event nodes, so that when event nodes of different data dimensions are compressed, the integrity of the corresponding data is improved and the loss of data information is reduced. Decoding is analogous: the adapted variational autoencoder decodes the corresponding event nodes. To reassemble the decoded event data, the scheme combines the node labels and node connection information in the node correspondence table to accurately reconstruct a second event graph, so that the data in the event graph is migrated efficiently.
(2) When no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, two methods match the optimal variational autoencoder according to the containment relation between the difference dimension information and the preset coding dimension information. The first handles complete containment and directly matches the optimal variational autoencoder for the difference dimension information. The second handles incomplete containment: the scheme calculates the similarity between the difference dimension information and each piece of preset coding dimension information to obtain similarity coefficients, and finds the optimal variational autoencoder from them, minimizing the data loss.
(3) The invention updates the data dimension of the event nodes in real time; at the next data migration the data dimension need not be acquired again and the event nodes are classified directly, reducing the amount of data processing.
Drawings
In order to illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an event graph provided by the invention.
Detailed Description
So that the manner in which the invention is attained can be more readily understood, a more particular description of the invention, briefly summarized above, is given by reference to the embodiments illustrated in the appended drawings.
First, the variational autoencoder is explained:
a variational self-encoder is a structure consisting of an encoder and a decoder, trained to minimize the reconstruction error between the encoded and decoded data and the original data. Compared with a common encoder for compressing data, the variational self-encoder has the advantages of small information loss and small error. In machine learning, dimensionality reduction is the process of reducing the number of features that describe data. Dimensionality reduction can be done by selection (keeping only some existing features) or by extraction (generating a smaller number of new features based on old feature combinations). Dimensionality reduction is useful in many scenarios where low-dimensional data (data visualization, data storage, heavy computation.) is required. Although there are many different dimension reduction methods, we can build an overall framework that is applicable to most methods. First, we call the encoder the process of generating a "new feature" representation (by selection or extraction) from an "old feature" representation, and then call its inverse the decoding. Dimension reduction may be understood as data compression, where an encoder compresses data (from an initial space to an encoded space, also called a latent space), while a decoder is used for decompression. Of course, depending on the initial data distribution, the implicit spatial size, and the choice of encoder, compression may be lossy, i.e., some information may be lost during encoding and not recovered at decoding.
Therefore, for a variational autoencoder, the amount of data lost during encoding and decoding differs for different data types. For example, variational autoencoder A may lose little data when processing text data but much more when processing video data. The scheme selects the corresponding variational autoencoder for each data type, so that the overall data loss during processing is low.
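The encode/sample/decode round trip described above can be sketched with NumPy. This is a toy linear illustration of the general mechanism (including the reparameterization trick), not the patent's trained encoders; all weights and sizes are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoder": maps an 8-dim input to a 2-dim latent mean/log-variance.
W_mu, W_logvar = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))
# Toy linear "decoder": maps the 2-dim latent back to 8 dimensions.
W_dec = rng.normal(size=(8, 2))

def encode(x):
    return W_mu @ x, W_logvar @ x

def reparameterize(mu, logvar):
    # z = mu + sigma * eps  -- sampling in the latent (encoded) space
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    return W_dec @ z

x = rng.normal(size=8)
mu, logvar = encode(x)
z = reparameterize(mu, logvar)
x_hat = decode(z)
# Reconstruction error that training would minimize:
loss = float(np.sum((x - x_hat) ** 2))
```

The latent space here is smaller than the input space, so the compression is lossy, which is exactly why matching the autoencoder to the data type matters.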
Referring to Fig. 1, a schematic diagram of an event graph according to an embodiment of the invention, there are 7 event nodes, each storing corresponding data: event node 1 stores text data; event node 2 stores image data; event node 3 stores audio data; event node 4 stores video data; event node 5 stores text data and video data; event node 6 stores text data, video data and image data; and event node 7 stores text data, image data, audio data and video data. These event nodes are only examples; in practical applications an event graph contains many event nodes, some of which store the same data types.
The invention provides an event graph data processing method based on a variational autoencoder, comprising the following steps S1-S5:
S1, adding a corresponding node label to each event node in the first event graph, and obtaining corresponding node connection information according to the connection relation of each event node.
It can be understood that when the data of the event graph is migrated, the data of each event node is processed separately, so the processed data become dispersed. In order that the processed nodes conform to the original connection relations, the method (referring to Fig. 1) adds a corresponding node label to each event node and simultaneously obtains the corresponding node connection information; after data processing is completed, the original connection relations of the nodes in the event graph can be recovered from the node connection information.
Illustratively, the node labels may be "1" through "7" in Fig. 1, and the node connection information may be, for example, "2 is connected to 1, 3 and 4" and "4 is connected to 5 and 2".
In some embodiments, S1 comprises:
randomly selecting an event node in the first event graph as a starting point and, starting from it, adding a corresponding node label to each event node. Referring to Fig. 1, the event node labeled "1" may be the starting point.
When a node label is added to each event node, the connection relation of each event node is determined to obtain its node connection information; referring to Fig. 1, this is, for example, "2 is connected to 1, 3 and 4" and "4 is connected to 5 and 2".
The node labels and node connection information of all event nodes are collected to generate a node correspondence table. After subsequent data processing is completed, the event nodes are reconnected according to this table.
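Step S1 can be sketched as follows. The edge structure partly follows the examples given for Fig. 1 ("2 is connected to 1, 3 and 4", "4 is connected to 5 and 2"); the remaining edges are assumed for illustration:

```python
# Hypothetical sketch of step S1: label each event node and record its
# connection relations in a node correspondence table.
edges = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

node_table = {node: {"label": str(node), "connected_to": sorted(nbrs)}
              for node, nbrs in edges.items()}
```

After migration, `node_table` is all that is needed to restore the original topology of the graph.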
S2, acquiring the data dimension of each event node in the first event graph, and classifying all event nodes according to data dimension to obtain a plurality of first event node sets.
The scheme acquires the data dimension of each event node in the first event graph, the data dimension being at least one of text dimension information, image dimension information, audio dimension information and video dimension information, and then classifies all event nodes to obtain the first event node sets.
In some embodiments, S2 comprises:
acquiring the data type of the data corresponding to each event node, the data type being at least one of text data, image data, audio data and video data;
determining the data dimension of each event node from the data type of its data, the data dimension having at least one piece of category dimension information, which is at least one of text dimension information, image dimension information, audio dimension information and video dimension information. It can be understood that if the data types of an event node are text data and image data, there are 2 corresponding data dimensions: text dimension information and image dimension information;
and classifying event nodes with the same number and types of category dimension information to obtain a first event node set.
Illustratively, the category dimension information of event node 1 is text dimension information, and that of event node 2 is also text dimension information, so the scheme classifies event node 1 and event node 2 into the same first event node set.
As another example, the category dimension information of event node 3 is text dimension information and image dimension information, and that of event node 4 is also text dimension information and image dimension information, so event node 3 and event node 4 are classified into the same first event node set.
It can be understood that there are a plurality of first event node sets, and within each set the number and types of category dimension information of the nodes are identical.
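The classification rule (same number and same types of category dimension information) can be sketched by grouping on the set of dimensions. The node-to-type mapping below follows the Fig. 1 example:

```python
from collections import defaultdict

# Data types of the 7 event nodes as described for Fig. 1.
node_types = {
    1: {"text"}, 2: {"image"}, 3: {"audio"}, 4: {"video"},
    5: {"text", "video"}, 6: {"text", "video", "image"},
    7: {"text", "image", "audio", "video"},
}

# Nodes sharing an identical dimension set (same count AND same types) land
# in the same first event node set.
first_sets = defaultdict(list)
for node, dims in node_types.items():
    first_sets[frozenset(dims)].append(node)
```

Using `frozenset(dims)` as the key enforces both conditions at once: two nodes are grouped only when their category dimension information matches exactly.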
S3, determining the corresponding variational autoencoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the set with the encoding unit of the variational autoencoder to obtain a plurality of encoded set data.
After the first event node sets are determined, the corresponding variational autoencoder is determined from the data dimension of each set. For example, if the data dimension comprises one piece of category dimension information and that information is text dimension information, a variational autoencoder for processing text-type data must be matched; that autoencoder then sequentially encodes the event data of each event node in the set to obtain the encoded set data.
It can be understood that with this embodiment each first event node set is matched to a corresponding variational autoencoder, keeping the loss during data processing low.
In some embodiments, S3 includes S31-S33:
S31, extracting all data dimensions of each first event node set to obtain coding dimension information.
For example, if the data dimension comprises one piece of category dimension information and that information is text dimension information, the corresponding coding dimension information corresponds to text.
S32, comparing the coding dimension information with a preset encoder selection table to obtain the corresponding variational autoencoder, the encoder selection table recording the correspondence between each piece of coding dimension information and a variational autoencoder.
The encoder selection table may be preset manually; it records the correspondence between coding dimension information and variational autoencoders. For example, if the coding dimension information corresponds to text, the corresponding variational autoencoder is the optimal variational autoencoder for processing text data.
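The encoder selection table lookup can be sketched as a dictionary keyed by dimension sets. The encoder identifiers below are illustrative, not from the patent:

```python
# Hypothetical sketch of step S32: the preset encoder selection table maps each
# piece of coding dimension information to a variational autoencoder.
encoder_table = {
    frozenset({"text"}): "vae_text",
    frozenset({"image"}): "vae_image",
    frozenset({"text", "video", "image"}): "vae_text_video_image",
}

def select_encoder(coding_dims):
    # Direct lookup; None signals that the fallback matching (A1-A3) is needed.
    return encoder_table.get(frozenset(coding_dims))
```

A `None` result corresponds exactly to the case where the compared coding dimension information becomes difference dimension information.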
In some embodiments, S32 (comparing the coding dimension information with the preset encoder selection table to obtain the corresponding variational autoencoder) includes A1 to A3:
A1, if no variational autoencoder corresponding to the compared coding dimension information exists in the encoder selection table, taking the compared coding dimension information as difference dimension information.
It can be understood that, since there are many event nodes and corresponding data types, the encoder selection table may contain no variational autoencoder corresponding to the compared coding dimension information.
Illustratively, the coding dimension information corresponds to text and video, but the encoder selection table contains no variational autoencoder for text and video. In this case, the coding dimension information corresponding to text and video becomes the difference dimension information.
A2, comparing the difference dimension information once with the coding dimension information preset in the encoder selection table.
A3, if the difference dimension information is completely contained by one piece of preset coding dimension information, taking the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
Illustratively, the difference dimension information corresponds to "text + video" and one piece of preset coding dimension information corresponds to "text + video + image"; since the preset coding dimension information completely contains the difference dimension information, the scheme directly uses the variational autoencoder corresponding to that preset coding dimension information as the variational autoencoder for the compared coding dimension information.
It can be understood that with this embodiment, when the encoder selection table contains no variational autoencoder corresponding to the compared coding dimension information, the scheme matches the difference dimension information to the optimal variational autoencoder, minimizing the data loss.
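The complete-containment fallback of A1-A3 reduces to a subset test. The table entries are illustrative:

```python
# Hypothetical sketch of steps A1-A3: when no encoder is registered for the
# difference dimension information, fall back to an encoder whose preset coding
# dimension information completely contains it.
encoder_table = {
    frozenset({"audio"}): "vae_audio",
    frozenset({"text", "video", "image"}): "vae_text_video_image",
}

def match_by_containment(diff_dims, table):
    for preset_dims, encoder in table.items():
        if frozenset(diff_dims) <= preset_dims:  # complete containment
            return encoder
    return None  # incomplete containment -> similarity path (B1-B2)
```

For "text + video" this picks the "text + video + image" encoder, mirroring the example in the text; a `None` result hands over to the similarity-coefficient branch.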
In other embodiments, S32 (the encoding dimension information is compared with a preset encoder selection table to obtain a corresponding variational self-encoder, where the encoder selection table has a corresponding relationship between each encoding dimension information and the variational self-encoder) includes B1 to B2:
and B1, if the difference dimension information is judged not to be completely contained by any one preset encoding dimension information, calculating the similarity of the difference dimension information and each preset encoding dimension information to obtain a similarity coefficient.
According to the scheme, when the difference dimension information is judged not to be completely contained by any one preset coding dimension information, the similarity of the difference dimension information and each preset coding dimension information is calculated to obtain a similarity coefficient, and the optimal variational self-encoder is found according to the similarity coefficient.
Wherein B1 (if it is determined that the difference dimension information is not completely contained in any one of the preset encoding dimension information, calculating a similarity between the difference dimension information and each of the preset encoding dimension information to obtain a similarity coefficient) includes:
and determining the number of the category dimension information in the difference dimension information to obtain a first dimension number. For example, the difference dimension information corresponds to "text + video", then the first dimension number is 2.
And determining the number of the category dimension information in each preset encoding dimension information to obtain a second dimension number. For example, the second number of dimensions may be 1, 2, 3, 4, etc.
And determining the number of pieces of category dimension information that the difference dimension information and the preset encoding dimension information have in common, to obtain the same dimension number. For example, the same dimension number may be 0 or 1.
Calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, and obtaining a similarity coefficient according to the similarity. The similarity coefficient is calculated by the following formula:

S = w1 · (m / n1) + w2 · (m / n2)

wherein S is the similarity coefficient, m is the same dimension number, n1 is the first dimension number, w1 is the first calculation weight, n2 is the second dimension number, and w2 is the second calculation weight.

In the above formula, n1 − m represents the difference between the first dimension number and the same dimension number; it can be understood that the larger this difference, the lower the similarity and the smaller the term m / n1. Likewise, n2 − m represents the difference between the second dimension number and the same dimension number; the larger this difference, the lower the corresponding similarity and the smaller the term m / n2. Making the first calculation weight w1 greater than the second calculation weight w2 gives the term w1 · (m / n1) a larger proportion of the calculation result.
And B2, taking the variational self-encoder corresponding to the preset encoding dimension information having the highest similarity coefficient with the difference dimension information as the variational self-encoder for the compared encoding dimension information.
After the similarity coefficients are obtained, the scheme selects the variational self-encoder corresponding to the preset encoding dimension information with the highest similarity coefficient and uses it for the compared encoding dimension information, which reduces data loss to the greatest extent.
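The fallback matching of B1 and B2 can be sketched as follows, assuming the similarity coefficient takes the form S = w1·(m/n1) + w2·(m/n2) described above; the weights and table entries are illustrative assumptions.

```python
# Sketch of B1-B2: score every preset entry with the similarity coefficient
# S = w1*(m/n1) + w2*(m/n2), where m is the same dimension number, n1 the
# first dimension number and n2 the second dimension number, then pick the
# highest-scoring variational self-encoder.

W1, W2 = 0.6, 0.4  # first calculation weight > second calculation weight

def similarity_coefficient(difference_dims, preset_dims):
    m = len(difference_dims & preset_dims)  # same dimension number
    n1 = len(difference_dims)               # first dimension number
    n2 = len(preset_dims)                   # second dimension number
    return W1 * (m / n1) + W2 * (m / n2)

def select_by_similarity(difference_dims, table):
    """Pick the VAE whose preset encoding dimensions score highest."""
    best = max(table, key=lambda p: similarity_coefficient(difference_dims, p))
    return table[best]

table = {
    frozenset({"text", "picture"}): "vae_text_picture",
    frozenset({"video", "audio", "picture"}): "vae_video_audio_picture",
}
# "text + video" is contained by neither preset entry, so the
# highest-scoring variational self-encoder is selected instead.
chosen = select_by_similarity({"text", "video"}, table)  # -> "vae_text_picture"
```

With w1 > w2, overlap relative to the difference dimension information (m/n1) dominates the score, matching the weighting rationale in the description.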
And S33, based on the encoding unit of the variational self-encoder, encoding the first event node set having the corresponding encoding dimension information to obtain the corresponding encoded set data.
It can be understood that, after the variational self-encoder is obtained, its encoding unit is used to encode, that is, compress, the first event node set having the corresponding encoding dimension information, so as to obtain the corresponding encoded set data.
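For intuition only, the pairing of an encoding unit (S33) and a decoding unit (S4) can be sketched with an untrained toy variational self-encoder; the shapes, random weights and the class name `TinyVAE` are assumptions for illustration, not the patent's model.

```python
# numpy-only sketch: the encoding unit maps each event node's feature vector
# to a compressed latent code (the "encoded set data"); the decoding unit
# maps the latent code back to node features after migration.
import numpy as np

rng = np.random.default_rng(0)

class TinyVAE:
    def __init__(self, in_dim, latent_dim):
        self.w_mu = rng.normal(size=(in_dim, latent_dim)) * 0.1
        self.w_logvar = rng.normal(size=(in_dim, latent_dim)) * 0.1
        self.w_dec = rng.normal(size=(latent_dim, in_dim)) * 0.1

    def encode(self, x):
        """Encoding unit: features -> sampled latent code (reparameterized)."""
        mu, logvar = x @ self.w_mu, x @ self.w_logvar
        eps = rng.normal(size=mu.shape)          # reparameterization trick
        return mu + np.exp(0.5 * logvar) * eps

    def decode(self, z):
        """Decoding unit: latent code -> reconstructed event-node features."""
        return z @ self.w_dec

vae = TinyVAE(in_dim=16, latent_dim=4)
node_features = rng.normal(size=(10, 16))  # 10 event nodes in one set
encoded_set = vae.encode(node_features)    # compressed: shape (10, 4)
decoded_set = vae.decode(encoded_set)      # recovered:  shape (10, 16)
```

In practice the encoder and decoder would be trained networks chosen per encoding dimension information, as the encoder selection table describes; here both are random stand-ins that only demonstrate the data flow.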
And S4, after determining that the plurality of encoded set data have been migrated to the target space, decoding the encoded set data of the corresponding data dimension based on the decoding unit of the variational self-encoder to obtain a plurality of second event node sets.
It can be understood that step S3 encodes the data; once the data migration is completed, the data must be decoded. After decoding, the scheme obtains a plurality of second event node sets corresponding to the plurality of first event node sets.
In some embodiments, said S4 comprises S41-S42:
And S41, after determining that the plurality of encoded set data have been migrated to the target space, sequentially determining the variational self-encoder corresponding to each piece of encoded set data according to its encoding dimension information.
In this embodiment, each first event node set has already been associated with the variational self-encoder used for its encoding; step S41 identifies that same variational self-encoder, which is then used to decode the corresponding encoded set data.
And S42, sequentially decoding the corresponding coding set data according to the decoding unit of each variational self-encoder to obtain a plurality of second event node sets.
After the variational self-encoder is determined, the scheme can utilize a decoding unit of the variational self-encoder to decode the migrated data to obtain a plurality of second event node sets.
And S5, extracting the event data in each second event node set, and recombining the event graph according to the event data and the node labels corresponding to the event nodes to obtain a second event graph.
After the data are decoded, the plurality of second event node sets must be recombined into the event graph; during recombination, the scheme uses the node label corresponding to each event node.
In some embodiments, said S5 comprises S51-S54:
and S51, after judging that all the coding set data are decoded respectively to obtain a plurality of corresponding second event node sets, sequentially acquiring node labels and node connection information corresponding to each event node.
According to the scheme, after all data are decoded, a plurality of corresponding second event node sets are obtained, and then node labels and node connection information corresponding to each event node are sequentially obtained.
S52, a plurality of storage units are established in the target space, and each storage unit is marked according to the node label corresponding to each event node.
It can be understood that, in order to store data, the scheme may establish a plurality of storage units in a target space, one event node may correspond to one storage unit, and the scheme may mark each storage unit according to a node tag corresponding to each event node, so as to store corresponding data into the corresponding storage unit.
And S53, respectively storing the event data of each event node into the storage unit having the same node label, and establishing a data retrieval path between the storage unit and the corresponding event node in the graph, so that after being triggered, the corresponding event node is retrieved based on the data retrieval path for display.
In the scheme, to make it convenient for a user to view the event data corresponding to an event node, a data retrieval path between the storage unit and the corresponding event node in the graph is also established, so that after the node is triggered, the corresponding event data can be retrieved over this path and displayed to the user.
And S54, determining the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes according to the node labels and the node connection information of all the event nodes in the node correspondence table, to obtain a second event graph after all the event nodes are recombined.
It can be understood that the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes can be obtained from the node labels and the node connection information in the node correspondence table; the second event graph after all the event nodes are recombined is thereby obtained, completing the data migration.
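The storage-and-recombination flow of S51 to S54 might look like the following sketch, where the node correspondence table, the storage units and the retrieval paths are plain dictionaries; all field names are hypothetical.

```python
# Illustrative sketch of S51-S54: one storage unit per event node (marked by
# its node label), a retrieval path from graph node to storage unit, and the
# second event graph rebuilt from the node correspondence table.

node_table = {          # node label -> labels of connected event nodes
    "E1": ["E2"],
    "E2": ["E1", "E3"],
    "E3": ["E2"],
}
decoded_data = {"E1": "text...", "E2": "video...", "E3": "audio..."}

storage_units = {}      # S52: each unit marked with its node label
retrieval_path = {}     # S53: event node -> its storage unit
for label, data in decoded_data.items():
    storage_units[label] = {"label": label, "event_data": data}
    retrieval_path[label] = storage_units[label]

# S54: positions/edges of the second event graph follow node_table
second_graph = {label: sorted(neigh) for label, neigh in node_table.items()}

# Triggering node "E2" follows its retrieval path to the stored event data
assert retrieval_path["E2"]["event_data"] == "video..."
```

The sketch keeps only adjacency; a real implementation would also carry node positions and any per-edge attributes recorded in the node connection information.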
In practical application, the scheme further comprises C1-C3:
And C1, traversing the event data in each storage unit at intervals of a preset time period, and acquiring the category dimension information corresponding to each piece of event data at the current time.
It can be understood that the event data in the event node is updated after a period of time, and the corresponding category dimension information is also updated.
And C2, if the category dimension information corresponding to the current time is the same as the historical category dimension information of the corresponding storage unit, not modifying the category dimension information of the storage unit.
For example, if the category dimension information corresponding to the current time is character dimension information and the historical category dimension information of the storage unit is also character dimension information, the two are the same and the category dimension information of the storage unit does not need to be modified.
And C3, if the category dimension information corresponding to the current time is different from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit, that is, replacing the historical category dimension information with the category dimension information corresponding to the current time.
For example, suppose there is initially no character dimension information, and after a period of time the event data in the corresponding storage unit comes to include character data; the corresponding category dimension information is then updated, that is, character dimension information is added. At this time the category dimension information of the storage unit needs to be modified, and the historical category dimension information is replaced with the category dimension information corresponding to the current time.
This embodiment updates the data dimensions of the event nodes in real time; at the next data migration, the event nodes can be classified directly, without re-acquiring the data dimensions as in step S2, which reduces the amount of data processing.
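A compact sketch of the C1 to C3 refresh, assuming event data is stored as a dict keyed by data type and that each storage unit records its category dimension information in a `dims` field (both are assumptions for illustration):

```python
# Periodically traverse every storage unit (C1) and reconcile its recorded
# category dimension information with the data it currently holds (C2/C3).

def infer_dims(event_data):
    """Category dimension information derived from the data types present."""
    return frozenset(k for k in ("character", "image", "audio", "video")
                     if k in event_data)

def refresh_dims(storage_units):
    for unit in storage_units.values():
        current = infer_dims(unit["event_data"])
        if current != unit["dims"]:   # C3: dimensions changed
            unit["dims"] = current    # replace historical info with current
        # C2: identical -> leave the stored dimension info unmodified

units = {"E1": {"event_data": {"video": b"..."}, "dims": frozenset({"video"})}}
units["E1"]["event_data"]["character"] = "newly added text"
refresh_dims(units)
# dims now also include the character dimension added since the last traversal
```

Running the refresh on a timer (the "preset time period") keeps the stored dimension information current, so the next migration can classify nodes straight from `dims`.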
In addition to the above embodiments, the present invention may have other embodiments; all technical solutions formed by adopting equivalent substitutions or equivalent transformations fall within the protection scope of the present invention.

Claims (10)

1. A variational self-encoder-based event graph data processing method, characterized by comprising the following steps:
s1, adding a corresponding node label to each event node in a first event graph, and obtaining corresponding node connection information according to the connection relation of each event node;
s2, acquiring the data dimension of each event node in the first event map, and classifying all event nodes according to the data dimension to obtain a plurality of first event node sets;
S3, determining a corresponding variational self-encoder according to the data dimension of each first event node set, and sequentially encoding the event data of each event node in the first event node set according to the encoding unit of the variational self-encoder to obtain a plurality of encoded set data;
S4, after determining that the plurality of encoded set data have been migrated to the target space, decoding the encoded set data of the corresponding data dimension based on the decoding unit of the variational self-encoder to obtain a plurality of second event node sets;
and S5, extracting the event data in each second event node set, and recombining the event graph according to the event data and the node labels corresponding to the event nodes to obtain a second event graph.
2. The variational self-encoder-based event graph data processing method according to claim 1, wherein
the S1 comprises:
randomly selecting an event node in the first event graph as a starting point, and, starting from the starting point, adding a corresponding node label to each event node;
when a node label is added to each event node, determining the connection relationship corresponding to each event node to obtain the node connection information corresponding to each event node;
and counting the node labels and the node connection information of each event node to generate a node correspondence table.
3. The variational self-encoder-based event graph data processing method according to claim 2, wherein
the S2 comprises:
acquiring a data type of data corresponding to each event node, wherein the data type is at least one of character data, image data, audio data and video data;
determining the data dimension of the corresponding event node according to the data type of the data corresponding to each event node, wherein the data dimension comprises at least one piece of category dimension information, and the category dimension information is at least one of character dimension information, image dimension information, audio dimension information and video dimension information;
and classifying the event nodes with the same category dimension information quantity and dimension type to obtain a first event node set.
4. The variational self-encoder-based event graph data processing method according to claim 3, wherein
the S3 comprises the following steps:
extracting all data dimensions of each first event node set to obtain coding dimension information;
comparing the encoding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relationship between each piece of encoding dimension information and a variational self-encoder;
and based on the encoding unit of the variational self-encoder, encoding the first event node set having the corresponding encoding dimension information to obtain the corresponding encoded set data.
5. The variational self-encoder-based event graph data processing method according to claim 4, wherein
comparing the coding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relation between each coding dimension information and the variational self-encoder, and the method comprises the following steps:
if the encoder selection table contains no variational self-encoder corresponding to the compared encoding dimension information, taking the compared encoding dimension information as difference dimension information;
comparing the difference dimension information with each preset encoding dimension information in the encoder selection table;
and if it is determined that the difference dimension information is completely contained by one preset encoding dimension information, taking the variational self-encoder corresponding to that preset encoding dimension information as the variational self-encoder for the compared encoding dimension information.
6. The variational self-encoder-based event graph data processing method according to claim 5, wherein
comparing the coding dimension information with a preset encoder selection table to obtain a corresponding variational self-encoder, wherein the encoder selection table has a corresponding relation between each coding dimension information and the variational self-encoder, and the method comprises the following steps:
if the difference dimension information is judged not to be completely contained by any one preset encoding dimension information, calculating the similarity of the difference dimension information and each preset encoding dimension information to obtain a similarity coefficient;
and taking the variational self-encoder corresponding to the preset encoding dimension information having the highest similarity coefficient with the difference dimension information as the variational self-encoder for the compared encoding dimension information.
7. The variational self-encoder-based event graph data processing method according to claim 6, wherein
if it is determined that the difference dimension information is not completely contained by any one of the preset encoding dimension information, calculating the similarity between the difference dimension information and each of the preset encoding dimension information to obtain a similarity coefficient, including:
determining the number of the category dimension information in the difference dimension information to obtain a first dimension number;
determining the number of category dimension information in each preset encoding dimension information to obtain a second dimension number;
determining the number of the same kind of dimension information in the difference dimension information and the preset coding dimension information to obtain the same dimension number;
calculating according to the first dimension number, the second dimension number and the same dimension number to obtain the similarity between the difference dimension information and each preset encoding dimension information, and obtaining a similarity coefficient according to the similarity, the similarity coefficient being calculated by the following formula:

S = w1 · (m / n1) + w2 · (m / n2)

wherein S is the similarity coefficient, m is the same dimension number, n1 is the first dimension number, w1 is the first calculation weight, n2 is the second dimension number, and w2 is the second calculation weight.
8. The variational self-encoder-based event graph data processing method according to claim 4, wherein
the S4 comprises the following steps:
after the plurality of coding set data are judged to be migrated to a target space, sequentially determining a variational self-encoder corresponding to each coding set data according to the coding dimension information of each coding set data;
and sequentially decoding the corresponding encoded set data according to the decoding unit of each variational self-encoder to obtain a plurality of second event node sets.
9. The variational self-encoder-based event graph data processing method according to claim 8, wherein
the S5 comprises the following steps:
after all the coded set data are judged to be decoded respectively to obtain a plurality of corresponding second event node sets, node labels and node connection information corresponding to each event node are obtained in sequence;
establishing a plurality of storage units in a target space, and marking each storage unit according to a node label corresponding to each event node;
respectively storing the event data of each event node into the storage unit having the same node label, and establishing a data retrieval path between the storage unit and the corresponding event node in the graph, so that after being triggered, the corresponding event node is retrieved based on the data retrieval path for display;
and determining the positions of all the event nodes in the event graph and the connection relationship between each event node and the other event nodes according to the node labels and the node connection information of all the event nodes in the node correspondence table, to obtain a second event graph after all the event nodes are recombined.
10. The variational self-encoder-based event graph data processing method according to claim 9, further comprising:
traversing the event data in each storage unit at intervals of a preset time period, and acquiring the category dimension information corresponding to each piece of event data at the current time;
if the category dimension information corresponding to the current moment is the same as the historical category dimension information of the corresponding storage unit, not modifying the category dimension information of the storage unit;
and if the category dimension information corresponding to the current time is different from the historical category dimension information of the corresponding storage unit, modifying the category dimension information of the storage unit, and modifying the historical category dimension information into the category dimension information corresponding to the current time.
CN202210929367.6A 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder Active CN114996483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210929367.6A CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210929367.6A CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Publications (2)

Publication Number Publication Date
CN114996483A CN114996483A (en) 2022-09-02
CN114996483B true CN114996483B (en) 2022-10-21

Family

ID=83021097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210929367.6A Active CN114996483B (en) 2022-08-03 2022-08-03 Event map data processing method based on variational self-encoder

Country Status (1)

Country Link
CN (1) CN114996483B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821636A (en) * 2021-08-27 2021-12-21 上海快确信息科技有限公司 Financial text joint extraction and classification scheme based on knowledge graph
CN114612071A (en) * 2022-03-16 2022-06-10 上海正策咨仕信息科技有限公司 Data management method based on knowledge graph
CN114817575A (en) * 2022-06-24 2022-07-29 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8861356B2 (en) * 2007-03-13 2014-10-14 Ntt Docomo, Inc. Method and apparatus for prioritized information delivery with network coding over time-varying network topologies

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN113821636A (en) * 2021-08-27 2021-12-21 上海快确信息科技有限公司 Financial text joint extraction and classification scheme based on knowledge graph
CN114612071A (en) * 2022-03-16 2022-06-10 上海正策咨仕信息科技有限公司 Data management method based on knowledge graph
CN114817575A (en) * 2022-06-24 2022-07-29 国网浙江省电力有限公司信息通信分公司 Large-scale electric power affair map processing method based on extended model

Non-Patent Citations (1)

Title
"Research on Text-Based Entity Relation Extraction and Representation and Reasoning Methods for Knowledge Graphs"; Li Zhongkun; CNKI; 2019-05-22; pp. 1-5 *

Also Published As

Publication number Publication date
CN114996483A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
JP4264492B2 (en) Image decoding device
US9349072B2 (en) Local feature based image compression
EP0934662B1 (en) Vector quantisation codebook generation method
US20240121447A1 (en) Systems, apparatus, and methods for bit level representation for data processing and analytics
CN108259911A (en) A kind of OLED screen Demura lossless date-compress, decompression method
CN103188494A (en) Apparatus and method for encoding depth image by skipping discrete cosine transform (DCT), and apparatus and method for decoding depth image by skipping DCT
CN112506879A (en) Data processing method and related equipment
CN113381768B (en) Huffman correction coding method, system and related components
Lin et al. Multistage spatial context models for learned image compression
CN114218223A (en) Protobuf protocol-based Redis data model construction and access method
CN114996483B (en) Event map data processing method based on variational self-encoder
CN108182712B (en) Image processing method, device and system
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
TW201921844A (en) Dictionary-based data compression
US20230019767A1 (en) Point cloud encoding method and decoding method, encoder and decoder, and storage medium
CN103139566A (en) Method for efficient decoding of variable length codes
JPH03188768A (en) Picture compression system
CN112887722B (en) Lossless image compression method
Nobuhara et al. Fuzzy relation equations for compression/decompression processes of colour images in the RGB and YUV colour spaces
CN113099269A (en) String matching prediction method, encoding and decoding method, related equipment and device
Hu et al. Improved color image coding schemes based on single bit map block truncation coding
Kamal et al. Iteration free fractal compression using genetic algorithm for still colour images
CN111008276A (en) Complete entity relationship extraction method and device
CN110933413A (en) Video frame processing method and device
Pinho et al. A context adaptation model for the compression of images with a reduced number of colors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant