CN113360484B - Data correction method, device and computer readable storage medium - Google Patents

Data correction method, device and computer readable storage medium

Info

Publication number
CN113360484B
Authority
CN
China
Prior art keywords
data
network
feature extraction
core
pieces
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010145446.9A
Other languages
Chinese (zh)
Other versions
CN113360484A (en)
Inventor
安翔宇
翟艳梅
范晓旭
周松桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202010145446.9A
Publication of CN113360484A
Application granted
Publication of CN113360484B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention discloses a data correction method, a data correction device and a computer readable storage medium, and relates to the field of data processing. The data correction method comprises the following steps: inputting a plurality of pieces of data, which are obtained from deep packet inspection (DPI) data and precede the time corresponding to the data to be rectified, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; constructing a core feature sequence of the plurality of pieces of data according to their time order; inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and replacing the data to be rectified with the predicted data. The invention thereby performs prediction using the time sequence information of DPI data. This prediction approach better matches the characteristics of DPI data, so that the effectiveness of DPI data correction and the accuracy of the corrected data are improved.

Description

Data correction method, device and computer readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a data correction method, apparatus, and computer readable storage medium.
Background
Because of uncontrollable factors such as network fluctuation, resource load and abnormal source data, abnormal data can be generated while DPI data is transmitted. The acquired DPI data is therefore of low quality, which greatly affects the development of data products. To address this problem, the related art corrects DPI data using manual rules or clustering models such as Canopy and K-means.
Disclosure of Invention
After analysis, the inventors found that the correction results of the methods used in the related art are not ideal.
One technical problem to be solved by the embodiment of the invention is as follows: how to improve the correction effectiveness of DPI data and the accuracy of corrected data.
According to a first aspect of some embodiments of the present invention, there is provided a data correction method, including: inputting a plurality of pieces of data, which are obtained from deep packet inspection (DPI) data and precede the time corresponding to the data to be rectified, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; constructing a core feature sequence of the plurality of pieces of data according to their time order; inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and replacing the data to be rectified with the predicted data.
In some embodiments, the feature extraction network includes a convolutional neural network and a core feature extraction layer; the convolutional neural network extracts hidden features from a plurality of pieces of data, and the core feature extraction layer extracts core features from the hidden features.
In some embodiments, the convolutional neural network has a residual structure.
In some embodiments, the convolutional neural network is an Inception-ResNet network.
In some embodiments, the core feature extraction layer is an attention layer.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator; the data correction method further comprises the following steps: inputting target data acquired from DPI data for training and a plurality of pieces of training data before the time corresponding to the target data into the feature extraction network to obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network; constructing a core feature training sequence of the plurality of pieces of training data according to their time order; inputting the core feature sequence into a pre-trained LSTM network-based generator to generate predicted data; inputting the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and inputting the core features of the predicted data and the core features of the target data into the discriminator so as to train the feature extraction network and the generative adversarial network according to the determination result of the discriminator.
In some embodiments, the data deskewing method further comprises: determining the data with null values or the fields with abnormal values in DPI data as data to be rectified; the data before the time corresponding to the data to be rectified and the data to be rectified have the same field, and the numerical values are not null and are not abnormal values.
According to a second aspect of some embodiments of the present invention, there is provided a data rectification apparatus comprising: a feature extraction module configured to input a plurality of pieces of data, which are acquired from deep packet inspection (DPI) data and precede the time corresponding to the data to be rectified, into the feature extraction network, and obtain core features of the plurality of pieces of data output by the feature extraction network; a sequence construction module configured to construct a core feature sequence of the plurality of pieces of data according to their time order; a data generation module configured to input the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and a rectification module configured to replace the data to be rectified with the predicted data.
According to a third aspect of some embodiments of the present invention, there is provided a data rectification apparatus, comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the foregoing data deskewing methods based on the instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements any of the foregoing data deskewing methods.
Some of the embodiments of the above invention have the following advantages or benefits: the invention can predict the core characteristics of a plurality of pieces of data before the time corresponding to the data to be rectified so as to obtain the correct data which should appear at the time corresponding to the data to be rectified, thereby realizing the scheme of predicting by utilizing the time sequence information of DPI data. The prediction mode is more in line with the characteristics of DPI data, so that the correction effectiveness of the DPI data and the correction accuracy of the corrected data are improved.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 illustrates a flow diagram of a data deskewing method according to some embodiments of the invention.
Fig. 2 illustrates a flow diagram of a feature extraction method according to some embodiments of the invention.
Fig. 3 illustrates a flow diagram of a training method according to some embodiments of the invention.
Fig. 4 is a schematic diagram illustrating a structure of a data rectification apparatus according to some embodiments of the present invention.
Fig. 5 shows a schematic structural diagram of a data correction device according to other embodiments of the present invention.
Fig. 6 shows a schematic structural diagram of a data correction device according to further embodiments of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
After further analysis, the inventors found that one characteristic of deep packet inspection (DPI) data is that each field carries its acquisition time, so DPI data has a strong temporal order. Therefore, DPI data can be corrected by mining its timing information. An embodiment of the data correction method of the present invention is described below with reference to fig. 1.
Fig. 1 illustrates a flow diagram of a data deskewing method according to some embodiments of the invention. As shown in fig. 1, the data rectification method of this embodiment includes steps S102 to S108.
In step S102, a plurality of pieces of data before a time corresponding to the data to be rectified obtained from the DPI data are input into a feature extraction network, and core features of the plurality of pieces of data output by the feature extraction network are obtained.
In some embodiments, the data whose value is null, or the field whose value is abnormal, in the DPI data is determined as the data to be deskewed. The data before the time corresponding to the data to be rectified and the data to be rectified have the same field, and the numerical values are not null and are not abnormal values. The identification of the data to be rectified may be performed by searching and matching, etc., and will not be described in detail herein.
For example, suppose the DPI data records the amount of traffic used by a particular mobile phone user in each hour. When the user's traffic at 10 a.m. on a certain day is found to be 630 GB, a value clearly beyond a reasonable range, or the traffic field is empty, the user's traffic at 10 a.m. on that day can be taken as the data to be rectified, and the user's hourly traffic data from 0:00 to 9:00 on the two days preceding that day can be taken as the "plurality of pieces of data" in step S102. Depending on the actual situation, the data to be rectified may also comprise a group of fields.
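The selection described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the record layout, the field name `traffic_gb`, the bound `MAX_HOURLY_GB`, and the helper names are all assumptions introduced here, and the sample uses same-day hours only to keep it short.

```python
from datetime import datetime, timedelta

# Hypothetical plausibility bound for hourly traffic, in GB (an assumption).
MAX_HOURLY_GB = 50.0

def needs_correction(value):
    """A value is a correction candidate if it is null or outside the plausible range."""
    return value is None or value < 0 or value > MAX_HOURLY_GB

def history_before(records, target_time, field):
    """Return earlier records whose value for `field` is valid, oldest first."""
    earlier = [r for r in records
               if r["time"] < target_time and not needs_correction(r.get(field))]
    return sorted(earlier, key=lambda r: r["time"])

# Hypothetical hourly records for one user.
start = datetime(2020, 3, 1, 0)
records = [{"time": start + timedelta(hours=h), "traffic_gb": 0.2 + 0.01 * h}
           for h in range(10)]
records.append({"time": start + timedelta(hours=10), "traffic_gb": 630.0})  # abnormal
records[3]["traffic_gb"] = None  # a null value cannot serve as history

to_fix = [r for r in records if needs_correction(r["traffic_gb"])]
target = to_fix[-1]  # the 10 a.m. record with the abnormal value
history = history_before(records, target["time"], "traffic_gb")
```

Here `history` plays the role of the "plurality of pieces of data" fed to the feature extraction network; the record with the null value is itself a correction candidate and is excluded from the history.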
In step S104, a core feature sequence of the plurality of pieces of data is constructed in accordance with the corresponding time sequence of the plurality of pieces of data.
In step S106, the core feature sequence is input into a pre-trained Long Short-Term Memory network (LSTM) based generator to generate predicted data.
LSTM is a type of recurrent neural network. An LSTM can not only use information from the current time step to predict information at the next time step, but can also retain information from earlier time steps through the cell structure in the network. In a user's network behavior, not only are adjacent moments correlated, but the current moment is also correlated with earlier moments. For example, a user may browse videos on a video website during the daily commute, which occurs at a relatively fixed time. Therefore, when the user's video browsing data for 8 o'clock on day t is rectified, the video browsing data for 8 o'clock on day t-1 also has great reference value. Thus, by using the LSTM network, predicted data can be generated more accurately.
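To make the role of the cell state concrete, the following sketch steps a single-unit LSTM cell over a short sequence using the standard gate equations. The scalar formulation and the weights in `params` are illustrative assumptions; the patent does not specify the generator's dimensions or parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a single-unit LSTM cell with scalar input, hidden and cell state."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new candidate content
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # the cell state carries long-range information
    h = o * math.tanh(c)
    return h, c

# Illustrative weights only; a trained generator would learn these.
params = {"forget": (0.5, 0.1, 1.0), "input": (0.6, 0.2, 0.0),
          "candidate": (0.9, 0.3, 0.0), "output": (0.4, 0.1, 0.0)}

h, c = 0.0, 0.0
for x in [0.20, 0.25, 0.22, 0.30]:     # a short, time-ordered feature sequence
    h, c = lstm_step(x, h, c, params)
```

Because the forget gate scales the previous cell state rather than overwriting it, information from earlier steps (such as the same hour on the previous day) can still influence the final `h`.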
In step S108, the predicted data is used to replace the data to be rectified.
By the method of the embodiment, prediction can be performed according to the core characteristics of a plurality of pieces of data before the time corresponding to the data to be rectified, so as to obtain correct data which should appear at the time corresponding to the data to be rectified, and therefore a scheme for predicting by using time sequence information of DPI data is realized. The prediction mode is more in line with the characteristics of DPI data, so that the correction effectiveness of the DPI data and the correction accuracy of the corrected data are improved.
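The flow of steps S102 to S108 can be sketched as one function. This is a hedged outline only: the function and field names are hypothetical, and the feature extractor and generator are replaced by trivial stand-ins (an identity feature and a mean predictor) rather than the trained networks the embodiment describes.

```python
def correct_record(history, target, extract_features, generate):
    """Steps S102-S108 with the trained networks passed in as plain callables."""
    ordered = sorted(history, key=lambda r: r["time"])    # time order used in S104
    sequence = [extract_features(r) for r in ordered]     # S102 and S104
    predicted = generate(sequence)                        # S106
    return {**target, "value": predicted}                 # S108: replace the value

# Stand-ins (assumptions), not the CNN/attention extractor or the LSTM generator.
def identity_features(record):
    return record["value"]

def mean_generator(sequence):
    return sum(sequence) / len(sequence)

history = [{"time": 2, "value": 0.5}, {"time": 1, "value": 0.25}, {"time": 3, "value": 0.75}]
target = {"time": 4, "value": None}
fixed = correct_record(history, target, identity_features, mean_generator)
```

Passing the networks in as callables keeps the four steps visible and lets a trained extractor and generator be substituted without changing the control flow.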
In some embodiments, the feature extraction network may be comprised of multiple sub-networks or layers. An embodiment of the feature extraction method of the present invention is described below with reference to fig. 2.
Fig. 2 illustrates a flow diagram of a feature extraction method according to some embodiments of the invention. As shown in fig. 2, the feature extraction method of this embodiment includes steps S202 to S204, and the feature extraction network includes a convolutional neural network and a core feature extraction layer.
In step S202, the convolutional neural network extracts hidden features from the pieces of data.
In some embodiments, the convolutional neural network has a residual structure. In most networks without a residual structure, the input of each layer is the output of the previous layer. In a network with a residual structure, by contrast, the inputs of some layers include not only the output of the immediately preceding layer but also the outputs of layers before it. This structure can improve training efficiency and accuracy, and in turn the efficiency and accuracy of data correction.
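A minimal sketch of the skip connection described above follows. For brevity it uses a fully connected layer instead of a convolution, which is a simplification assumed here; the function names are illustrative.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Plain fully connected layer; `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(v, weights, bias):
    """Output = ReLU(layer(v) + v): the skip connection adds the block's input back in."""
    transformed = dense(v, weights, bias)
    return relu([t + x for t, x in zip(transformed, v)])

x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
out = residual_block(x, zero_w, [0.0, 0.0])   # with zero weights only the skip path remains
```

With all-zero weights the block reduces to `relu(x)`, which shows that the input reaches the output even when the layer itself contributes nothing.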
In some embodiments, the convolutional neural network is an Inception-ResNet network. Inception-ResNet is a convolutional neural network proposed by Google that has a residual structure. When training on top of this network, it is possible to adjust only the parameters of the last layer, which further improves training efficiency.
In step S204, the core feature extraction layer extracts core features from the hidden features. Thus, important features among the hidden features can be further extracted.
In some embodiments, the core feature extraction layer is an attention layer implemented using an attention mechanism. In implementation, an existing attention-layer module can be used: the hidden features extracted in the previous step are passed in through the module's API, and the core features output by the module are obtained. The attention layer assigns a weight to each sub-feature of the input hidden features through its built-in algorithm so as to highlight the core information in the hidden features.
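The weight distribution performed by an attention layer can be illustrated with a small dot-product attention sketch. The patent does not say which attention variant or module is used, so the scoring function and the `query` vector below are assumptions chosen for illustration.

```python
import math

def softmax(scores):
    m = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden, query):
    """Weight each hidden feature vector by its dot-product score against a query."""
    scores = [sum(h * q for h, q in zip(vec, query)) for vec in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    core = [sum(w * vec[d] for w, vec in zip(weights, hidden)) for d in range(dim)]
    return core, weights

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hidden sub-features
query = [2.0, 2.0]                               # hypothetical learned query vector
core, weights = attention_pool(hidden, query)
```

The sub-feature with the highest score receives the largest weight, so it dominates the pooled `core` vector while the others are attenuated rather than discarded.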
By the method of the embodiment, hidden and important characteristics in DPI data can be extracted, so that interference information in original data can be removed, and prediction can be performed more accurately.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator. Thus, the generator can be trained using the training procedure of the generative adversarial network. An embodiment of the training method of the present invention is described below with reference to fig. 3.
Fig. 3 illustrates a flow diagram of a training method according to some embodiments of the invention. As shown in fig. 3, the training method of this embodiment includes steps S302 to S310.
In step S302, target data acquired from DPI data for training and a plurality of pieces of training data before a time corresponding to the target data are input into a feature extraction network, and core features of the target data and the plurality of pieces of training data output by the feature extraction network are obtained.
The structure of the feature extraction network may refer to the foregoing embodiments, and will not be described herein.
The target data plays the same role in training that the data to be rectified plays in the actual correction process. During training, however, the target data has a non-null, non-abnormal value, so that it can be compared with the generated predicted data in order to adjust the model.
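One way to assemble such training samples is a sliding window over a time-ordered series, keeping only pairs whose target and history are all valid. The window scheme and the function name `training_pairs` are assumptions for illustration; the patent does not prescribe how training samples are drawn.

```python
def training_pairs(series, window):
    """Build (preceding window, target) pairs; pairs containing a null value are skipped."""
    pairs = []
    for t in range(window, len(series)):
        past, target = series[t - window:t], series[t]
        if target is not None and all(v is not None for v in past):
            pairs.append((past, target))
    return pairs

series = [0.1, 0.2, None, 0.4, 0.5, 0.6, 0.7]   # hypothetical hourly values, oldest first
pairs = training_pairs(series, window=2)
```

A fuller version would also reject abnormal values, using the same validity check applied when identifying the data to be rectified.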
In step S304, a core feature training sequence of the plurality of pieces of data is constructed according to the corresponding time sequence of the plurality of pieces of training data.
In step S306, the core feature sequence is input into a pre-trained LSTM network-based generator, generating predicted data.
In step S308, the predicted data is input into the feature extraction network, and the core features of the predicted data output by the feature extraction network are obtained.
In step S310, the core features of the predicted data and the core features of the target data are input to the discriminator, so as to train the feature extraction network and the generative adversarial network according to the determination result of the discriminator. The discriminator judges whether the generated data is real and outputs the probability that it is real. When the probability is about 0.5, the discriminator cannot tell whether the generated data is real, that is, the data produced by the generator is difficult to distinguish from real data. At this point the training may be ended.
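The adversarial objective and the stopping rule can be sketched with the standard binary cross-entropy GAN losses. The patent does not state which loss is used, so these loss functions and the tolerance `tol` are assumptions introduced here.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one probability and a 0/1 label."""
    eps = 1e-12
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1 - prob + eps))

def discriminator_loss(p_real, p_fake):
    """The discriminator wants p_real close to 1 and p_fake close to 0."""
    return bce(p_real, 1) + bce(p_fake, 0)

def generator_loss(p_fake):
    """The generator wants the discriminator to score its output as real."""
    return bce(p_fake, 1)

def converged(p_fake_batch, tol=0.05):
    """Stop when the mean 'real' probability on generated data is close to 0.5."""
    mean = sum(p_fake_batch) / len(p_fake_batch)
    return abs(mean - 0.5) <= tol
```

For example, `converged([0.48, 0.52, 0.5])` is true, while `converged([0.9, 0.95])` is false because the discriminator is still being fooled too consistently in one direction for the game to be balanced.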
By the method of this embodiment, the model can be trained using the adversarial game between the generator and the discriminator, which further improves prediction accuracy. In addition, during training the data at a future time is predicted from the time sequence information and compared with the real data at that time, so the trained model is suited to the actual correction scenario.
An embodiment of the data rectification apparatus of the present invention is described below with reference to fig. 4.
Fig. 4 is a schematic diagram illustrating a structure of a data rectification apparatus according to some embodiments of the present invention. As shown in fig. 4, the data correction device 40 of this embodiment includes: the feature extraction module 410 is configured to input a plurality of pieces of data before a time corresponding to the data to be rectified, which is acquired from deep packet inspection DPI data, into the feature extraction network, and obtain core features of the plurality of pieces of data output by the feature extraction network; a sequence construction module 420 configured to construct a core feature sequence of the plurality of pieces of data according to a corresponding temporal order of the plurality of pieces of data; a data generation module 430 configured to input the core feature sequence into a pre-trained long-short term memory network-based generator, generating predicted data; the deskew module 440 is configured to replace the data to be deskewed with the predicted data.
In some embodiments, the feature extraction network includes a convolutional neural network and a core feature extraction layer; the convolutional neural network extracts hidden features from a plurality of pieces of data, and the core feature extraction layer extracts core features from the hidden features.
In some embodiments, the convolutional neural network has a residual structure.
In some embodiments, the convolutional neural network is an Inception-ResNet network.
In some embodiments, the core feature extraction layer is an attention layer.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator; the data rectification apparatus further comprises: a training module 450 configured to input target data acquired from DPI data for training and a plurality of pieces of training data before the time corresponding to the target data into the feature extraction network, and obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network; construct a core feature training sequence of the plurality of pieces of training data according to their time order; input the core feature sequence into a pre-trained LSTM network-based generator to generate predicted data; input the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and input the core features of the predicted data and the core features of the target data into the discriminator so as to train the feature extraction network and the generative adversarial network according to the determination result of the discriminator.
In some embodiments, the data rectification apparatus further comprises: a determining module 460 configured to determine, as the data to be rectified, data with null values or fields with abnormal values in the DPI data; the data before the time corresponding to the data to be rectified has the same field as the data to be rectified, and its values are neither null nor abnormal.
Fig. 5 shows a schematic structural diagram of a data correction device according to other embodiments of the present invention. As shown in fig. 5, the data rectifying device 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to perform the data deskewing method of any of the preceding embodiments based on instructions stored in the memory 510.
The memory 510 may include, for example, system memory, fixed nonvolatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
Fig. 6 shows a schematic structural diagram of a data correction device according to further embodiments of the present invention. As shown in fig. 6, the data correction device 60 of this embodiment includes a memory 610 and a processor 620, and may also include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610 and the processor 620 may be connected by, for example, a bus 660. The input-output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as SD cards and USB flash drives.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements any one of the foregoing data deskewing methods.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flowchart and/or block of the flowchart illustrations and/or block diagrams, and combinations of flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope of protection.

Claims (8)

1. A data correction method, comprising:
inputting a plurality of pieces of data preceding the time corresponding to data to be rectified, which are obtained from deep packet inspection (DPI) data, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network, wherein the feature extraction network comprises a convolutional neural network and a core feature extraction layer, the convolutional neural network extracts hidden features from the plurality of pieces of data, and the core feature extraction layer extracts the core features from the hidden features;
constructing a core feature sequence of the plurality of pieces of data according to the time order of the plurality of pieces of data;
inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data, wherein the LSTM-based generator is the generator of a generative adversarial network (GAN), and the GAN further comprises a discriminator; and
replacing the data to be rectified with the predicted data;
wherein the data correction method further comprises the following steps:
inputting target data acquired from DPI data for training and a plurality of pieces of training data preceding the time corresponding to the target data into the feature extraction network, and obtaining core features of the target data and of the plurality of pieces of training data output by the feature extraction network;
constructing a core feature training sequence of the plurality of pieces of training data according to the time order of the plurality of pieces of training data;
inputting the core feature training sequence into a pre-trained LSTM-based generator to generate predicted data;
inputting the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and
inputting the core features of the predicted data and the core features of the target data into the discriminator, so as to train the feature extraction network and the GAN according to the judgment result of the discriminator.
2. The data correction method of claim 1, wherein the convolutional neural network has a residual structure.
3. The data correction method of claim 2, wherein the convolutional neural network is an Inception-ResNet network.
4. The data correction method of claim 1, wherein the core feature extraction layer is an attention layer.
5. The data correction method of claim 1, further comprising:
determining, as the data to be rectified, data in the DPI data that has a field with a null value or an abnormal value;
wherein the plurality of pieces of data preceding the time corresponding to the data to be rectified have the same field as the data to be rectified, and the values of that field are neither null nor abnormal.
6. A data correction device, comprising:
a feature extraction module configured to input a plurality of pieces of data preceding the time corresponding to data to be rectified, which are acquired from deep packet inspection (DPI) data, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network, wherein the feature extraction network comprises a convolutional neural network and a core feature extraction layer, the convolutional neural network extracts hidden features from the plurality of pieces of data, and the core feature extraction layer extracts the core features from the hidden features;
a sequence construction module configured to construct a core feature sequence of the plurality of pieces of data according to the time order of the plurality of pieces of data;
a data generation module configured to input the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data, wherein the LSTM-based generator is the generator of a generative adversarial network (GAN), and the GAN further comprises a discriminator;
a correction module configured to replace the data to be rectified with the predicted data; and
a training module configured to: input target data acquired from DPI data for training and a plurality of pieces of training data preceding the time corresponding to the target data into the feature extraction network, and obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network; construct a core feature training sequence of the plurality of pieces of training data according to the time order of the plurality of pieces of training data; input the core feature training sequence into a pre-trained LSTM-based generator to generate predicted data; input the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and input the core features of the predicted data and the core features of the target data into the discriminator, so as to train the feature extraction network and the GAN according to the judgment result of the discriminator.
7. A data correction device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform, based on instructions stored in the memory, the data correction method according to any one of claims 1-5.
8. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data correction method according to any one of claims 1-5.
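The selection and replacement steps recited in claims 1 and 5 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (`is_bad`, `correct_field`, `mean_predictor`), the record layout with a `time` key, and the fixed `[low, high]` range used to decide that a value is abnormal are all assumptions made for illustration, and the LSTM-based generator is replaced by a plain mean predictor passed in as a callable so the example stays self-contained.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence


def is_bad(value, low: float, high: float) -> bool:
    """A value needs correcting if it is null or outside [low, high].

    The fixed range is an assumed stand-in for whatever abnormal-value
    test is actually applied to the DPI field.
    """
    return value is None or not (low <= value <= high)


def mean_predictor(history: Sequence[float]) -> float:
    """Hypothetical stand-in for the trained LSTM-based generator."""
    return fmean(history)


def correct_field(
    records: List[Dict],
    field: str,
    low: float,
    high: float,
    history_len: int,
    predict: Callable[[Sequence[float]], float],
) -> List[Dict]:
    """Replace null/abnormal values of `field` with predicted values.

    Only earlier records whose original value of `field` was valid are
    used as history; predicted values are never fed back as history.
    """
    valid_history: List[float] = []
    corrected: List[Dict] = []
    for rec in sorted(records, key=lambda r: r["time"]):
        rec = dict(rec)  # copy so the caller's records are not mutated
        value = rec.get(field)
        if is_bad(value, low, high):
            history = valid_history[-history_len:]
            if history:  # leave the record untouched if no history exists
                rec[field] = predict(history)
        else:
            valid_history.append(value)
        corrected.append(rec)
    return corrected


if __name__ == "__main__":
    dpi = [
        {"time": 1, "bytes": 10},
        {"time": 2, "bytes": 12},
        {"time": 3, "bytes": None},   # null value
        {"time": 4, "bytes": 500},    # abnormal under the assumed range
        {"time": 5, "bytes": 14},
    ]
    for row in correct_field(dpi, "bytes", 0, 100, 2, mean_predictor):
        print(row)
```

Passing the predictor as a callable keeps the selection logic independent of the model, so a trained generator could be substituted for `mean_predictor` without changing `correct_field`.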
CN202010145446.9A 2020-03-05 2020-03-05 Data correction method, device and computer readable storage medium Active CN113360484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010145446.9A CN113360484B (en) 2020-03-05 2020-03-05 Data correction method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010145446.9A CN113360484B (en) 2020-03-05 2020-03-05 Data correction method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113360484A (en) 2021-09-07
CN113360484B (en) 2024-07-09

Family

ID=77523455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010145446.9A Active CN113360484B (en) 2020-03-05 2020-03-05 Data correction method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113360484B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446324A (en) * 2018-02-11 2018-08-24 重庆邮电大学 A GPS data reconstruction method based on a long short-term memory (LSTM) network
CN108614944A (en) * 2018-05-10 2018-10-02 西安电子科技大学 A kind of shield track axis correction parameter prediction technique

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000079444A1 (en) * 1999-06-23 2000-12-28 Koichiro Matsuno Economic data processing device, economic data processing method, economic data processing system, and information recorded medium
CN106934196A (en) * 2015-12-30 2017-07-07 中国移动通信集团河南有限公司 A kind of quantitative forecasting technique and device
CN108733703A (en) * 2017-04-20 2018-11-02 北京京东尚科信息技术有限公司 The answer prediction technique and device of question answering system, electronic equipment, storage medium
CN107463633B (en) * 2017-07-17 2019-09-06 中国航天系统科学与工程研究院 A kind of real time data rejecting outliers method based on EEMD- neural network
CN109960626B (en) * 2017-12-26 2022-10-18 中国移动通信集团辽宁有限公司 Port abnormity identification method, device, equipment and medium
CN108549948A (en) * 2018-06-06 2018-09-18 深圳市海波广告有限公司 A kind of station board position automatic correction method and its system
CN110008472B (en) * 2019-03-29 2022-11-11 北京明略软件系统有限公司 Entity extraction method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113360484A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
US10796244B2 (en) Method and apparatus for labeling training samples
CN106682906B (en) Risk identification and service processing method and equipment
CN112232293A (en) Image processing model training method, image processing method and related equipment
US20080243905A1 (en) Attribute extraction using limited training data
CN111639968B (en) Track data processing method, track data processing device, computer equipment and storage medium
CN103678702A (en) Video duplicate removal method and device
CN104679818A (en) Video keyframe extracting method and video keyframe extracting system
CN105630767A (en) Text similarity comparison method and device
CN110166344B (en) Identity identification method, device and related equipment
CN103324745A (en) Text garbage identifying method and system based on Bayesian model
CN105335368B (en) A kind of product clustering method and device
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN108319672A (en) Mobile terminal malicious information filtering method and system based on cloud computing
CN112149642A (en) Text image recognition method and device
CN115757745A (en) Service scene control method and system based on artificial intelligence and cloud platform
CN111783812A (en) Method and device for identifying forbidden images and computer readable storage medium
CN102289456B (en) The Difference test that WEB creeps
CN114332550A (en) Model training method, system, storage medium and terminal equipment
CN113360484B (en) Data correction method, device and computer readable storage medium
CN110889467A (en) Company name matching method and device, terminal equipment and storage medium
CN109697224B (en) Bill message processing method, device and storage medium
CN115205606A (en) Image multi-label classification method and device and related products
CN115393100A (en) Resource recommendation method and device
CN102741862A (en) Methods and apparatuses for facilitating object recognition
CN115578765A (en) Target identification method, device, system and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220130

Address after: 100007 room 205-32, floor 2, building 2, No. 1 and No. 3, qinglonghutong a, Dongcheng District, Beijing

Applicant after: Tianyiyun Technology Co.,Ltd.

Address before: No.31, Financial Street, Xicheng District, Beijing, 100033

Applicant before: CHINA TELECOM Corp.,Ltd.

GR01 Patent grant