CN113360484A - Data correction method and device and computer readable storage medium - Google Patents


Info

Publication number
CN113360484A
Authority
CN
China
Prior art keywords
data
network
feature extraction
pieces
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010145446.9A
Other languages
Chinese (zh)
Inventor
安翔宇
翟艳梅
范晓旭
周松桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202010145446.9A
Publication of CN113360484A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Error Detection And Correction (AREA)

Abstract

The invention discloses a data correction method, a data correction apparatus, and a computer-readable storage medium, and relates to the field of data processing. The data correction method comprises: inputting a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; constructing a core feature sequence of the plurality of pieces of data according to their time order; inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and replacing the data to be corrected with the predicted data. The invention thereby makes predictions using the time-series information of the DPI data. This manner of prediction better fits the characteristics of DPI data, which improves the effectiveness of DPI data correction and the accuracy of the corrected data.

Description

Data correction method and device and computer readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular to a data correction method, a data correction apparatus, and a computer-readable storage medium.
Background
Due to uncontrollable factors such as network fluctuation, resource load, and anomalies in the source data, abnormal data can be generated during DPI data transmission. As a result, the quality of the obtained DPI data is low, which greatly affects the development of data products. To address this problem, the related art corrects DPI data using methods such as manual rules or clustering models such as Canopy and K-means.
Disclosure of Invention
Through analysis, the inventors found that the correction results of the methods used in the related art are unsatisfactory.
The technical problem to be solved by embodiments of the invention is how to improve the effectiveness of DPI data correction and the accuracy of the corrected data.
According to a first aspect of some embodiments of the present invention, there is provided a data correction method, including: inputting a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; constructing a core feature sequence of the plurality of pieces of data according to their time order; inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and replacing the data to be corrected with the predicted data.
In some embodiments, the feature extraction network comprises a convolutional neural network and a core feature extraction layer; the convolutional neural network extracts hidden features from the plurality of pieces of data, and the core feature extraction layer extracts core features from the hidden features.
In some embodiments, the convolutional neural network has a residual structure.
In some embodiments, the convolutional neural network is an Inception-ResNet network.
In some embodiments, the core feature extraction layer is an attention layer.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator; the data correction method further comprises: inputting target data acquired from DPI data used for training, and a plurality of pieces of training data preceding the time corresponding to the target data, into the feature extraction network to obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network; constructing a core feature training sequence of the plurality of pieces of training data according to their time order; inputting the core feature training sequence into the generator based on an LSTM network trained in advance to generate predicted data; inputting the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and inputting the core features of the predicted data and the core features of the target data into the discriminator, so as to train the feature extraction network and the generative adversarial network according to the discrimination result of the discriminator.
In some embodiments, the data correction method further comprises: determining, as the data to be corrected, data in the DPI data that has a field whose value is empty or abnormal; wherein the data preceding the time corresponding to the data to be corrected has the same fields as the data to be corrected, and its values are neither empty nor abnormal.
According to a second aspect of some embodiments of the present invention, there is provided a data correction apparatus, including: a feature extraction module configured to input a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; a sequence construction module configured to construct a core feature sequence of the plurality of pieces of data according to their time order; a data generation module configured to input the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and a correction module configured to replace the data to be corrected with the predicted data.
According to a third aspect of some embodiments of the present invention, there is provided a data correction apparatus, including: a memory; and a processor coupled to the memory, the processor being configured to perform any of the foregoing data correction methods based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer-readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements any one of the foregoing data correction methods.
Some embodiments of the above invention have the following advantages or benefits: a prediction can be made from the core features of a plurality of pieces of data preceding the time corresponding to the data to be corrected, so as to obtain the correct data that should have appeared at that time, thereby making predictions using the time-series information of the DPI data. This manner of prediction better fits the characteristics of DPI data, which improves the effectiveness of DPI data correction and the accuracy of the corrected data.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 illustrates a flow diagram of a data correction method according to some embodiments of the present invention.
FIG. 2 illustrates a flow diagram of a feature extraction method according to some embodiments of the present invention.
FIG. 3 illustrates a flow diagram of a training method according to some embodiments of the present invention.
FIG. 4 illustrates a block diagram of a data correction apparatus according to some embodiments of the present invention.
FIG. 5 is a schematic diagram of a data correction apparatus according to other embodiments of the present invention.
FIG. 6 is a schematic diagram of a data correction apparatus according to further embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
After further analysis, the inventors found that one characteristic of Deep Packet Inspection (DPI) data is that it records the acquisition time of each field, so DPI data is strongly time-ordered. DPI data can therefore be corrected by mining its time-series information. An embodiment of the data correction method of the present invention is described below with reference to FIG. 1.
FIG. 1 illustrates a flow diagram of a data correction method according to some embodiments of the present invention. As shown in FIG. 1, the data correction method of this embodiment includes steps S102 to S108.
In step S102, a plurality of pieces of data, acquired from the DPI data and preceding the time corresponding to the data to be corrected, are input into the feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network.
In some embodiments, data in the DPI data that has a field whose value is empty or abnormal is determined as the data to be corrected. The data preceding the time corresponding to the data to be corrected has the same fields as the data to be corrected, and its values are neither empty nor abnormal. The data to be corrected may be identified by searching, matching, and similar means, which are not described in detail here.
For example, suppose the DPI data records the amount of traffic a mobile phone user consumes per hour. If the user's traffic at 10 a.m. on a certain day is found to be 630G, a value that clearly exceeds a reasonable range, or if the traffic field is empty, the user's traffic at 10 a.m. on that day may be taken as the data to be corrected, and the user's traffic data from 0:00 to 9:00 on that day and on the two preceding days may be taken as the "plurality of pieces of data" in step S102. Depending on the actual situation, the data to be corrected may also comprise a set of fields.
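The selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `(hour, value)` record layout, the `MAX_REASONABLE` threshold, the `window` size, and both function names are hypothetical and do not come from the patent, which leaves the identification method open.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (hour_index, traffic_value); None means the field is empty.
Record = Tuple[int, Optional[float]]

MAX_REASONABLE = 100.0  # assumed upper bound for a plausible hourly value


def needs_correction(value: Optional[float], upper: float = MAX_REASONABLE) -> bool:
    """Flag a value that is empty or outside the assumed valid range."""
    return value is None or value < 0 or value > upper


def split_records(
    records: List[Record], window: int
) -> Tuple[List[int], Dict[int, List[Record]]]:
    """Return the flagged times and, for each one, the most recent valid
    records (at most `window` of them) that precede it in time."""
    ordered = sorted(records, key=lambda r: r[0])
    targets = [t for t, v in ordered if needs_correction(v)]
    context: Dict[int, List[Record]] = {}
    for target in targets:
        earlier = [(t, v) for t, v in ordered if t < target and not needs_correction(v)]
        context[target] = earlier[-window:]
    return targets, context


records = [(7, 1.2), (8, 1.5), (9, 1.1), (10, 630.0), (11, None), (6, 0.9)]
targets, context = split_records(records, window=3)
```

Here hours 10 and 11 are flagged, and each receives the three most recent valid records as its context; flagged records are never used as context for later ones.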
In step S104, a core feature sequence of the plurality of pieces of data is constructed in a corresponding time order of the plurality of pieces of data.
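A minimal sketch of this ordering step, assuming the core features are held in a dictionary keyed by an integer timestamp (the dictionary layout and the function name `build_feature_sequence` are illustrative assumptions, not part of the patent):

```python
from typing import Dict, List


def build_feature_sequence(features_by_time: Dict[int, List[float]]) -> List[List[float]]:
    """Arrange per-record core features from the earliest to the latest timestamp."""
    return [features_by_time[t] for t in sorted(features_by_time)]


sequence = build_feature_sequence({9: [0.3, 0.1], 7: [0.1, 0.4], 8: [0.2, 0.0]})
```

The result lists the feature vectors for times 7, 8, and 9 in that order, whatever order the records arrived in.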
In step S106, the core feature sequence is input into a generator based on a Long Short-Term Memory (LSTM) network trained in advance to generate predicted data.
LSTM is a recurrent neural network. An LSTM can not only use information at the current time to predict information at the next time, but can also retain information from earlier times through the cell structure in the network. In a user's network behavior, the behavior at the current time point is related not only to adjacent time points but also to earlier ones. For example, a user may browse videos on a video website during the daily commute, which happens at relatively fixed times. Therefore, when correcting the user's video browsing data for 8 o'clock on day t, the video browsing data for 8 o'clock on day t-1 is also a valuable reference. Using an LSTM network can therefore generate the predicted data more accurately.
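To make the role of the cell state concrete, the following is a minimal pure-Python sketch of the standard LSTM cell equations. It is not the patent's generator: the weights are random and untrained, and the class name, sizes, and seed are assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell state


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute w @ v + b using plain Python lists."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


class LSTMCell:
    """Untrained LSTM cell with small random weights (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_size + hidden_size
        self.hidden_size = hidden_size
        self.w: Dict[str, Matrix] = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in GATES
        }
        self.b: Dict[str, Vector] = {gate: [0.0] * hidden_size for gate in GATES}

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the current input with the previous hidden state
        i = [sigmoid(a) for a in affine(self.w["i"], z, self.b["i"])]
        f = [sigmoid(a) for a in affine(self.w["f"], z, self.b["f"])]
        o = [sigmoid(a) for a in affine(self.w["o"], z, self.b["o"])]
        g = [math.tanh(a) for a in affine(self.w["g"], z, self.b["g"])]
        # The cell state keeps part of the old memory and adds new information.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence: List[Vector]) -> Vector:
        """Feed a time-ordered sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


sequence = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
cell = LSTMCell(input_size=2, hidden_size=3)
last_hidden = cell.run(sequence)
```

The forget gate `f` decides how much of the earlier cell state survives each step, which is the mechanism that lets values from earlier time points influence the prediction. In practice a trained library implementation would normally be used instead of a hand-written cell.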
In step S108, the predicted data is used to replace the data to be corrected.
With the method of this embodiment, a prediction can be made from the core features of a plurality of pieces of data preceding the time corresponding to the data to be corrected, so as to obtain the correct data that should have appeared at that time, thereby making predictions using the time-series information of the DPI data. This manner of prediction better fits the characteristics of DPI data, which improves the effectiveness of DPI data correction and the accuracy of the corrected data.
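The overall flow of steps S102 to S108 can be sketched as a small Python function. The two callables stand in for the feature extraction network and the LSTM-based generator; `toy_extract` and `toy_generate` are deliberately trivial placeholders (an identity feature and a mean), not the networks described in this document, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = List[float]


def correct_value(
    prior: List[Tuple[int, float]],
    extract: Callable[[float], Features],
    generate: Callable[[List[Features]], float],
) -> float:
    """Predict a replacement value from the valid records that precede it."""
    ordered = sorted(prior, key=lambda item: item[0])      # time order (S104)
    sequence = [extract(value) for _, value in ordered]    # core features (S102)
    return generate(sequence)                              # predicted data (S106)


def toy_extract(value: float) -> Features:
    return [value]  # placeholder: the raw value acts as its own feature


def toy_generate(sequence: List[Features]) -> float:
    return sum(f[0] for f in sequence) / len(sequence)  # placeholder: mean


series: Dict[int, Optional[float]] = {7: 1.0, 8: 2.0, 9: 3.0, 10: None}
prior = [(t, v) for t, v in series.items() if t < 10 and v is not None]
series[10] = correct_value(prior, toy_extract, toy_generate)  # replacement (S108)
```

Passing the two stages in as callables keeps the control flow visible without committing to any particular model; real networks would be substituted for the placeholders.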
In some embodiments, the feature extraction network may be made up of multiple sub-networks or layers. An embodiment of the feature extraction method of the present invention is described below with reference to fig. 2.
FIG. 2 illustrates a flow diagram of a feature extraction method according to some embodiments of the invention. As shown in fig. 2, the feature extraction method of this embodiment includes steps S202 to S204, and the feature extraction network includes a convolutional neural network and a core feature extraction layer.
In step S202, the convolutional neural network extracts hidden features from a plurality of pieces of data.
In some embodiments, the convolutional neural network has a residual structure. In most networks without a residual structure, the input to each layer is the output of the previous layer. In a network with a residual structure, the input of some layers includes not only the output of the adjacent previous layer but also the output of layers before it. This structure can improve training efficiency and accuracy, and in turn the processing efficiency and accuracy of data correction.
In some embodiments, the convolutional neural network is an Inception-ResNet network. Inception-ResNet is a convolutional neural network proposed by Google and has a residual structure. When training on the basis of this network, parameters may be adjusted only for the last layer of the network, which further improves training efficiency.
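The idea of a residual (skip) connection can be shown with a toy example. The sketch below uses a one-parameter ReLU layer in place of a real convolutional layer, so it illustrates only the wiring, under the assumption of equal input and output sizes; it is not Inception-ResNet, and the function names are hypothetical.

```python
from typing import List


def toy_layer(v: List[float], scale: float) -> List[float]:
    """Stand-in for a learned layer: scale each element, then apply ReLU."""
    return [max(0.0, scale * x) for x in v]


def residual_block(v: List[float], scale: float) -> List[float]:
    """Add the block's input back to the layer output (skip connection)."""
    return [x + y for x, y in zip(v, toy_layer(v, scale))]


out = residual_block([1.0, -2.0], scale=0.5)
```

Because the input is added back, the next layer sees both the transformed output and the earlier signal, which is the property the description attributes to a residual structure.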
In step S204, the core feature extraction layer extracts core features from the hidden features. Thus, important features among the hidden features can be further extracted.
In some embodiments, the core feature extraction layer is an attention layer, implemented using an attention mechanism. In practice, an existing attention layer module may be used: the hidden features extracted in the previous step are passed in through the module's API, and the core features output by the attention layer are obtained. The attention layer assigns a weight to each sub-feature of the input hidden features through its built-in algorithm, so as to highlight the core information in the hidden features.
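One simple way to picture this weight assignment is a softmax over per-feature scores. The sketch below is an assumption-laden illustration: the scoring rule (element-wise product with a `score_weights` vector that would normally be learned) and the function names are hypothetical, and the patent does not specify which attention variant is used.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden: List[float], score_weights: List[float]) -> Tuple[List[float], List[float]]:
    """Score each sub-feature, normalise the scores, and reweight the features."""
    scores = [h * w for h, w in zip(hidden, score_weights)]
    weights = softmax(scores)
    core = [h * a for h, a in zip(hidden, weights)]
    return core, weights


core, weights = attend([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Sub-features with higher scores receive larger weights, so they dominate the reweighted output while low-scoring ones are suppressed.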
By the method of the embodiment, hidden and important features in the DPI data can be extracted, so that interference information in the original data can be removed, and prediction can be performed more accurately.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator. The generator can thus be trained by training the generative adversarial network. An embodiment of the training method of the present invention is described below with reference to FIG. 3.
FIG. 3 illustrates a flow diagram of a training method according to some embodiments of the present invention. As shown in FIG. 3, the training method of this embodiment includes steps S302 to S310.
In step S302, target data acquired from DPI data used for training and a plurality of pieces of training data before a time corresponding to the target data are input to a feature extraction network, and core features of the target data and the plurality of pieces of training data output by the feature extraction network are obtained.
The structure of the feature extraction network may refer to the foregoing embodiments, and details are not repeated here.
The target data corresponds to the data to be corrected in the actual correction process. During training, however, the target data has a non-empty, non-abnormal value, so that it can be compared with the predicted data in order to adjust the model.
In step S304, a core feature training sequence of the plurality of pieces of training data is constructed according to the time order of the plurality of pieces of training data.
In step S306, the core feature training sequence is input to the generator based on the LSTM network trained in advance, and predicted data is generated.
In step S308, the predicted data is input into the feature extraction network, and the core features of the predicted data output by the feature extraction network are obtained.
In step S310, the core features of the predicted data and the core features of the target data are input to the discriminator, so that the feature extraction network and the generative adversarial network are trained according to the discrimination result of the discriminator. The discriminator judges whether the generated data is real and outputs the probability that it is real. When this probability is about 0.5, the discriminator cannot tell whether the generated data is real, that is, the data produced by the generator has become difficult to distinguish from actual data. Training may be ended at this point.
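The stopping condition and the usual adversarial objectives can be sketched as follows. The loss functions are the standard GAN formulation, which this document does not state explicitly, and the tolerance `tol` around 0.5 is an assumed value; all function names are illustrative.

```python
import math
from typing import List


def discriminator_loss(p_real: float, p_fake: float) -> float:
    """Standard GAN discriminator loss for one real and one generated sample.
    p_real and p_fake are the discriminator's 'is real' probabilities."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake: float) -> float:
    """Non-saturating generator loss: reward samples judged to be real."""
    return -math.log(p_fake)


def should_stop(fake_probs: List[float], tol: float = 0.05) -> bool:
    """Stop when the mean probability on generated data is close to 0.5."""
    mean = sum(fake_probs) / len(fake_probs)
    return abs(mean - 0.5) <= tol


early = should_stop([0.05, 0.10, 0.08])   # discriminator still easily spots fakes
late = should_stop([0.48, 0.52, 0.50])    # discriminator is reduced to guessing
```

Averaging over a batch rather than checking a single sample makes the criterion less sensitive to individual outliers; the batch size and tolerance would need tuning in practice.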
With the method of this embodiment, the model can be trained through the adversarial game between the generator and the discriminator, which further improves prediction accuracy. In addition, during training, data at a future time is predicted from the time-series information and compared with the real data at the same time, so the trained model is suited to actual correction scenarios.
An embodiment of the data correction apparatus of the present invention is described below with reference to FIG. 4.
FIG. 4 illustrates a block diagram of a data correction apparatus according to some embodiments of the present invention. As shown in FIG. 4, the data correction apparatus 40 of this embodiment includes: a feature extraction module 410 configured to input a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network; a sequence construction module 420 configured to construct a core feature sequence of the plurality of pieces of data according to their time order; a data generation module 430 configured to input the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and a correction module 440 configured to replace the data to be corrected with the predicted data.
In some embodiments, the feature extraction network comprises a convolutional neural network and a core feature extraction layer; the convolutional neural network extracts hidden features from the plurality of pieces of data, and the core feature extraction layer extracts core features from the hidden features.
In some embodiments, the convolutional neural network has a residual structure.
In some embodiments, the convolutional neural network is an Inception-ResNet network.
In some embodiments, the core feature extraction layer is an attention layer.
In some embodiments, the LSTM network-based generator is the generator of a generative adversarial network (GAN), and the generative adversarial network further comprises a discriminator; the data correction apparatus further comprises a training module 450 configured to: input target data acquired from DPI data used for training, and a plurality of pieces of training data preceding the time corresponding to the target data, into the feature extraction network to obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network; construct a core feature training sequence of the plurality of pieces of training data according to their time order; input the core feature training sequence into the generator based on an LSTM network trained in advance to generate predicted data; input the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and input the core features of the predicted data and the core features of the target data into the discriminator, so as to train the feature extraction network and the generative adversarial network according to the discrimination result of the discriminator.
In some embodiments, the data correction apparatus further comprises a determining module 460 configured to determine, as the data to be corrected, data in the DPI data that has a field whose value is empty or abnormal; wherein the data preceding the time corresponding to the data to be corrected has the same fields as the data to be corrected, and its values are neither empty nor abnormal.
FIG. 5 is a schematic diagram of a data correction apparatus according to other embodiments of the present invention. As shown in FIG. 5, the data correction apparatus 50 of this embodiment includes: a memory 510 and a processor 520 coupled to the memory 510, the processor 520 being configured to perform the data correction method of any of the foregoing embodiments based on instructions stored in the memory 510.
The memory 510 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
FIG. 6 is a schematic diagram of a data correction apparatus according to further embodiments of the present invention. As shown in FIG. 6, the data correction apparatus 60 of this embodiment includes a memory 610 and a processor 620, and may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, the memory 610, and the processor 620 may be connected, for example, via a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the foregoing data correction methods.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description merely illustrates preferred embodiments of the present invention and is not to be construed as limiting the invention. Any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A data correction method, comprising:
inputting a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network;
constructing a core feature sequence of the plurality of pieces of data according to the time order of the plurality of pieces of data;
inputting the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and
replacing the data to be corrected with the predicted data.
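By way of non-limiting illustration, the four steps of claim 1 can be sketched as a small pipeline. This is only a sketch under stated assumptions: the record layout (a dictionary with a `timestamp` key), the function name `correct_record`, and the `extract` and `generate` callables are hypothetical stand-ins for the feature extraction network and the LSTM-based generator, neither of which is implemented here.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical record layout: field name -> value, with a "timestamp" key.
Record = Dict[str, Optional[float]]
Features = List[float]


def correct_record(
    history: Sequence[Record],                    # records preceding the record to be corrected
    target: Record,                               # record to be corrected
    field: str,                                   # field whose value is replaced
    extract: Callable[[Record], Features],        # stand-in for the feature extraction network
    generate: Callable[[List[Features]], float],  # stand-in for the LSTM-based generator
) -> Record:
    # Steps 1 and 2: extract core features of each preceding record and
    # arrange them in time order to form the core feature sequence.
    ordered = sorted(history, key=lambda r: r["timestamp"])
    sequence = [extract(r) for r in ordered]
    # Step 3: generate the predicted value from the core feature sequence.
    predicted = generate(sequence)
    # Step 4: replace the value to be corrected (the input record is left unchanged).
    corrected = dict(target)
    corrected[field] = predicted
    return corrected
```

In a real system `extract` and `generate` would wrap trained networks; for a quick check, toy stand-ins such as `lambda r: [r["bytes"]]` and `lambda seq: seq[-1][0]` (repeat the last observed value) are enough to exercise the data flow.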
2. The data correction method of claim 1, wherein the feature extraction network comprises a convolutional neural network and a core feature extraction layer;
the convolutional neural network extracts hidden features from the plurality of pieces of data, and the core feature extraction layer extracts core features from the hidden features.
3. The data correction method of claim 2, wherein the convolutional neural network has a residual structure.
4. The data correction method of claim 3, wherein the convolutional neural network is an Inception-ResNet network.
5. The data correction method of claim 2, wherein the core feature extraction layer is an attention layer.
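By way of non-limiting illustration, an attention layer that reduces hidden features to a core feature vector might look like the following. The claims do not specify the form of attention; this sketch assumes simple dot-product attention pooling with a single query vector (which would be learned in practice), and the names `softmax`, `attention_pool`, `hidden` and `query` are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], query: List[float]) -> List[float]:
    # Score each hidden feature vector by its dot product with the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    # The core feature is the attention-weighted sum of the hidden features.
    dim = len(hidden[0])
    return [sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(dim)]
```

With a zero query every hidden vector receives equal weight, so the result is their mean; a query aligned with one hidden vector concentrates the weight on that vector.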
6. The data correction method of claim 1, wherein the LSTM network-based generator is a generator in a generative adversarial network, the generative adversarial network further comprising a discriminator;
the data correction method further comprising:
inputting target data acquired from DPI data used for training, and a plurality of pieces of training data preceding the time corresponding to the target data, into the feature extraction network to obtain core features of the target data and of the plurality of pieces of training data output by the feature extraction network;
constructing a core feature training sequence of the plurality of pieces of training data according to the time order of the plurality of pieces of training data;
inputting the core feature training sequence into the LSTM network-based generator to generate predicted data;
inputting the predicted data into the feature extraction network to obtain core features of the predicted data output by the feature extraction network; and
inputting the core features of the predicted data and the core features of the target data into the discriminator, so as to train the feature extraction network and the generative adversarial network according to the discrimination result of the discriminator.
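By way of non-limiting illustration, the discrimination result could drive training through the standard adversarial losses shown below. The claims do not state a loss function, so this is an assumption: `d_real` and `d_fake` are hypothetical names for the discriminator's probability outputs on the core features of the target data and of the predicted data, respectively, and the generator loss is the commonly used non-saturating form.

```python
import math


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    # The discriminator is rewarded for scoring target-data features near 1
    # and predicted-data features near 0.
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))


def generator_loss(d_fake: float, eps: float = 1e-12) -> float:
    # The generator (and the feature extraction network trained with it) is
    # rewarded when the discriminator scores predicted-data features near 1.
    return -math.log(d_fake + eps)
```

When the discriminator cannot tell the two apart (both outputs equal 0.5), the discriminator loss equals 2 ln 2, and the generator loss falls as `d_fake` rises.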
7. The data correction method of claim 1, further comprising:
determining data in the DPI data that has a field whose value is null or abnormal as the data to be corrected;
wherein the plurality of pieces of data preceding the time corresponding to the data to be corrected have the same field as the data to be corrected, and the values of that field are neither null nor abnormal.
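By way of non-limiting illustration, the selection in claim 7 can be sketched as a single pass over time-ordered records. The record layout, the function names, and the `is_abnormal` rule are hypothetical; the claims do not define how an abnormal value is detected, so the caller supplies that test.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical record layout: field name -> value, with a "timestamp" key.
Record = Dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    # Treat None and NaN as null values.
    return value is None or (isinstance(value, float) and math.isnan(value))


def plan_corrections(
    records: List[Record],
    field: str,
    is_abnormal: Callable[[float], bool],  # caller-supplied abnormality rule
) -> List[Tuple[Record, List[Record]]]:
    """Pair each record to be corrected with the valid records that precede it."""
    ordered = sorted(records, key=lambda r: r["timestamp"])
    valid_history: List[Record] = []
    plan: List[Tuple[Record, List[Record]]] = []
    for rec in ordered:
        value = rec.get(field)
        if is_missing(value) or is_abnormal(value):
            # Null or abnormal value: mark for correction using only earlier valid records.
            plan.append((rec, list(valid_history)))
        else:
            valid_history.append(rec)
    return plan
```

Records that are themselves null or abnormal are never added to the history, so each history contains only values that are neither null nor abnormal in the chosen field.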
8. A data correction apparatus, comprising:
a feature extraction module configured to input a plurality of pieces of data, acquired from Deep Packet Inspection (DPI) data and preceding the time corresponding to data to be corrected, into a feature extraction network to obtain core features of the plurality of pieces of data output by the feature extraction network;
a sequence construction module configured to construct a core feature sequence of the plurality of pieces of data according to the time order of the plurality of pieces of data;
a data generation module configured to input the core feature sequence into a pre-trained generator based on a long short-term memory (LSTM) network to generate predicted data; and
a correction module configured to replace the data to be corrected with the predicted data.
9. A data correction apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the data correction method of any one of claims 1 to 7 based on instructions stored in the memory.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data correction method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010145446.9A CN113360484A (en) 2020-03-05 2020-03-05 Data correction method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113360484A (en) 2021-09-07

Family

ID=77523455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010145446.9A Pending CN113360484A (en) 2020-03-05 2020-03-05 Data correction method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113360484A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220130

Address after: 100007 room 205-32, floor 2, building 2, No. 1 and No. 3, qinglonghutong a, Dongcheng District, Beijing

Applicant after: Tianyiyun Technology Co.,Ltd.

Address before: No.31, Financial Street, Xicheng District, Beijing, 100033

Applicant before: CHINA TELECOM Corp.,Ltd.
