CN109299185B - Analysis method for convolutional neural network extraction features aiming at time sequence flow data - Google Patents

Analysis method for convolutional neural network extraction features aiming at time sequence flow data

Info

Publication number
CN109299185B
Authority
CN
China
Prior art keywords
data
neural network
dimension
convolutional neural
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811216349.3A
Other languages
Chinese (zh)
Other versions
CN109299185A (en)
Inventor
周同明
汪卫
邢宏岩
刁广州
杨勇
秦嘉岷
姜军
王旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shipbuilding Technology Research Institute
Original Assignee
Shanghai Shipbuilding Technology Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shipbuilding Technology Research Institute filed Critical Shanghai Shipbuilding Technology Research Institute
Priority to CN201811216349.3A priority Critical patent/CN109299185B/en
Publication of CN109299185A publication Critical patent/CN109299185A/en
Application granted granted Critical
Publication of CN109299185B publication Critical patent/CN109299185B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an analysis method for extracting features from time-series stream data with a convolutional neural network. The stream data are first preprocessed (data cleaning, data integration, data transformation, data merging and data reshaping) to ensure the accuracy of the subsequent stream data analysis. The stream data are then sampled, usually with a decaying window, to generate analysis samples. The data characteristics and the relations among the different dimensions are examined carefully; if the dimensions have little or no correlation, a dimension-wise convolutional neural network structure is tried for mining and analysis, which both preserves the temporal characteristics of the stream data and discovers combined features across dimensions. The method helps to find a better way of applying convolutional neural networks to stream data.

Description

Analysis method for convolutional neural network extraction features aiming at time sequence flow data
Technical Field
The invention relates to the field of stream data analysis, in particular to an analysis method for extracting characteristics of a convolutional neural network aiming at time sequence stream data.
Background
Vast amounts of data are produced every moment, so the volume of data grows explosively. This explosive growth of data streams, together with large and stable data storage and the availability of data applications in industry, provides abundant raw material for the era of artificial intelligence.
Precisely because we are in an era of data explosion, powerful data processing and analysis tools are urgently needed, so that information previously overlooked, valuable insights, and even knowledge relevant to human life can be found in massive time-series stream data.
Disclosure of Invention
The invention provides an analysis method for extracting features from time-series stream data with a convolutional neural network, aiming to solve the technical problems that conventional models and methods struggle to extract implicit features effectively, and that it is difficult for a deep-learning convolutional neural network model to take temporal characteristics and dimensional characteristics into account at the same time.
In order to solve the technical problems, the invention provides the following technical scheme:
the invention provides an analysis method for convolutional neural network extraction characteristics aiming at time sequence flow data, which comprises the following steps:
s1: preprocessing stream data;
s2, selecting a sample by an attenuation window method;
s3, designing and building a convolutional neural network model architecture;
s4, extracting features by dimensionality by adopting a convolution model;
s5, displaying and comparing the effect graphs generated by the deep learning logs;
s6, visualizing a deep learning effect graph;
aiming at time sequence characteristics and dimension information characteristics in stream data, a dimension-based convolutional neural network model is set up and adopted, strong characteristics and strong rules contained in basic information in the data are extracted, and time sequence characteristics of the stream data are considered; after the feature extraction and the reinforcement of the multidimensional data, a model which comprises the time sequence feature and the dimension feature is synthesized.
In the step S1, stream data preprocessing is carried out according to the characteristics of the stream data, including identification of key data information and of redundant attributes; the factors that have the greatest influence on the result are screened out manually, the stream data of all these factors are preprocessed, and abnormal items, missing items, redundant items and divergent items of the historical data are handled by the preprocessing means of data cleaning, data integration, data transformation, data merging and data reshaping; the preprocessed sample data are then examined carefully, the screened important information is described numerically, and the dimensions used to screen the target features are established manually.
As a preferred technical solution of the present invention, in the step S1 the ways of preprocessing abnormal data include the following (a minimal code sketch of these conventions is given after this list):
data missing: missing records are filtered out, which raises the overall level of the data while reducing the data volume;
data abnormality: preprocessing is performed by deleting the data, by comprehensive analysis and substitution in combination with the overall model, or by treating the abnormal value as a missing value and filling it equivalently, so that the deviation between the processed abnormal value and the other values is minimized;
data redundancy: if two attributes of the data are strongly correlated, the less important of the two is removed;
data normalization: the data of the different dimensions do not lie in a uniform range, and in cross-dimension calculation the weights would otherwise swing up and down too much to allow convenient adjustment and calculation;
labels and timestamps: for supervised learning on the classification problem, the data set is labeled and each record is also given a timestamp.
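The following is a minimal, illustrative pandas sketch of the preprocessing conventions listed above; the DataFrame layout, the "label" column name and the correlation threshold are assumptions made purely for illustration and are not prescribed by the patent.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, corr_threshold: float = 0.95) -> pd.DataFrame:
    """Illustrative cleaning pipeline: missing items, abnormal items,
    redundant attributes, normalization; the 'label' column and the
    timestamp index are kept untouched for supervised learning."""
    numeric = [c for c in df.select_dtypes("number").columns if c != "label"]

    # Missing items: interpolate along the time axis, then drop what remains.
    df[numeric] = df[numeric].interpolate(method="linear")
    df = df.dropna(subset=numeric)

    # Abnormal items: clip each column to mean +/- 3 std so the processed
    # outliers deviate as little as possible from the other values.
    for col in numeric:
        mu, sigma = df[col].mean(), df[col].std()
        df[col] = df[col].clip(mu - 3 * sigma, mu + 3 * sigma)

    # Redundant items: if two attributes are strongly correlated, drop the
    # second (less important) one of the pair.
    corr = df[numeric].corr().abs()
    drop = {b for i, a in enumerate(numeric) for b in numeric[i + 1:]
            if corr.loc[a, b] > corr_threshold}
    df = df.drop(columns=sorted(drop))
    kept = [c for c in numeric if c not in drop]

    # Normalization: bring every dimension into a uniform range (z-score).
    df[kept] = (df[kept] - df[kept].mean()) / df[kept].std()
    return df
```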
As a preferred technical solution of the present invention, the step S2 specifically includes: acquiring a streaming data sample, filtering the streaming data and acquiring the streaming data;
in the acquisition of a stream data sample, in the general sampling problem, the stream consists of tuples of n fields, and a subset of those fields is called the key fields; assuming that the sample size after sampling is a/b of the stream, the key value of each tuple is hashed into one of b buckets, and the tuples whose hash value is less than a are put into the sample; if there is more than one key field, the hash function combines the values of these fields to form a single hash value; the finally obtained sample consists of all tuples with certain specific key values, and the ratio of the number of selected key values to the total number of key values in the stream is a/b;
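A minimal sketch of this hash-based key sampling, assuming a hypothetical stream of dictionaries and a "user_id" key field chosen purely for illustration:

```python
import hashlib

def keep_tuple(record: dict, key_fields: tuple, a: int, b: int) -> bool:
    """Keep a stream tuple iff its key hashes into one of the first a of b
    buckets, so roughly a/b of all key values end up in the sample."""
    key = "|".join(str(record[f]) for f in key_fields)   # combine the key fields
    bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % b
    return bucket < a

# Usage sketch: a hypothetical stream keyed on "user_id", sampled at 1/10.
stream = [{"user_id": i % 25, "price": 10.0 + i} for i in range(100)]
sample = [r for r in stream if keep_tuple(r, ("user_id",), a=1, b=10)]
```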
in the stream data filtering, a Bloom filter is adopted; the Bloom filter comprises an array of n bits, each initially 0, and a collection of hash functions h1, h2, ..., hk, each of which maps a key value to one of the n buckets, together with a set S of m key values; the Bloom filter allows all stream elements whose key value is in S to pass through and blocks most stream elements whose key value is not in S;
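A minimal Bloom filter sketch along these lines; the array size, the number of hash functions and the use of MD5 are illustrative assumptions, not values taken from the patent:

```python
import hashlib

class BloomFilter:
    """n-bit array, all bits initially 0, with k hash functions; add() sets
    bits for keys in S, query() passes every key in S and blocks most keys
    outside S (false positives are possible, false negatives are not)."""
    def __init__(self, n: int = 1024, k: int = 3):
        self.n, self.k = n, k
        self.bits = [0] * n

    def _hashes(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.n

    def add(self, key: str) -> None:
        for pos in self._hashes(key):
            self.bits[pos] = 1

    def query(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._hashes(key))
```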
the stream data are obtained with a decaying-window method, which extracts the stream data and computes a smooth aggregate value; the weights used decay continuously, which is called an exponentially decaying window and is written as

$$\sum_{i=1}^{t} a_i (1 - c)^{t-i}$$

where a_1 is the first arriving element, a_t is the current element, and c is taken as 10^{-9}.
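A minimal sketch of maintaining this exponentially decaying aggregate incrementally; the incremental update multiplies the previous aggregate by (1 − c) and adds the newly arrived element, which matches the window formula above:

```python
class DecayingWindow:
    """Exponentially decaying window: keeps the smooth running aggregate
    sum_i a_i * (1 - c)^(t - i); each new element multiplies the previous
    aggregate by (1 - c) and then adds the current value a_t."""
    def __init__(self, c: float = 1e-9):
        self.c = c
        self.value = 0.0

    def update(self, a_t: float) -> float:
        self.value = self.value * (1.0 - self.c) + a_t
        return self.value

# Usage sketch on a toy stream of values.
window = DecayingWindow(c=1e-9)
for x in [1.0, 2.0, 3.0]:
    print(window.update(x))
```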
As a preferred technical solution of the present invention, in the step S3, the input data are fed into the model. The first layer of the model is a convolutional layer whose input is the filtered stream data; unlike a traditional fully connected layer, the input of each node in the convolutional layer is only a small block of the previous layer of the neural network. The convolutional layer analyses each small block more deeply so as to obtain features with a higher degree of abstraction; the node matrix becomes deeper after passing through the convolutional layer, i.e. its depth increases. The second layer is a pooling layer; the pooling layer does not change the depth of the matrix but reduces its size. The pooling operation can be thought of as converting a high-resolution picture into a low-resolution one: the amount of data is reduced while the data characteristics are retained. Through the pooling layer, the number of nodes in the final fully connected layer can be further reduced, and thus the number of parameters in the whole neural network is reduced. After the processing of the convolutional and pooling layers, one or two fully connected layers at the end of the convolutional neural network give the final classification result. After several rounds of convolution and pooling, the information in the data has been abstracted into features with higher information content; the convolutional and pooling layers perform automatic feature extraction, and once feature extraction is complete the classification task is completed with the fully connected layers.
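A minimal sketch of such a convolution–pooling–fully-connected stack for windowed stream data; the layer sizes, channel counts and the use of PyTorch are illustrative assumptions and not the architecture prescribed by the patent:

```python
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """Convolution -> pooling -> two fully connected layers, for a window of
    T timesteps with D dimensions laid out as (batch, D, T)."""
    def __init__(self, dims: int = 8, window: int = 90, classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(dims, 32, kernel_size=3, padding=1),  # deepens the node matrix
            nn.ReLU(),
            nn.MaxPool1d(2),                                # halves the length, keeps the depth
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (window // 2), 64),
            nn.ReLU(),
            nn.Linear(64, classes),                         # final classification result
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage sketch: a batch of 4 windows, 8 dimensions, 90 timesteps.
logits = StreamCNN()(torch.randn(4, 8, 90))
```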
As a preferred technical solution of the present invention, in the step S4 an independent convolution is performed for each dimension: the features of each dimension are extracted separately, in the form of per-dimension feature extraction, and reinforced separately; finally the strongest features of all dimensions are integrated and combined to decide the final classification result.
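A minimal sketch of the dimension-wise idea: one independent convolution branch per input dimension, with the strongest response of each branch kept and then combined for the final classification. The branch width and the global-max reduction are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class PerDimensionCNN(nn.Module):
    """Each input dimension gets its own 1-D convolution branch; the strongest
    feature of each branch is concatenated and a fully connected layer makes
    the joint classification decision."""
    def __init__(self, dims: int = 8, classes: int = 2, channels: int = 16):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(1, channels, kernel_size=3, padding=1) for _ in range(dims)]
        )
        self.head = nn.Linear(dims * channels, classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dims, T)
        feats = []
        for d, conv in enumerate(self.branches):
            h = torch.relu(conv(x[:, d:d + 1, :]))   # convolve one dimension alone
            feats.append(h.max(dim=-1).values)        # keep its strongest response
        return self.head(torch.cat(feats, dim=-1))    # combine across dimensions

# Usage sketch: same (batch, dims, timesteps) layout as above.
logits = PerDimensionCNN()(torch.randn(4, 8, 90))
```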
As a preferred embodiment of the present invention, in step S5, after deep learning has been performed, the accuracy of the classification or prediction is calculated, and a log and an accuracy result are generated for improving and modifying the model and for debugging.
The invention has the following beneficial effects: in data processing and feature extraction, the invention extracts the strong features and strong rules contained in the basic information while also taking into account the inherent temporal characteristics of the stream data; it realizes a method for applying a convolutional neural network to time-series stream data that extracts the strong features and strong rules in the basic information and remains compatible with the inherent temporal characteristics of the stream data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is an overall flow chart of the multidimensional model construction of the present invention;
FIG. 2 is a diagram of the convolutional neural network structure of the present invention;
FIG. 3 is a block diagram of the multidimensional convolutional neural network of the present invention;
FIG. 4 is a flow diagram of the present invention's multidimensional convolutional neural network structure;
FIG. 5 is an example of data samples of the present invention, exemplified by financial time series flow data;
FIG. 6 is an exemplary sampling method for streaming data filtering, as exemplified by financial time series streaming data, in accordance with the present invention;
FIG. 7 illustrates a general neural network convolution approach to the financial time series flow data of the present invention;
FIG. 8 illustrates a multidimensional neural network convolution approach, such as financial time-series flow data, in accordance with the present invention;
FIG. 9 is a graph of the accuracy effect of the present invention on a multidimensional convolutional neural network architecture, exemplified by financial time-series flow data;
FIG. 10 is a graph illustrating the training effect of the present invention on a multidimensional convolutional neural network structure, which is illustrated by financial time series flow data;
FIG. 11 is a parameter comparison example of various model algorithms of the present invention, using financial time series flow data as an example.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it should be understood that they are presented herein only to illustrate and explain the present invention and not to limit the present invention.
Embodiment: as shown in FIGS. 1 to 11, the present invention provides an analysis method for extracting features of a convolutional neural network for time-series stream data, comprising the following steps:
s1: preprocessing stream data;
s2, selecting a sample by an attenuation window method;
s3, designing and building a convolutional neural network model architecture;
s4, extracting features by dimensionality by adopting a convolution model;
s5, displaying and comparing the effect graphs generated by the deep learning logs;
s6, visualizing a deep learning effect graph;
aiming at the time sequence characteristics and the dimension information characteristics in the stream data, a dimension-based convolutional neural network model is built and adopted to extract the strong characteristics and the strong rules contained in the basic information in the data, and the self-contained time sequence characteristics of the stream data are considered; after the feature extraction and the reinforcement of the multidimensional data, a model which comprises the time sequence feature and the dimension feature is synthesized.
In the step S1, stream data preprocessing is carried out according to the characteristics of the stream data, including identification of key data information and of redundant attributes; the factors that have the greatest influence on the result are screened out manually, the stream data of all these factors are preprocessed, and abnormal items, missing items, redundant items and divergent items of the historical data are handled by the preprocessing means of data cleaning, data integration, data transformation, data merging and data reshaping; the preprocessed sample data are then examined carefully, the screened important information is described numerically, and the dimensions used to screen the target features are established manually.
Further, in step S1, the ways of preprocessing abnormal data include:
data missing: missing records are filtered out, which raises the overall level of the data while reducing the data volume;
data abnormality: preprocessing is performed by deleting the data, by comprehensive analysis and substitution in combination with the overall model, or by treating the abnormal value as a missing value and filling it equivalently, so that the deviation between the processed abnormal value and the other values is minimized;
data redundancy: if two attributes of the data are strongly correlated, the less important of the two is removed;
data normalization: the data of the different dimensions do not lie in a uniform range, and in cross-dimension calculation the weights would otherwise swing up and down too much to allow convenient adjustment and calculation;
labels and timestamps: for supervised learning on the classification problem, the data set is labeled and each record is also given a timestamp.
Further, the specific process in step S2 is as follows: acquiring a streaming data sample, filtering the streaming data and acquiring the streaming data;
in the acquisition of a stream data sample, in a general sampling problem, the stream data consists of a series of n field tuples, and a subset of the fields is called as key fields; assuming that the sample size after sampling is a/b, hashing a key value of each tuple to one of b buckets, and then putting the tuple of which the hash value is smaller than a into a sample; if there is more than one key field, the hash function combines the values of these fields to form a single hash value; the finally obtained sample consists of all tuples of certain specific key values; the ratio of the number of the selected key values to the total number of the key values in the stream is a/b;
in the stream data filtering, a Bloom filter is adopted; the Bloom filter comprises an array of n bits, each initially 0, and a collection of hash functions h1, h2, ..., hk, each of which maps a key value to one of the n buckets, together with a set S of m key values; the Bloom filter allows all stream elements whose key value is in S to pass through and blocks most stream elements whose key value is not in S;
the stream data are obtained with a decaying-window method, which extracts the stream data and computes a smooth aggregate value; the weights used decay continuously, which is called an exponentially decaying window and is written as

$$\sum_{i=1}^{t} a_i (1 - c)^{t-i}$$

where a_1 is the first arriving element, a_t is the current element, and c is a very small constant, for example 10^{-9}.
Further, in the step S3, input data is input into the model, the first layer of the model is a convolutional layer, and the input of this layer is filtered stream data, unlike the conventional fully-connected layer, the input of each node in the convolutional layer is only a small block of the neural network in the previous layer; each small block in the neural network is analyzed more deeply by the convolutional layer so as to obtain the characteristic with higher abstraction degree; the node matrix processed by the convolutional layer becomes deeper, and the depth of the node matrix after the convolutional layer is increased; the second layer is a pooling layer, and the neural network of the pooling layer does not change the depth of the matrix but reduces the size of the matrix; the pooling operation is to convert a high-resolution picture into a low-resolution picture (the data size is reduced but the data characteristics are still kept); through the pooling layer, the number of nodes in the last full-connection layer can be further reduced, so that the number of parameters in the whole neural network is reduced; after processing of the convolutional layers and the pooling layers, giving a final classification result by 1 to 2 fully-connected layers at the end of the convolutional neural network; after several rounds of processing of convolutional and pooling layers, the information in the data has been abstracted into more information-rich features; the convolutional layer and the pooling layer are processes for automatically extracting features, and after the feature extraction is completed, a classification task is completed by using a fully connected layer.
Further, in the step S4, for each dimension, independent convolution is performed; and (3) respectively extracting the feature of each dimension by independently extracting the form of the dimension feature of each dimension, respectively strengthening the feature of each dimension, finally integrating the strongest features of each dimension, and judging the final classification result by combination.
Further, in step S5, after the deep learning is performed, the accuracy of classification or prediction after the deep learning is calculated, and a log and an accuracy result are generated for improving, modifying the model and debugging.
Specifically, the method comprises the following steps: in step S1, the stream data are preprocessed, the preprocessing covering data loss, data abnormality, data redundancy, data normalization, labeling and time-stamping.
In the step S2, the stream data samples from step S1 are processed mainly with the attenuation (decaying) window method; the main processes are stream data acquisition, stream data filtering and the decaying-window operation, and the dimensions used to screen the target features are established manually according to the target data set and the characteristics of the project. The decaying-window method is chosen for sample selection because, in time-series stream data, recently generated stream data influence the current stream data and the data of the near future, the influence factor being determined by the actual situation and the actual data. Typically there is an important correlation between adjacently generated stream data, whereas stream data far apart have a much smaller correlation with the data being generated at the current moment. Therefore, stream data filtering and the decaying-window operation are applied during sample acquisition. Taking financial time-series prediction as an example, the experiments used a contiguous window of 90 time steps, i.e. the past 270 minutes of data (about one trading day), as the basis for predicting the price three minutes into the future.
The meaning of the decaying window can be illustrated with the exponential moving average (EMA) used in financial time-series data: the weighting factor of each day's price decreases in an exponential, geometric proportion. The closer a time is to the present moment, the greater its weight, which means the EMA gives more weight to recent prices and reflects recent price fluctuations more promptly. The exponential moving average is therefore more informative than the simple moving average.
Similarly, in comparable time-series data, the closer a record is to the current moment, the greater the weight assigned to it; that is, different weights can be assigned to past data according to how recently they were generated, the weight of recent data is strengthened, and recent value fluctuations are reflected more promptly.
In an application experiment on financial time-series stream data, every K-line (candlestick) record is first time-stamped; the 240 minutes of each trading day are then divided chronologically into 80 buckets, and the transaction stream data are hashed into these 80 "buckets". A Bloom filter is used whose bit array covers the 80 time positions; as described above, each hash function maps a "key" value (the time of a data sample) into the set S of buckets corresponding to the daily trading period. The Bloom filter passes the stream elements whose data sample lies in S and blocks the stream elements whose key value is not in S. This filters out part of the data near the opening and closing times and avoids possible bias caused by inactive market liquidity. For data sources in which the moving average can show the short-term trend and the price-volume indicators, the decaying-window operation is introduced: because in time-series stream data the closer a record is to the current moment, the greater its weight, different weights can be assigned to recently generated data, the weight of recent stream data is strengthened, and recent value fluctuations are reflected more promptly. Accordingly, the exponential moving average (EMA) is used on the financial time-series stream data; the EMA effectively reduces the weight of data far from the current moment and increases the weight of recent, influential factors, allowing the machine-learning model to better learn the underlying regularities.
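As an illustration of the EMA mentioned above, a minimal sketch with an assumed smoothing span and toy prices (neither value is taken from the patent):

```python
def exponential_moving_average(prices, span: int = 12):
    """EMA over a price series: weights decay geometrically, so recent prices
    dominate and the curve reacts faster than a simple moving average."""
    alpha = 2.0 / (span + 1)            # standard EMA smoothing factor
    ema, out = prices[0], [prices[0]]
    for p in prices[1:]:
        ema = alpha * p + (1 - alpha) * ema
        out.append(ema)
    return out

# Usage sketch on a toy price series.
print(exponential_moving_average([10.0, 10.5, 10.2, 10.8, 11.0]))
```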
In step S3, the convolutional neural network is a variant of the multilayer perceptron and was the first learning algorithm to successfully train a multilayer neural network structure in the true sense. The weight-sharing network structure of the convolutional neural network makes it closer to a biological neural network, greatly reduces the complexity of the model, reduces the number of weights and simplifies the computation. Convolution is a mathematical operation on two real-valued functions, written s(t) = (x * w)(t), where x is the input, w is the kernel (which must be a valid probability density function, otherwise the output is no longer a weighted average), the output is sometimes called the feature map, and t is the time axis. In discrete form:

$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a)$$

In machine learning, the input is a multidimensional array of data and the kernel is a multidimensional array of parameters optimized by the learning algorithm. Convolution is often performed over several dimensions at once, for example in two dimensions:

$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n)\, K(i - m, j - n)$$
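A short numeric check of the one- and two-dimensional discrete convolutions above, using NumPy and SciPy reference implementations; the toy input and kernel values are illustrative only:

```python
import numpy as np
from scipy.signal import convolve2d

# One-dimensional discrete convolution s(t) = sum_a x(a) * w(t - a).
x = np.array([1.0, 2.0, 3.0, 4.0])       # input
w = np.array([0.25, 0.5, 0.25])           # kernel (a valid probability density)
s = np.convolve(x, w, mode="same")        # feature map, same length as x
print(s)                                   # [1.0, 2.0, 3.0, 2.75] -> weighted averages

# Two-dimensional case S(i, j) = sum_m sum_n I(m, n) * K(i - m, j - n).
I = np.arange(9, dtype=float).reshape(3, 3)
K = np.ones((2, 2)) / 4.0
print(convolve2d(I, K, mode="valid"))
```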
In step S4, an individual convolution is performed for each dimension. The features of each dimension are extracted separately, in the form of per-dimension feature extraction, and reinforced separately; finally the strongest features of all dimensions are integrated and combined to decide the final classification result. During data processing, every input column vector is convolved independently in its own dimension. In this way the temporal characteristics of each single dimension are preserved, the dimensional characteristics can still be captured, and the most important features of every dimension are finally integrated. In step S5, after deep learning has been performed, the accuracy of the classification or prediction is calculated, and a log and an accuracy result are generated for improving and modifying the model and for debugging; at the same time, if optimization is needed, step S6 is carried out; otherwise, step S3 is repeated.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that various changes, modifications and substitutions can be made without departing from the spirit and scope of the invention as defined by the appended claims. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An analysis method for extracting features of a convolutional neural network aiming at time sequence flow data is characterized by comprising the following steps of:
s1: preprocessing stream data;
s2, selecting a sample by an attenuation window method;
s3, designing and building a convolutional neural network model architecture;
s4, extracting features by dimensionality by adopting a convolution model;
s5, displaying and comparing the effect graphs generated by the deep learning logs;
s6, visualizing a deep learning effect graph;
aiming at the time sequence characteristics and the dimension information characteristics in the stream data, a dimension-based convolutional neural network model is built and adopted to extract the strong characteristics and the strong rules contained in the basic information in the data, and the self-contained time sequence characteristics of the stream data are considered; after the feature extraction and the reinforcement of the multidimensional data, a model which comprises a time sequence feature and a dimension feature is synthesized;
the specific process of the step S2 is as follows: acquiring a streaming data sample, filtering the streaming data and acquiring the streaming data;
in the acquisition of a stream data sample, in a general sampling problem, the stream data consists of a series of n field tuples, and a subset of the fields is called as key fields; assuming that the sample size after sampling is a/b, hashing a key value of each tuple to one of b buckets, and then putting the tuple of which the hash value is smaller than a into a sample; if there is more than one key field, the hash function combines the values of these fields to form a single hash value; the finally obtained sample is composed of all tuples of certain specific key values; the ratio of the number of the selected key values to the total number of the key values in the stream is a/b;
in the stream data filtering, a Bloom filter is adopted, wherein the Bloom filter comprises an array of n bits, each initially 0, and a collection of hash functions h1, h2, ..., hk, each of which maps a key value to one of the n buckets, together with a set S of m key values; the Bloom filter allows all stream elements whose key value is in S to pass through and blocks most stream elements whose key value is not in S;
in the stream data acquisition, there is an important correlation between adjacently generated stream data, and the earlier an element appears in the stream the smaller its correlation, so the decaying-window method is adopted to extract the stream data and compute a smooth aggregate value, wherein the weights used decay continuously; this is called an exponentially decaying window and is written as

$$\sum_{i=1}^{t} a_i (1 - c)^{t-i}$$

where a_1 is the first arriving element, a_t is the current element, and c = 10^{-9};
In the step S3, input data is input into the model, the first layer of the model is a convolutional layer, the input of the layer is flow data after being screened, and unlike the traditional full connection layer, the input of each node in the convolutional layer is only a small block of the neural network of the previous layer; the convolutional layer analyzes each small block in the neural network more deeply so as to obtain the characteristic with higher abstraction degree; the node matrix processed by the convolutional layer becomes deeper, and the depth of the node matrix after the convolutional layer is increased; the second layer is a pooling layer, and the neural network of the pooling layer does not change the depth of the matrix but reduces the size of the matrix; the pooling operation is to convert a high-resolution picture into a low-resolution picture, and the data size is reduced while the data characteristics are still kept; through the pooling layer, the number of nodes in the last full-connection layer can be further reduced, so that the number of parameters in the whole neural network is reduced; after processing of the convolutional layers and the pooling layers, giving a final classification result by 1 to 2 fully-connected layers at the end of the convolutional neural network; after several rounds of processing of convolutional and pooling layers, the information in the data has been abstracted into more information-rich features; convolutional and pooling layers are processes that automatically extract features, and after feature extraction is complete, classification tasks are completed using fully-connected layers.
2. The analysis method for extracting features of the convolutional neural network for time-series flow data according to claim 1, wherein in step S1, stream data preprocessing is carried out according to the characteristics of the stream data, including identification of key data information and of redundant attributes; the factors that have the greatest influence on the result are screened out manually, the stream data of all these factors are preprocessed, and abnormal items, missing items, redundant items and divergent items of the historical data are handled by the preprocessing means of data cleaning, data integration, data transformation, data merging and data reshaping; the preprocessed sample data are then examined carefully, the screened important information is described numerically, and the dimensions used to screen the target features are established manually.
3. The analysis method for extracting features of the convolutional neural network for time-series flow data according to claim 2, wherein in step S1 the ways of preprocessing abnormal data include:
data missing: missing records are filtered out, which raises the overall level of the data while reducing the data volume;
data abnormality: preprocessing is performed by deleting the data, by comprehensive analysis and substitution in combination with the overall model, or by treating the abnormal value as a missing value and filling it equivalently, so that the deviation between the processed abnormal value and the other values is minimized;
data redundancy: if two attributes of the data are strongly correlated, the less important of the two is removed;
data normalization: the data of the different dimensions do not lie in a uniform range, and in cross-dimension calculation the weights would otherwise swing up and down too much to allow convenient adjustment and calculation;
labels and timestamps: for supervised learning on the classification problem, the data set is labeled and each record is also given a timestamp.
4. The analysis method for extracting features of the convolutional neural network for time-series flow data according to claim 1, wherein in step S4, independent convolution is performed for each dimension; and (3) respectively extracting the feature of each dimension by independently extracting the form of the dimension feature of each dimension, respectively strengthening the feature of each dimension, finally integrating the strongest features of each dimension, and judging the final classification result by combination.
5. The analysis method for extracting features of the convolutional neural network for time-series flow data according to claim 1, wherein in step S5, after deep learning is performed, the accuracy of classification or prediction after deep learning is calculated, and a log and an accuracy result are generated for improving, modifying a model and debugging.
CN201811216349.3A 2018-10-18 2018-10-18 Analysis method for convolutional neural network extraction features aiming at time sequence flow data Active CN109299185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811216349.3A CN109299185B (en) 2018-10-18 2018-10-18 Analysis method for convolutional neural network extraction features aiming at time sequence flow data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811216349.3A CN109299185B (en) 2018-10-18 2018-10-18 Analysis method for convolutional neural network extraction features aiming at time sequence flow data

Publications (2)

Publication Number Publication Date
CN109299185A CN109299185A (en) 2019-02-01
CN109299185B true CN109299185B (en) 2023-04-07

Family

ID=65157370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811216349.3A Active CN109299185B (en) 2018-10-18 2018-10-18 Analysis method for convolutional neural network extraction features aiming at time sequence flow data

Country Status (1)

Country Link
CN (1) CN109299185B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967616B (en) * 2020-08-18 2024-04-23 深延科技(北京)有限公司 Automatic time series regression method and device
CN111966740A (en) * 2020-08-24 2020-11-20 安徽思环科技有限公司 Water quality fluorescence data feature extraction method based on deep learning
CN112232197A (en) * 2020-10-15 2021-01-15 武汉微派网络科技有限公司 Juvenile identification method, device and equipment based on user behavior characteristics
CN112184056B (en) * 2020-10-19 2024-02-09 中国工商银行股份有限公司 Data feature extraction method and system based on convolutional neural network
CN114385699A (en) * 2022-01-06 2022-04-22 云南电网有限责任公司信息中心 Abnormal analysis method for user price rate of power grid
CN115065560A (en) * 2022-08-16 2022-09-16 国网智能电网研究院有限公司 Data interaction leakage-prevention detection method and device based on service time sequence characteristic analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194404A (en) * 2017-04-13 2017-09-22 哈尔滨工程大学 Submarine target feature extracting method based on convolutional neural networks
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN108647834A (en) * 2018-05-24 2018-10-12 浙江工业大学 A kind of traffic flow forecasting method based on convolutional neural networks structure

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN107194404A (en) * 2017-04-13 2017-09-22 哈尔滨工程大学 Submarine target feature extracting method based on convolutional neural networks
CN108647834A (en) * 2018-05-24 2018-10-12 浙江工业大学 A kind of traffic flow forecasting method based on convolutional neural networks structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王勇; 周慧怡; 俸皓; 叶苗; 柯文龙. 基于深度卷积神经网络的网络流量分类方法 [Network traffic classification method based on deep convolutional neural networks]. 通信学报 (Journal on Communications), 2018, (01), full text. *

Also Published As

Publication number Publication date
CN109299185A (en) 2019-02-01

Similar Documents

Publication Publication Date Title
CN109299185B (en) Analysis method for convolutional neural network extraction features aiming at time sequence flow data
CN111585997B (en) Network flow abnormity detection method based on small amount of labeled data
CN110298663B (en) Fraud transaction detection method based on sequence wide and deep learning
CN109993100B (en) Method for realizing facial expression recognition based on deep feature clustering
CN112116001B (en) Image recognition method, image recognition device and computer-readable storage medium
CN112308288A (en) Particle swarm optimization LSSVM-based default user probability prediction method
CN112232604B (en) Prediction method for extracting network traffic based on Prophet model
Malathi et al. Evolving data mining algorithms on the prevailing crime trend–an intelligent crime prediction model
CN114255447A (en) Unsupervised end-to-end video abnormal event data identification method and unsupervised end-to-end video abnormal event data identification device
CN116307227A (en) Service information processing method, device and computer equipment
CN114723003A (en) Event sequence prediction method based on time sequence convolution and relational modeling
CN114117029A (en) Solution recommendation method and system based on multi-level information enhancement
Acharya et al. Efficacy of CNN-bidirectional LSTM hybrid model for network-based anomaly detection
CN115883424A (en) Method and system for predicting traffic data between high-speed backbone networks
CN114493858A (en) Illegal fund transfer suspicious transaction monitoring method and related components
Ulizko et al. Graph visualization of the characteristics of complex objects on the example of the analysis of politicians
Xie et al. Models and Features with Covariate Shift Adaptation for Suspicious Network Event Recognition
Kaushal Deep RNN-based Traffic Analysis Scheme for Detecting Target Applications
ARSLAN et al. Crime Classification using Categorical Feature Engineering and Machine Learning
CN115660221B (en) Oil and gas reservoir economic recoverable reserve assessment method and system based on hybrid neural network
CN113688229B (en) Text recommendation method, system, storage medium and equipment
Ramazan Classification of Historical Anatolian Coins with Machine Learning Algorithms
Goutham et al. A Study of incremental Learning model using deep neural network
Falissard et al. Learning a binary search with a recurrent neural network. A novel approach to ordinal regression analysis
APURVA Analysis of disease detection in cotton plant leaves using convolutional neural networks.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 200032 No. two, 851 South Road, Xuhui District, Shanghai, Zhongshan

Applicant after: Shanghai Shipbuilding Technology Research Institute (the 11th Research Institute of China Shipbuilding Corp.)

Address before: 200032 No. two, 851 South Road, Xuhui District, Shanghai, Zhongshan

Applicant before: SHIPBUILDING TECHNOLOGY Research Institute (NO 11 RESEARCH INSTITUTE OF CHINA STATE SHIPBUILDING Corp.,Ltd.)

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant