CN108470068A - Summary index generation method for time-series key-value industrial process data


Info

Publication number
CN108470068A
Authority
CN
China
Prior art keywords
time series
data
series data
industrial process
key-value
Legal status
Pending
Application number
CN201810270729.9A
Other languages
Chinese (zh)
Inventor
张可 (Zhang Ke)
韩载道 (Han Zaidao)
李媛 (Li Yuan)
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
2018-03-29
Filing date
2018-03-29
Publication date
2018-08-31
Application filed by Chongqing University
Priority to CN201810270729.9A
Publication of CN108470068A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a summary index generation method for time-series key-value industrial process data, comprising: S1: acquiring time-series key-value industrial process data; S2: preprocessing the acquired time-series data by noise smoothing to obtain timestamped time-series data; S3: representing the preprocessed time-series data with the symbolic aggregate approximation (SAX) method; S4: performing pattern clustering on the SAX representation and forming the clustering result into an index with a prefix tree algorithm. The beneficial effects of the invention are as follows: building on a data preprocessing method, the SAX method and the prefix tree algorithm are combined into a summary index generation method for time-series key-value industrial process data. The method reduces the dimensionality of the original time-series data, effectively extracts the features of the original data, and generates the summary index with the prefix tree algorithm.

Description

Summary index generation method for time-series key-value industrial process data
Technical field
The present invention relates to the field of time-series data mining, and in particular to a summary index generation method for time-series key-value industrial process data.
Background
Time-series data is widespread in fields such as industrial processes, weather monitoring, and medical diagnosis. Time-series key-value industrial process data, a typical kind of time-series data, is high-dimensional and massive, so traditional data summary index generation methods cannot analyze it well. Symbolic aggregate approximation (SAX) is a mature symbolic representation method that is widely used in the preprocessing and pattern discovery of time-series data. Its advantage is that it allows mature, efficient data mining algorithms designed for string operations to be reused. A prefix tree (trie) is a keyed tree structure and a variant of the hash tree. It is typically used to count and sort large numbers of strings (though not only strings), and is therefore often used by search engine systems for text word-frequency statistics. The shortcoming of the prior art is that most current time-series indexing methods rely on a single dimensionality-reduction representation or symbolic representation method, which makes it difficult to query time-series data quickly and efficiently. A new summary index generation method for time-series key-value industrial process data is therefore urgently needed.
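To make the trie's counting-and-sorting use concrete, here is a minimal Python sketch; it is not taken from the patent, and `TrieNode`, `build_trie`, and `sorted_frequencies` are hypothetical names. Each node counts the strings that end at it, and a depth-first walk that visits children in key order returns every stored string with its frequency in lexicographic order.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    count: int = 0  # number of inserted strings that end exactly at this node


def build_trie(words: list[str]) -> TrieNode:
    """Insert every word, sharing nodes for common prefixes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1
    return root


def sorted_frequencies(node: TrieNode, prefix: str = "") -> list[tuple[str, int]]:
    """Depth-first walk in key order: yields (string, count) sorted lexicographically."""
    result = []
    if node.count:
        result.append((prefix, node.count))
    for ch in sorted(node.children):
        result.extend(sorted_frequencies(node.children[ch], prefix + ch))
    return result


if __name__ == "__main__":
    print(sorted_frequencies(build_trie(["tea", "ten", "to", "tea", "inn"])))
```

Because shared prefixes are stored once, counting and ordering need no pairwise string comparisons, which is the property the background attributes to prefix trees.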
Summary of the invention
In view of the above drawbacks of the prior art, the object of the present invention is to provide a summary index generation method for time-series key-value industrial process data that builds an index by encoding the data with a prefix tree algorithm. The preprocessed time-series data is first represented with the SAX method; pattern clustering is then performed on the SAX representation; finally, the clustering result is formed into an index with the prefix tree algorithm.
The object of the present invention is achieved by the following technical solution: a summary index generation method for time-series key-value industrial process data, comprising:
S1: acquiring time-series key-value industrial process data;
S2: preprocessing the acquired time-series data by noise smoothing to obtain timestamped time-series data;
S3: representing the preprocessed time-series data with the symbolic aggregate approximation (SAX) method;
S4: performing pattern clustering on the SAX representation and forming the clustering result into an index with a prefix tree algorithm.
Further, the noise-smoothing preprocessing of the acquired time-series data in step S2 comprises the following steps:
S21: performing deviation detection on the raw time-series data to find noise, outliers, and unusual values, and examining the domain and data type of each attribute and the range of acceptable values for each attribute;
S22: smoothing the sorted data values by consulting their neighborhood, using the smoothing-by-bin-means variant of the binning method; this discretizes the continuous data, yields the preprocessed time-series data, and increases the granularity of the data.
Further, step S3 comprises the following steps:
S31: dividing the time-series data preprocessed in step S2 into segments of equal length and taking the mean of each segment to form a new time series that represents the original high-dimensional time series;
S32: applying the SAX method to the dimension-reduced time series to obtain its discretized approximate representation.
Further, step S4 comprises:
S41: clustering the symbolic representation obtained in step S3 with the k-means clustering method to obtain a set of discretized string patterns;
S42: encoding the above result with a prefix tree algorithm to form the index.
Further, step S31 comprises:
Let the dimension of the time series obtained in step S2 be n and the dimension after processing be N. The mean of the i-th segment is given by:
$$\bar{x}_i = \frac{N}{n} \sum_{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i} x_j, \qquad i = 1, \ldots, N$$
where $x_j$ is the j-th point of the preprocessed series.
By adopting the above technical solution, the present invention has the following advantages. It uses piecewise aggregate approximation (PAA) to reduce the dimensionality of the time-series data, which guarantees the distance lower-bounding criterion and thus avoids false dismissals in subsequent similarity queries. It applies the classical symbolic representation, so that distances can still be computed after dimensionality reduction, providing a theoretical basis for subsequent applications such as similarity queries and anomaly detection. Most importantly, by applying the prefix tree algorithm, the invention minimizes meaningless string comparisons and greatly improves query efficiency.
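The lower-bounding property mentioned above can be checked numerically. For series of length n reduced to N segment means, the distance sqrt(n/N) * ||PAA(x) - PAA(y)|| never exceeds the Euclidean distance between the original series, because within each segment the sum of squared differences is at least the segment length times the squared difference of the segment means. The Python sketch below assumes n is divisible by N; all function names are hypothetical.

```python
import math
import random


def paa(series, n_segments):
    """Segment means; this sketch assumes len(series) is divisible by n_segments."""
    n = len(series)
    if n % n_segments:
        raise ValueError("len(series) must be divisible by n_segments")
    w = n // n_segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(n_segments)]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def paa_lower_bound(x, y, n_segments):
    """sqrt(n/N) * ||PAA(x) - PAA(y)||, which never exceeds euclidean(x, y)."""
    scale = math.sqrt(len(x) / n_segments)
    return scale * euclidean(paa(x, n_segments), paa(y, n_segments))


def check_lower_bound(trials=200, length=64, n_segments=8, seed=0):
    """Return True if the bound holds on `trials` random Gaussian series pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(length)]
        y = [rng.gauss(0.0, 1.0) for _ in range(length)]
        # A small tolerance absorbs floating-point rounding in near-equal cases.
        if paa_lower_bound(x, y, n_segments) > euclidean(x, y) + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(check_lower_bound())
```

Since the reduced distance can only underestimate the true one, pruning candidates with it never discards a true match, which is what "no false dismissals" means.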
Other advantages, objects, and features of the present invention will be set forth to some extent in the following description, and to some extent will be apparent to those skilled in the art upon examination of what follows, or may be learned from practicing the invention. The objects and other advantages of the present invention may be realized and obtained through the following description and claims.
Description of the drawings
The drawings of the present invention are described as follows:
Fig. 1 is a flow diagram of the summary index generation method for time-series key-value industrial process data.
Fig. 2 is an example flow diagram of the prefix tree algorithm based on piecewise aggregate approximation.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and an embodiment.
Embodiment: as shown in Fig. 1 and Fig. 2, a summary index generation method for time-series key-value industrial process data comprises:
S1: acquiring time-series key-value industrial process data;
S2: preprocessing the acquired time-series data by noise smoothing to obtain timestamped time-series data;
The noise-smoothing preprocessing of the acquired time-series data in step S2 comprises the following steps:
S21: performing deviation detection on the raw time-series data to find noise, outliers, and unusual values, and examining the domain and data type of each attribute and the range of acceptable values for each attribute;
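The patent does not specify how deviations are detected in S21. As one possible reading, the Python sketch below flags points that fall outside an acceptable value range or whose z-score exceeds a threshold; the function name, the `valid_range` pair, and the default threshold of 3.0 are all assumptions.

```python
import statistics


def detect_deviations(values, valid_range, z_threshold=3.0):
    """Return indices of values outside valid_range or with |z-score| > z_threshold.

    valid_range is a hypothetical (low, high) pair standing in for the
    per-attribute acceptable value range examined in step S21.
    """
    low, high = valid_range
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    flagged = []
    for i, v in enumerate(values):
        out_of_range = not (low <= v <= high)
        # Skip the z-score test when all values are identical (std == 0).
        outlier = std > 0 and abs(v - mean) / std > z_threshold
        if out_of_range or outlier:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 11, 9, 10, 11, 9, 10, 50, -1]
    print(detect_deviations(readings, (0, 100), z_threshold=2.5))
```

A robust statistic such as the median absolute deviation would be less sensitive to the outliers themselves; the plain z-score is used here only for brevity.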
S22: smoothing the sorted data values by consulting their neighborhood, using the smoothing-by-bin-means variant of the binning method; this discretizes the continuous data, yields the preprocessed time-series data, and increases the granularity of the data. For example, if the data in a bin are 6, 8, 10, the smoothed value obtained by bin means is 8, so each value in the bin is replaced by 8.
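Smoothing by bin means, as in the 6, 8, 10 example, can be sketched in Python as follows. This assumes equal-depth bins formed from consecutive points in the order given, with a possibly shorter final bin; `smooth_by_bin_means` and `depth` are hypothetical names, not from the patent.

```python
def smooth_by_bin_means(values, depth):
    """Replace each value by the mean of its bin of `depth` consecutive points."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    smoothed = []
    for start in range(0, len(values), depth):
        bin_values = values[start:start + depth]  # the last bin may be shorter
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


if __name__ == "__main__":
    print(smooth_by_bin_means([6, 8, 10], depth=3))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24], depth=3))
```

The output has the same length and order as the input, so any timestamps kept alongside the values stay aligned after smoothing.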
S3: representing the preprocessed time-series data with the symbolic aggregate approximation (SAX) method;
Step S3 comprises the following steps:
S31: dividing the time-series data preprocessed in step S2 into segments of equal length and taking the mean of each segment to form a new time series that represents the original high-dimensional time series;
Step S31 comprises:
Let the dimension of the time series obtained in step S2 be n and the dimension after processing be N. The mean of the i-th segment is given by:
$$\bar{x}_i = \frac{N}{n} \sum_{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i} x_j, \qquad i = 1, \ldots, N$$
where $x_j$ is the j-th point of the preprocessed series.
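The segment-mean computation of S31 can be sketched in Python as below. Two assumptions go beyond the patent text: the series is z-normalized first (standard SAX practice, so that segment means can later be compared with standard-normal cut points), and boundary points are weighted fractionally so that the sketch still works when N does not divide n. When N divides n, each mean is simply the average of a block of n/N consecutive points. Function names are hypothetical.

```python
import math
import statistics


def z_normalize(series):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation with fractional weights at segment edges."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("need 1 <= n_segments <= len(series)")
    seg_len = n / n_segments
    means = []
    for i in range(n_segments):
        start, end = i * seg_len, (i + 1) * seg_len
        total = 0.0
        # Point j covers [j, j + 1); add it weighted by its overlap with the segment.
        for j in range(int(start), min(n, math.ceil(end))):
            overlap = min(end, j + 1) - max(start, j)
            if overlap > 0:
                total += overlap * series[j]
        means.append(total / seg_len)
    return means


if __name__ == "__main__":
    print(paa(z_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]), 4))
```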
S32: applying the SAX method to the dimension-reduced time series to obtain its discretized approximate representation.
First, the alphabet size, that is, the number of distinct symbols, is set to α = 5: the sequence obtained in step S2, which follows a Gaussian distribution, is divided by the cut points into 5 equiprobable intervals, each corresponding to one symbol. The relationship between the cut points and the alphabet size is given in Table 1. Symbols are assigned from low to high. The mean of each subsequence is then compared with the cut points, and if it falls within a given interval, the subsequence is represented by the symbol of that interval. That is, values below -0.84 are represented by A, values from -0.84 to -0.25 by B, values from -0.25 to 0.25 by C, values from 0.25 to 0.84 by D, and values above 0.84 by E, so the symbols run A, B, C, D, E from bottom to top, as shown in Table 1:
Table 1 Cut points for alphabet sizes from 5 to 10
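SAX cut points are, by construction, the quantiles that split the standard normal distribution into α equiprobable regions. Assuming Table 1 follows that convention (its α = 5 values of -0.84, -0.25, 0.25, 0.84 match), the Python sketch below regenerates the cut points for α = 5 to 10 with `statistics.NormalDist.inv_cdf` and maps segment means to letters A, B, C, ... from low to high, as described for α = 5. Function names are hypothetical.

```python
import bisect
import string
from statistics import NormalDist


def sax_breakpoints(alpha):
    """Cut points dividing N(0, 1) into `alpha` equiprobable intervals."""
    if not 2 <= alpha <= 26:
        raise ValueError("alphabet size must be between 2 and 26")
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.inv_cdf(k / alpha) for k in range(1, alpha)]


def to_sax_word(paa_values, alpha=5):
    """Map each segment mean to a letter; A is the lowest interval."""
    cuts = sax_breakpoints(alpha)
    letters = string.ascii_uppercase[:alpha]
    # bisect_right counts the cut points <= v, which is the interval index.
    return "".join(letters[bisect.bisect_right(cuts, v)] for v in paa_values)


if __name__ == "__main__":
    for alpha in range(5, 11):
        print(alpha, [round(c, 2) for c in sax_breakpoints(alpha)])
    print(to_sax_word([-1.0, -0.5, 0.0, 0.5, 1.0]))
```

A value exactly on a cut point is assigned to the higher symbol here; the patent does not say how ties are handled.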
S4: performing pattern clustering on the SAX representation and forming the clustering result into an index with a prefix tree algorithm.
Step S4 comprises:
S41: the symbolic representation sequences obtained in step S3 are clustered with the k-means clustering method to obtain a set of discretized string patterns. First, k objects are arbitrarily selected from the n data objects as the initial cluster centers. Then, based on the mean of each cluster (that is, its center object), the Euclidean distance between each object x and each center object c is computed as
$$d(x, c) = \sqrt{\sum_{m=1}^{N} (x_m - c_m)^2}$$
where x and c are N-dimensional, and each object is reassigned to its nearest center. The mean (center object) of each changed cluster is then recalculated, and the process repeats until no cluster changes any more.
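The k-means step can be sketched as follows. The patent does not say how symbol strings become vectors for the Euclidean distance; this Python sketch assumes each letter is mapped to its ordinal position (A = 0, B = 1, ...), and it seeds the centers with the first k words instead of an arbitrary choice so that the result is reproducible. `word_to_vector` and `kmeans` are hypothetical names.

```python
import math


def word_to_vector(word):
    """Map a SAX word such as "ABCE" to ordinal positions [0, 1, 2, 4] (assumed encoding)."""
    return [ord(ch) - ord("A") for ch in word]


def kmeans(vectors, k, max_iter=100):
    """Plain k-means with Euclidean distance; the first k vectors seed the centers."""
    if not 1 <= k <= len(vectors):
        raise ValueError("need 1 <= k <= number of vectors")
    centers = [list(map(float, v)) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every object to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors
        ]
        if new_labels == labels:  # no cluster changed: converged
            break
        labels = new_labels
        # Recompute each non-empty cluster's mean (its "center object").
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    words = ["AABB", "AABA", "EEDD", "EEDE", "ABBA", "DEED"]
    labels, _ = kmeans([word_to_vector(w) for w in words], k=2)
    print(list(zip(words, labels)))
```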
S42: based on the above clustering result, the symbol sequences of each cluster are encoded separately with the prefix tree algorithm to form the index.
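A per-cluster prefix-tree index in the spirit of S42 might look like the Python sketch below. It assumes each SAX word is stored with the ID of the series it came from, and that a query supplies a symbol prefix and returns the IDs of all series in a cluster whose word starts with it; `Node`, `PrefixTreeIndex`, and `build_cluster_indexes` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # symbol -> Node
    ids: list = field(default_factory=list)  # series whose word ends here


class PrefixTreeIndex:
    def __init__(self):
        self.root = Node()

    def insert(self, word, series_id):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.ids.append(series_id)

    def query_prefix(self, prefix):
        """Return sorted IDs of all series whose word starts with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:  # collect every ID in the matching subtree
            current = stack.pop()
            found.extend(current.ids)
            stack.extend(current.children.values())
        return sorted(found)


def build_cluster_indexes(words, labels):
    """One prefix tree per cluster label; series IDs are list positions."""
    indexes = {}
    for series_id, (word, label) in enumerate(zip(words, labels)):
        indexes.setdefault(label, PrefixTreeIndex()).insert(word, series_id)
    return indexes


if __name__ == "__main__":
    words = ["AABB", "AABA", "EEDD", "EEDE", "ABBA", "DEED"]
    indexes = build_cluster_indexes(words, [1, 1, 0, 0, 1, 0])
    print(indexes[1].query_prefix("AAB"), indexes[0].query_prefix("EED"))
```

Matching follows a single path from the root, so words that do not share the query prefix are never compared, which is where the claimed reduction in string comparisons comes from.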
It should be understood that parts not described in detail in this specification belong to the prior art.
Finally, it should be noted that the above embodiment is only intended to illustrate, not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to a preferred embodiment, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently replaced without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (5)

1. A summary index generation method for time-series key-value industrial process data, characterized in that the method comprises the following steps:
S1: acquiring time-series key-value industrial process data;
S2: preprocessing the acquired time-series data by noise smoothing to obtain timestamped time-series data;
S3: representing the preprocessed time-series data with the symbolic aggregate approximation (SAX) method;
S4: performing pattern clustering on the SAX representation and forming the clustering result into an index with a prefix tree algorithm.
2. The summary index generation method for time-series key-value industrial process data according to claim 1, characterized in that the noise-smoothing preprocessing of the acquired time-series data in step S2 comprises the following steps:
S21: performing deviation detection on the raw time-series data to find noise, outliers, and unusual values, and examining the domain and data type of each attribute and the range of acceptable values for each attribute;
S22: smoothing the sorted data values by consulting their neighborhood, using the smoothing-by-bin-means variant of the binning method; this discretizes the continuous data, yields the preprocessed time-series data, and increases the granularity of the data.
3. The summary index generation method for time-series key-value industrial process data according to claim 1, characterized in that step S3 comprises the following steps:
S31: dividing the time-series data preprocessed in step S2 into segments of equal length and taking the mean of each segment to form a new time series that represents the original high-dimensional time series;
S32: applying the SAX method to the dimension-reduced time series to obtain its discretized approximate representation.
4. The summary index generation method for time-series key-value industrial process data according to claim 1, characterized in that step S4 comprises:
S41: clustering the symbolic representation obtained in step S3 with the k-means clustering method to obtain a set of discretized string patterns;
S42: encoding the above result with a prefix tree algorithm to form the index.
5. The summary index generation method for time-series key-value industrial process data according to claim 3, characterized in that step S31 comprises:
letting the dimension of the time series obtained in step S2 be n and the dimension after processing be N, the mean of the i-th segment being given by:
$$\bar{x}_i = \frac{N}{n} \sum_{j=\frac{n}{N}(i-1)+1}^{\frac{n}{N}i} x_j, \qquad i = 1, \ldots, N$$
where $x_j$ is the j-th point of the preprocessed series.
CN201810270729.9A 2018-03-29 2018-03-29 Summary index generation method for time-series key-value industrial process data Pending CN108470068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810270729.9A CN108470068A (en) 2018-03-29 2018-03-29 Summary index generation method for time-series key-value industrial process data

Publications (1)

Publication Number Publication Date
CN108470068A 2018-08-31

Family

ID=63262296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810270729.9A Pending CN108470068A (en) 2018-03-29 2018-03-29 Summary index generation method for time-series key-value industrial process data

Country Status (1)

Country Link
CN (1) CN108470068A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103676645A (en) * 2013-12-11 2014-03-26 广东电网公司电力科学研究院 Mining method for association rules in time series data flows
CN104182460A (en) * 2014-07-18 2014-12-03 浙江大学 Time sequence similarity query method based on inverted indexes
CN105744562A (en) * 2016-03-25 2016-07-06 中国地质大学(武汉) Method and system for compressing and reconstructing data of wireless sensor network based on symbolic aggregate approximation
CN106095787A (en) * 2016-05-30 2016-11-09 重庆大学 Symbolic representation method for time-series data
CN107562865A (en) * 2017-08-30 2018-01-09 哈尔滨工业大学深圳研究生院 Multivariate time series association rule mining method based on Eclat

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
才科扎西 (Caike Zhaxi) et al.: "An efficient frequent itemset mining algorithm based on prefix tree", Computer Engineering *
朱明 (Zhu Ming): "Data Mining", 30 November 2008 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684328A (en) * 2018-12-11 2019-04-26 中国北方车辆研究所 High-dimensional time-series data compression and storage method
CN109684328B (en) * 2018-12-11 2020-06-16 中国北方车辆研究所 High-dimensional time sequence data compression storage method
CN110297832A (en) * 2019-07-01 2019-10-01 联想(北京)有限公司 Time-series data storage method and device, and time-series data query method and device
CN110955294A (en) * 2019-11-25 2020-04-03 重庆大学 Configurable ordered key value class data simulation generation method and generation device thereof

Similar Documents

Publication Publication Date Title
CN105574212B (en) Image search method based on a multi-index disk hash data structure
CN102902826B (en) Fast image retrieval method based on a reference image index
CN104182460B (en) Time-series similarity query method based on inverted index
US10303800B2 (en) System and method for optimization of audio fingerprint search
CN108470068A (en) Summary index generation method for time-series key-value industrial process data
CN104572886B (en) Financial time-series similarity query method based on candlestick (K-line) chart representation
CN105787492B (en) Local ternary pattern texture feature extraction method based on mean sampling
CN111125469B (en) User clustering method and device of social network and computer equipment
CN102184186A (en) Multi-feature adaptive fusion-based image retrieval method
CN105512143A (en) Method and device for web page classification
Wang et al. The research and realization of vehicle license plate character segmentation and recognition technology
CN104850859A (en) Multi-scale analysis based image feature bag constructing method
CN106778869A (en) Fast and accurate nearest-neighbor classification algorithm based on reference points
CN104361135A (en) Image search method
Gupta et al. A classification method to classify high dimensional data
Gao et al. A neural network classifier based on prior evolution and iterative approximation used for leaf recognition
CN116561230B (en) Distributed storage and retrieval system based on cloud computing
Nayef et al. Statistical grouping for segmenting symbols parts from line drawings, with application to symbol spotting
Patnaik et al. Clustering of Categorical Data by Assigning Rank through Statistical Approach
Singh et al. Survey on outlier detection in data mining
CN105653567A (en) Method for quickly finding feature strings in sequential text data
CN110532867A (en) Face image clustering method based on the Fibonacci method
Yuan et al. A lazy associative classifier for time series
CN104899477A (en) Protein subcellular interval prediction method using bag-of-word model
Hu et al. Feature reduction of multi-scale LBP for texture classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180831
