CN106452452A - Full-pulse data lossless compression method based on K-means clustering - Google Patents
Full-pulse data lossless compression method based on K-means clustering
- Publication number
- CN106452452A (application number CN201610809393.XA)
- Authority
- CN
- China
- Prior art keywords
- cluster
- data
- class
- point
- compression method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
- H03M7/4006—Conversion to or from arithmetic code
- H03M7/4012—Binary arithmetic codes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention discloses a lossless compression method for full-pulse data based on K-means clustering, and belongs to the field of data compression. In the technical scheme, the data are first clustered with K-means so that points with high similarity fall into the same cluster; the center point of each cluster is kept, and each original data point is replaced by its difference from the center point, which is much smaller than the original value. The differences are then run-length coded and afterwards range coded. The method compresses well, is highly reliable, and is well suited to lossless compression of full-pulse data.
Description
Technical field
The invention belongs to the field of data compression, and in particular relates to a data compression method based on K-means clustering that achieves lossless compression of full-pulse data in the field of electronic countermeasures.
Background art
In modern warfare, electronic countermeasures play a vital role in strategic attack and defense. Electronic countermeasures are the electronic measures and actions that two opposing sides take to weaken or destroy the effective use of the other side's electronic equipment while ensuring that their own electronic equipment remains effective. Countermeasure reconnaissance is an important component of electronic countermeasures, and strong reconnaissance technology is the key to seizing the initiative in electronic warfare.
Full-pulse data is a data type that stores, in binary form, the key characteristic parameters of the intermediate-frequency pulses captured by a reconnaissance aircraft. Its features are: (1) each data point comprises five parameters: pulse time of arrival (TOA), pulse width (PW), pulse amplitude (PA), pulse carrier frequency (CF) and pulse direction of arrival (DOA); (2) the data are represented in binary, and each parameter occupies 4 bytes; (3) the data volume is very large, which makes storage and transmission inconvenient; (4) the data contain the characteristic parameter values of a large number of repeated pulses from the same radar emitter, so they are strongly correlated and highly redundant. Because the volume is large, the data must be compressed to reduce their size for storage and transmission; because they are redundant, they can be compressed. Because full-pulse data carry the key characteristic information of the intermediate-frequency pulses, compression and decompression must not lose any information, so lossless compression is required.
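As an illustration of the record layout described above, the following sketch parses a buffer that holds five 4-byte parameters per pulse. The source does not state the numeric type or the byte order, so the little-endian unsigned 32-bit fields, the `Pulse` type and the `parse_pulses` helper are assumptions made for this example only.

```python
import struct
from typing import List, NamedTuple

# Assumed layout: five little-endian unsigned 32-bit fields per pulse (20 bytes).
RECORD = struct.Struct("<5I")

class Pulse(NamedTuple):
    toa: int  # pulse time of arrival
    pw: int   # pulse width
    pa: int   # pulse amplitude
    cf: int   # pulse carrier frequency
    doa: int  # pulse direction of arrival

def parse_pulses(buf: bytes) -> List[Pulse]:
    """Split a raw buffer into fixed-size pulse records."""
    return [Pulse(*fields) for fields in RECORD.iter_unpack(buf)]
```

Under these assumptions each record is 20 bytes, and the buffer length must be a multiple of 20 for `iter_unpack` to succeed.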
Cluster analysis, a branch of statistics, is mainly used in data mining. Clustering algorithms include K-means, FCM and Canopy clustering, among others. The K-means algorithm is easy to describe, fast, and suitable for large-scale data. The present invention is the first to propose using K-means clustering for lossless compression of full-pulse data. Full-pulse data record pulse parameters from multiple emitters at different times. Pulses from the same emitter differ little in their characteristic parameter values and are strongly correlated, so K-means clustering assigns them to the same cluster. Within each cluster, each original value is replaced by the difference between the data point and the cluster center, and this difference is numerically smaller than the original value. After differential coding the output code stream occupies fewer bits, which achieves data compression.
Summary of the invention
In the field of data compression, general-purpose lossless algorithms (such as the LZ family) are usually designed for text data and perform poorly on binary data sources. The present invention proposes a lossless compression method for full-pulse data based on K-means clustering. The method compresses well, is highly reliable, and is well suited to lossless compression of full-pulse data.
In the technical scheme of the invention, the data are first clustered with K-means so that points with high similarity form the same cluster. The value of each cluster's center point is retained, and the original data are replaced by the differences between the data points and the center point; after this processing the differences are much smaller than the original values. The differences are then run-length coded and afterwards range coded. Because the coded stream occupies fewer bits, a good compression effect is obtained.
To achieve the above object, the invention mainly comprises the following steps:
Step 1: Perform K-means clustering on the full-pulse data, which contain the key parameter information of the reconnaissance aircraft's intermediate-frequency data, to obtain multiple clusters and the center point of each cluster.
K-means clustering requires the number of clusters K to be specified in advance. K is usually chosen within a range that depends on n, the number of samples in the data set [the bounds of this range are missing from the source text]. In practice the true number of clusters is unknown. Experience shows that when K is larger than the true value the compression effect changes little, whereas when K is smaller than the true value the compression effect is poor. In typical electronic warfare the number of targets (that is, the number of clusters) is within 20, so the invention fixes K by a formula [missing from the source text]. This choice of K performs well in both time complexity and compression effect.
Step 2: Within each cluster, subtract the cluster's center point from every data point to obtain the difference data.
Step 3: Run-length code the differences.
Step 4: Range code the run-length coded data.
Step 5: Output the range-coded stream together with the center values as the compression result.
The technical scheme proposed by the invention provides the following benefits. The invention compresses full-pulse data losslessly and achieves a compression ratio of about 2:1 with little time overhead. Compared with coding the data directly, without K-means clustering as preprocessing, it improves the compression factor by about 20%. This is because the proposed lossless compression method based on K-means clustering has the following characteristics: 1) K-means clustering captures the correlation in high-dimensional data well and is fast to compute; 2) using a predetermined number of clusters [formula missing from the source text] yields a good compression effect with little time overhead; 3) coding the difference data instead of the original data improves coding efficiency.
Brief description of the drawings
Fig. 1 is a flow chart of the lossless compression of full-pulse data based on K-means clustering.
Fig. 2 is a flow chart of the K-means clustering algorithm.
Fig. 3 is a schematic diagram of the K-means clustering result.
Detailed description of the embodiments
To make the object, technical scheme and advantages of the invention clearer, the implementation steps of the full-pulse data lossless compression algorithm based on K-means clustering are explained in detail below.
As shown in Fig. 1, the full-pulse data lossless compression method based on K-means clustering comprises the following steps:
Step 1: Apply the K-means clustering transform to the input full-pulse data, reorganizing the data source X into a set of clusters $C = \{C_1, C_2, \ldots, C_K\}$, where $C_1 \cup C_2 \cup \cdots \cup C_K = X$ and $C_i \cap C_j = \varnothing$ for $i \neq j$; $i, j = 1, 2, \ldots, K$.
The K-means clustering procedure is shown in Fig. 2:
1) The input data source X comprises n data points $\{x_1, x_2, \ldots, x_n\}$; each data point is a p-dimensional vector of p characteristic parameters.
2) Randomly select K data points as the initial cluster centers and compute the Euclidean distance from every data point to each of the K cluster centers. If a data point is closer to one cluster center than to all the others, assign it to the cluster represented by that center. This yields the initial partition into K clusters;
3) Recompute the K new cluster centers:
$$\mu_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij} \qquad (1)$$
where $\mu_i$ is the center point of the i-th cluster, $N_i$ is the number of data points in the i-th cluster, and $x_{ij}$ is the j-th data point in the i-th cluster;
4) Compute the Euclidean distance between every data point and the center point of its cluster, and sum these distances to obtain the total distance over all clusters (that is, the deviation) J.
The formula is:
$$J = \sum_{i=1}^{K}\sum_{j=1}^{N_i} \lVert x_{ij} - \mu_i \rVert \qquad (2)$$
5) Repeat steps 3) and 4) and test for convergence. The goal of clustering is to minimize the total distance over all clusters, that is, the deviation J. If the change in J between successive iterations is less than a preset precision $\varepsilon$ (for example $\varepsilon = 10^{-6}$), the algorithm has converged and the computation ends; otherwise return to step 2) and recompute.
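The clustering loop described in steps 1) to 5) can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the function name `kmeans`, the `seed` and `max_iter` parameters, and the policy of keeping the old center when a cluster becomes empty are all assumptions. The deviation J is taken as the sum of Euclidean distances, following the wording of step 4).

```python
import math
import random

def kmeans(points, k, eps=1e-6, max_iter=100, seed=0):
    """Cluster points into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k data points at random as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    prev_j = math.inf
    for _ in range(max_iter):
        # Assign every point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i]))
                  for p in points]
        # Step 3: recompute each center as the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old center
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
        # Step 4: total distance of all points to their cluster centers.
        j = sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels))
        # Step 5: stop once J changes by less than eps.
        if abs(prev_j - j) < eps:
            break
        prev_j = j
    return centers, labels
```

Calling `kmeans(points, 2)` on two well-separated groups of points returns one center per group together with a label for every point.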
A schematic diagram of the K-means clustering result is shown in Fig. 3. 150 two-dimensional points are generated at random in three classes of 50 points each, with means [-1, -1], [1, 1] and [1, -1] and variance [1, 1]. After K-means clustering the data are correctly divided into three classes, and the center point of each class is very close to its mean, so the clustering works well.
Step 2: The data processed by the K-means clustering algorithm form K clusters. For each cluster, compute the difference between each data point and the center point.
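A minimal sketch of this differencing step, and of its inverse, is given below. To keep the round trip exact it assumes that the parameters are integers, that each center is rounded to an integer before subtraction, and that the cluster label of every point is stored alongside the residuals; none of these details is specified in the source, and the function names are hypothetical.

```python
def to_residuals(points, labels, centers):
    """Replace each point by its difference from its (rounded) cluster center."""
    int_centers = [[round(c) for c in center] for center in centers]
    residuals = [[v - int_centers[lab][d] for d, v in enumerate(p)]
                 for p, lab in zip(points, labels)]
    return int_centers, residuals

def from_residuals(residuals, labels, int_centers):
    """Inverse of to_residuals: add the center back to every residual."""
    return [[r + int_centers[lab][d] for d, r in enumerate(res)]
            for res, lab in zip(residuals, labels)]
```

Because only integer additions and subtractions are involved, `from_residuals` recovers the original points exactly, which is what the lossless requirement needs.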
Step 3: Treat the difference data of each point as a byte stream and run-length code it.
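The source does not specify the run-length format. The sketch below assumes one simple possibility, a sequence of (count, value) byte pairs with runs capped at 255, to show why small differences help: a small value stored in 4 bytes leaves long runs of identical high-order bytes.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run length, byte value) pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Expand (run length, byte value) pairs back into the original bytes."""
    out = bytearray()
    for j in range(0, len(encoded), 2):
        out += bytes([encoded[j + 1]]) * encoded[j]
    return bytes(out)
```

With this format a stream without repeats doubles in size, which is one reason a further entropy-coding stage follows.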
Step 4: Range code the run-length coded data.
To obtain a good compression effect, the invention uses the range coding algorithm. Range coding maps the input data into an integer interval and finally outputs one integer belonging to that interval as the code. Range coding can achieve a higher compression ratio than Huffman coding, which cannot spend less than one bit per symbol.
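To illustrate why the one-bit-per-symbol limit matters, the following sketch computes the Shannon entropy of a symbol distribution. The probabilities used in the usage note are made up for illustration and are not measurements from the source.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol for a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For a heavily skewed two-symbol source such as `entropy_bits([0.95, 0.05])`, the entropy is well below 1 bit per symbol. A Huffman code still spends at least 1 bit on every symbol, whereas a range coder can approach the entropy.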
Range coding mainly comprises the following steps:
1) Read the run-length coded data byte by byte, treating each byte as one symbol. Count the number N of distinct symbols and use it as the initial value of the total frequency T of all symbols.
2) Set an initial integer interval [L, H] and initialize its bounds: upper bound H = 0xf0000000 and lower bound L = 0x00000000, so that the initial range is R = 0xf0000000. In addition, set the minimum range for interval normalization, Rmin = 0x00010000.
3) Compute the initial mapping interval of each symbol. From the current frequency $f_s$ of a symbol S, its cumulative frequency $F_s$ and the total frequency T of all symbols, compute the initial mapping interval [L', H'] of S.
The cumulative frequency $F_s$ of symbol S is the sum of the frequencies of all symbols x whose value is less than S (x < S), and is computed by formula (3):
$$F_s = \sum_{x < S} f_x \qquad (3)$$
The bounds H' and L' and the range R' of the initial mapping interval [L', H'] are given by formulas (4), (5) and (6), where div denotes integer division.
$$R' = (R \,\mathrm{div}\, T) \times f_s \qquad (4)$$
$$L' = L + (R \,\mathrm{div}\, T) \times F_s \qquad (5)$$
$$H' = L' + R' - 1 = L + (R \,\mathrm{div}\, T) \times (F_s + f_s) - 1 \qquad (6)$$
4) When data containing multiple symbols are coded, the mapping interval is updated continually according to the current input symbol. The principle is that the mapping interval of the next input symbol is computed from the mapping interval of the previous symbol. The computation still uses formulas (4) to (6), with the parameters updated adaptively: after the current symbol S has been input, its frequency $f_s$ is incremented by 1, and the cumulative frequencies $F_s$ and the total frequency T of all symbols are recomputed accordingly. The updated $f_s$, $F_s$ and T then give the updated mapping interval [L', H'] of symbol S.
5) When the updated range R' < Rmin (Rmin being the minimum interval range), or when a byte-by-byte comparison of the new interval's bounds shows that the high-order bytes of the upper and lower bounds are equal, the identical high-order bytes are removed and emitted as output code stream, and the interval is normalized.
6) Encode all input data according to the steps above. When encoding ends, all remaining bits in the mapping interval are removed and emitted as output code stream, and the result is saved as a binary file.
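Steps 3) to 5) can be sketched as follows. This is a partial illustration under stated assumptions, not a complete range coder: the function names are hypothetical, every symbol is assumed to start with frequency 1 (so that T = N initially), and the handling of the R' < Rmin case, the final flush and the decoder are omitted.

```python
MASK = 0xFFFFFFFF  # interval bounds are kept in 32 bits

def encode_step(low, rng, freqs, s):
    """Narrow [low, low + rng - 1] for symbol s, then update its frequency."""
    total = sum(freqs.values())                       # T
    cum = sum(f for x, f in freqs.items() if x < s)   # F_s, formula (3)
    step = rng // total                               # R div T
    new_rng = step * freqs[s]                         # R', formula (4)
    new_low = low + step * cum                        # L', formula (5)
    freqs[s] += 1                                     # adaptive update, step 4)
    return new_low, new_rng

def flush_equal_high_bytes(low, high):
    """Emit high-order bytes shared by both bounds and rescale the interval."""
    out = bytearray()
    while (low >> 24) == (high >> 24):
        out.append(low >> 24)
        low = (low << 8) & MASK
        high = ((high << 8) & MASK) | 0xFF
    return bytes(out), low, high
```

In use, `encode_step` would be called once per input byte, with the upper bound taken as `low + rng - 1` per formula (6) before each call to `flush_equal_high_bytes`. A decoder has to apply the same frequency updates in the same order to stay in step with the encoder.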
Step 5: Save the binary file produced by range coding, together with the center values, as the output file.
Application example:
A segment of full-pulse data of size 3721 KB is used as the sample, with n = 10000 data points, each comprising 5 characteristic parameters. The data come from 6 sources (classes), and the number of clusters K is set to 50. After compression with K-means clustering the size is 1868 KB, a compression ratio of 50.2%. The lossless compression effect on full-pulse data depends closely on the specific data sample; for some full-pulse data the compression ratio can reach about 30%.
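The figures in this example can be checked with a few lines of arithmetic. The variable names are illustrative; the two sizes are the ones quoted above.

```python
original_kb = 3721
compressed_kb = 1868

# Compressed size as a percentage of the original size.
ratio_percent = 100 * compressed_kb / original_kb
# Equivalent compression factor (original size divided by compressed size).
factor = original_kb / compressed_kb

print(f"{ratio_percent:.1f}% of original size, factor {factor:.2f}")
```

A compressed size of about half the original corresponds to a compression factor of roughly 2, which matches the benefit stated in the summary.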
The method proposed by the invention is not limited to the examples described in the embodiments. Other embodiments derived by those skilled in the art from the technical scheme of the invention, provided that they use a K-means clustering transform to compress full-pulse data losslessly, including devices that implement the corresponding functions, likewise fall within the scope of the invention and are to be protected.
Claims (3)
1. A full-pulse data lossless compression method based on K-means clustering, comprising the following steps:
Step 1: perform K-means clustering on the full-pulse data, which contain the key parameter information of the reconnaissance aircraft's intermediate-frequency data, to obtain multiple clusters and the center point of each cluster;
Step 2: within each cluster, subtract the cluster's center point from every data point to obtain the difference data;
Step 3: run-length code the differences;
Step 4: range code the run-length coded data;
Step 5: output the range-coded stream together with the center values as the compression result.
2. The full-pulse data lossless compression method based on K-means clustering according to claim 1, characterized in that the K-means clustering in Step 1 comprises the following steps:
1) the input data source X comprises n data points $\{x_1, x_2, \ldots, x_n\}$, each data point being a p-dimensional vector of p characteristic parameters;
2) randomly select K data points as the initial cluster centers, the value of K lying within a range [bounds missing from the source text]; compute the Euclidean distance from every data point to each of the K cluster centers; if a data point is closer to one cluster center than to all the others, assign it to the cluster represented by that center, obtaining the initial partition into K clusters;
3) recompute the K new cluster centers:
$$\mu_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}$$
where $\mu_i$ is the center point of the i-th cluster, $N_i$ is the number of data points in the i-th cluster, and $x_{ij}$ is the j-th data point in the i-th cluster;
4) compute the Euclidean distance between every data point and the center point of its cluster, and sum these distances to obtain the total distance over all clusters, that is, the deviation J, the formula being:
$$J = \sum_{i=1}^{K}\sum_{j=1}^{N_i} \lVert x_{ij} - \mu_i \rVert$$
5) repeat steps 3) and 4) and test for convergence: the goal of clustering is to minimize the total distance over all clusters, that is, the deviation J; if the change in J between successive iterations is less than a preset precision $\varepsilon$, the algorithm has converged and the computation ends; otherwise return to step 2) and recompute.
3. The full-pulse data lossless compression method based on K-means clustering according to claim 2, characterized in that the number of clusters K is set by a formula [missing from the source text].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610809393.XA CN106452452A (en) | 2016-09-08 | 2016-09-08 | Full-pulse data lossless compression method based on K-means clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610809393.XA CN106452452A (en) | 2016-09-08 | 2016-09-08 | Full-pulse data lossless compression method based on K-means clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106452452A true CN106452452A (en) | 2017-02-22 |
Family
ID=58165400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610809393.XA Pending CN106452452A (en) | 2016-09-08 | 2016-09-08 | Full-pulse data lossless compression method based on K-means clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106452452A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062376A (en) * | 2017-12-12 | 2018-05-22 | 清华大学 | A kind of Time Series Compression storage method and system based on similar operating condition |
CN109799483A (en) * | 2019-01-25 | 2019-05-24 | 中国人民解放军空军研究院战略预警研究所 | A kind of data processing method and device |
CN109816029A (en) * | 2019-01-30 | 2019-05-28 | 重庆邮电大学 | High-order clustering algorithm based on military operations chain |
CN111914923A (en) * | 2020-07-28 | 2020-11-10 | 同济大学 | Target distributed identification method based on clustering feature extraction |
CN115622571A (en) * | 2022-12-16 | 2023-01-17 | 电子科技大学 | Radar target identification method based on data processing |
CN116582133A (en) * | 2023-07-12 | 2023-08-11 | 东莞市联睿光电科技有限公司 | Intelligent management system for data in transformer production process |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7349914B1 (en) * | 2004-05-04 | 2008-03-25 | Ncr Corp. | Method and apparatus to cluster binary data transactions |
CN101894135A (en) * | 2009-06-15 | 2010-11-24 | 复旦大学 | Method for compressing and storing GPS data based on route clustering |
CN103678500A (en) * | 2013-11-18 | 2014-03-26 | 南京邮电大学 | Data mining improved type K mean value clustering method based on linear discriminant analysis |
CN104506752A (en) * | 2015-01-06 | 2015-04-08 | 河海大学常州校区 | Similar image compression method based on residual compression sensing |
CN104883558A (en) * | 2015-06-05 | 2015-09-02 | 太原科技大学 | K-means clustering based depth image encoding method |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062376A (en) * | 2017-12-12 | 2018-05-22 | 清华大学 | A kind of Time Series Compression storage method and system based on similar operating condition |
CN109799483A (en) * | 2019-01-25 | 2019-05-24 | 中国人民解放军空军研究院战略预警研究所 | A kind of data processing method and device |
CN109816029A (en) * | 2019-01-30 | 2019-05-28 | 重庆邮电大学 | High-order clustering algorithm based on military operations chain |
CN109816029B (en) * | 2019-01-30 | 2023-12-19 | 重庆邮电大学 | High-order clustering division algorithm based on military operation chain |
CN111914923A (en) * | 2020-07-28 | 2020-11-10 | 同济大学 | Target distributed identification method based on clustering feature extraction |
CN111914923B (en) * | 2020-07-28 | 2022-11-18 | 同济大学 | Target distributed identification method based on clustering feature extraction |
CN115622571A (en) * | 2022-12-16 | 2023-01-17 | 电子科技大学 | Radar target identification method based on data processing |
CN116582133A (en) * | 2023-07-12 | 2023-08-11 | 东莞市联睿光电科技有限公司 | Intelligent management system for data in transformer production process |
CN116582133B (en) * | 2023-07-12 | 2024-02-23 | 东莞市联睿光电科技有限公司 | Intelligent management system for data in transformer production process |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106452452A (en) | Full-pulse data lossless compression method based on K-means clustering | |
CN102694625B (en) | Polarization code decoding method for cyclic redundancy check assistance | |
CN112953550B (en) | Data compression method, electronic device and storage medium | |
CN105512289A (en) | Image retrieval method based on deep learning and Hash | |
Vasuki et al. | A review of vector quantization techniques | |
CN104348490A (en) | Combined data compression algorithm based on effect optimization | |
Roychowdhury | Quantization and centroidal Voronoi tessellations for probability measures on dyadic Cantor sets | |
CN113258934A (en) | Data compression method, system and equipment | |
CN107273471A (en) | A kind of binary electric power time series data index structuring method based on Geohash | |
Yang et al. | One-dimensional deep attention convolution network (ODACN) for signals classification | |
CN116170027B (en) | Data management system and processing method for poison detection equipment | |
CN101099669A (en) | Electrocardiogram data compression method and decoding method based on optimum time frequency space structure code | |
CN107947803A (en) | A kind of method for rapidly decoding of polarization code | |
CN109075805A (en) | Realize the device and method of polarization code | |
CN113759323A (en) | Signal sorting method and device based on improved K-Means combined convolution self-encoder | |
CN114665884B (en) | Time sequence database self-adaptive lossy compression method, system and medium | |
CN108023597A (en) | A kind of reliability of numerical control system data compression method | |
Huang et al. | Latency reduced method for modified successive cancellation decoding of polar codes | |
CN105391455A (en) | Return-to-zero Turbo code starting point and depth blind identification method | |
CN115567609B (en) | Communication method of Internet of things for boiler | |
CN115622571B (en) | Radar target identification method based on data processing | |
CN102571101A (en) | Transmission line malfunction travelling wave data compression method | |
CN115883301A (en) | Signal modulation classification model based on sample recall increment learning and learning method | |
CN111797991A (en) | Deep network model compression system, method and device | |
CN108259515A (en) | A kind of lossless source compression method suitable for transmission link under Bandwidth-Constrained |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170222 |