CN103412864B - A data compression storage method - Google Patents

Info

Publication number
CN103412864B
Authority
CN
China
Prior art keywords
data
algorithm
storage
memory node
initial value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310223387.2A
Other languages
Chinese (zh)
Other versions
CN103412864A (en)
Inventor
王志恒
周琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leinas Technology (beijing) Ltd By Share Ltd
Original Assignee
Leinas Technology (beijing) Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention discloses a data compression storage method comprising a data algorithm generation step, a data definition storage step, and a data storage step. The invention matches data against the algorithm in the data tag definition. If the match succeeds, the current value can be compressed; if the match fails, the value is stored in its original form. Each identical node then needs to be stored only once, which frees the space that would otherwise hold repeated data nodes. This solves the problem of excessive storage space usage, reduces the amount of hardware to deploy, shrinks the deployment environment, and saves deployment cost. The invention enables compressed storage of real-time, frequent, large-volume data and greatly reduces storage space usage.

Description

A data compression storage method
Technical field
The present invention relates to a data compression storage method, in particular to a data compression storage method suitable for real-time, frequent, large-volume data, and belongs to the technical field of data processing.
Background
At present, data storage stores every data point individually, which takes up a great deal of storage space. Compressed data storage is therefore a goal that data storage must reach. The present invention matches data against the algorithm in the data tag definition. If the match succeeds, the current value can be compressed; if the match fails, the value is stored in its original form. Each identical node then needs to be stored only once, which frees the space that would otherwise hold repeated data nodes. The invention can effectively compress real-time, frequent, large-volume data so that it can be stored efficiently, thereby solving the problem of excessive storage space usage, reducing the amount of hardware to deploy, shrinking the deployment environment, and saving deployment cost.
Summary of the invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing a data compression storage method that can effectively compress real-time, frequent, large-volume data so that it can be stored efficiently.
The technical solution of the invention is a data compression storage method comprising a data algorithm generation step, a data definition storage step, and a data storage step;
Data algorithm generation step: define the data algorithm Z = F(U, V), with U = f(x, y), where x is the input data set, y is the calculated result value associated with x, U is the initial value determined by the function f(x, y), Z is the node value determined by the function F(U, V), and V is an incrementing real number or a time data identifier;
Data definition storage step: identify the collected data and judge whether the identified data M satisfies a data algorithm definition; if it does, tag data M with the algorithm, otherwise leave it untagged;
Data storage step:
(1) Begin data storage;
(2) For input data N, look up whether an algorithm tag exists for N. If no algorithm tag exists, store the data directly; if an algorithm tag exists in F(U, V), query the last storage node;
(3) If the last storage node exists, judge whether the current value satisfies the algorithm of the storage node; if the last storage node does not exist, create a data storage node that records the algorithm, the initial value, and the serial number;
(4) If the current value satisfies the algorithm of the storage node, retrieve the algorithm F(U, V) and the initial value U of the last storage node, advance the position of the input data set in the initial value U (that is, advance the position in the set x in U = f(x, y)), and recompute the current value S according to the algorithm F(U, V). If the current value S and the original node value N are not equal, create a new data storage node that records the algorithm, the initial value, and the serial number; otherwise increase the serial number of the data storage node;
(5) Increment the serial number V of the current storage node, then store the data;
(6) Compute the initial value U from U = f(x, y) and the current x value of the set, create a storage node with node value N, algorithm F(U, V), initial value U, and data identifier V, then store the data;
(7) End data storage.
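The storage steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `StorageNode`, `CompressedStore`, and `store` are hypothetical, x is assumed to advance in integer steps of 1, each algorithm is assumed to be a plain function of x, and equality of the recomputed value S and the incoming value N is checked with a small tolerance.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class StorageNode:
    """One compressed run (hypothetical layout): algorithm, initial value U, serial V."""
    algorithm_id: str
    start_x: int          # position in x from which U was computed
    initial_value: float  # U
    serial: int           # V: last position covered by this node


class CompressedStore:
    def __init__(self, algorithms: Dict[str, Callable[[int], float]]):
        self.algorithms = algorithms
        self.nodes: Dict[str, List[StorageNode]] = {name: [] for name in algorithms}
        self.raw: List[Tuple[int, float]] = []  # untagged points, stored as-is

    def store(self, x: int, value: float, algorithm_id: Optional[str]) -> None:
        # Step (2): no algorithm tag, so the point is stored directly.
        if algorithm_id is None:
            self.raw.append((x, value))
            return
        func = self.algorithms[algorithm_id]
        chain = self.nodes[algorithm_id]
        last = chain[-1] if chain else None
        # Steps (3)-(4): advance the last node by one position and recompute S.
        if last is not None:
            next_x = last.serial + 1
            recomputed = func(next_x)
            if next_x == x and math.isclose(recomputed, value, abs_tol=1e-9):
                last.serial = next_x  # Step (5): only V is incremented.
                return
        # Steps (3)/(6): no node yet, or S differs from N, so create a new node.
        chain.append(StorageNode(algorithm_id, x, func(x), x))
```

In this sketch a run of consecutive matching points costs one node regardless of its length, which is where the space saving described in the text would come from.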
Compared with the prior art, the invention has the following advantages: it matches data against the algorithm in the data tag definition; if the match succeeds, the current value can be compressed, and if the match fails, the value is stored in its original form. Each identical node then needs to be stored only once, which frees the space that would otherwise hold repeated data nodes. This solves the problem of excessive storage space usage, reduces the amount of hardware to deploy, shrinks the deployment environment, and saves deployment cost. The invention enables compressed storage of real-time, frequent, large-volume data and greatly reduces storage space usage.
Brief description of the drawings
Fig. 1 is a flowchart of the data definition storage step of the invention;
Fig. 2 is a flowchart of the data storage step of the invention;
Fig. 3 is a comparison chart of the points collected in real time and the tagged points.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings and a specific embodiment:
As shown in Fig. 1, the invention comprises a data algorithm generation step, a data definition storage step, and a data storage step;
Data algorithm generation step: define the data algorithm Z = F(U, V), with U = f(x, y), where x is the input data set, y is the calculated result value associated with x, U is the initial value determined by the function f(x, y), Z is the node value determined by the function F(U, V), and V is an incrementing real number or a time data identifier;
Data definition storage step: identify the collected data and judge whether the identified data M satisfies a data algorithm definition; if it does, tag data M with the algorithm, otherwise leave it untagged;
Data storage step:
(1) Begin data storage;
(2) For input data N, look up whether an algorithm tag exists for N. If no algorithm tag exists, store the data directly; if an algorithm tag exists in F(U, V), query the last storage node;
(3) If the last storage node exists, judge whether the current value satisfies the algorithm of the storage node; if the last storage node does not exist, create a data storage node that records the algorithm, the initial value, and the serial number;
(4) If the current value satisfies the algorithm of the storage node, retrieve the algorithm F(U, V) and the initial value U of the last storage node, advance the position of the input data set in the initial value U (that is, advance the position in the set x in U = f(x, y)), and recompute the current value S according to the algorithm F(U, V). If the current value S and the original node value N are not equal, create a new data storage node that records the algorithm, the initial value, and the serial number; otherwise increase the serial number of the data storage node;
(5) Increment the serial number V of the current storage node, then store the data;
(6) Compute the initial value U from U = f(x, y) and the current x value of the set, create a storage node with node value N, algorithm F(U, V), initial value U, and data identifier V, then store the data;
(7) End data storage.
Embodiment:
During long-term testing, an electrical signal was found to vary over time according to the following rule. Within 0-30 seconds, as time x increases, the signal satisfies function expression (1):
y = sin(x) + 1, x ∈ [0, 30]    (1)
Within 45-50 seconds, as time x increases, the signal satisfies function expression (2):
y = sin(x) + 2, x ∈ [45, 50]    (2)
In this case the data can be stored using the compression algorithm.
The steps are as follows:
Step 1: define the algorithm Z = F(U, V), U = f(x, y).
Expression (1) above can then be converted into:
Then define Algorithm 1: f1(x, y)
Each value in V then produces one Z value, and these values form a new set:
Z = {sin(0)+1, sin(1)+1, sin(2)+1, sin(3)+1, ..., sin(29)+1, sin(30)+1}
Similarly, expression (2) can be converted into:
Then define Algorithm 2: f2(x, y)
Each value in V then produces one Z value, and these values form a new set:
Z = {sin(45)+2, sin(46)+2, sin(47)+2, sin(48)+2, sin(49)+2, sin(50)+2}
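As a small illustration of Step 1, the two Z sets can be generated directly from expressions (1) and (2). The names `f1`, `f2`, `V1`, `V2`, `Z1`, and `Z2` are chosen here for the sketch, and V is assumed to take integer values one second apart, as the listed sets suggest.

```python
import math


def f1(x: float) -> float:
    """Algorithm 1, from expression (1): y = sin(x) + 1 on [0, 30]."""
    return math.sin(x) + 1


def f2(x: float) -> float:
    """Algorithm 2, from expression (2): y = sin(x) + 2 on [45, 50]."""
    return math.sin(x) + 2


V1 = range(0, 31)   # V = 0, 1, ..., 30
V2 = range(45, 51)  # V = 45, 46, ..., 50

Z1 = [f1(v) for v in V1]  # one Z value per value of V
Z2 = [f2(v) for v in V2]

print(len(Z1), len(Z2))  # 31 6
```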
Step 2: tag the data points.
Using the algorithms defined in Step 1, pair each value in the set V with the corresponding value in the set Z and tag the resulting data points. For example, Algorithm 1, f1(x, y), produces 31 data points:
{(0, sin(0)+1),
(1, sin(1)+1),
(2, sin(2)+1),
(3, sin(3)+1),
...,
(29, sin(29)+1),
(30, sin(30)+1)}
These 31 points are tagged f1(x, y);
Algorithm 2, f2(x, y), produces 6 data points:
{(45, sin(45)+2),
(46, sin(46)+2),
(47, sin(47)+2),
(48, sin(48)+2),
(49, sin(49)+2),
(50, sin(50)+2)}
These 6 data points are tagged f2(x, y).
Besides the points belonging to Algorithm 1 and Algorithm 2, the data actually collected also contains other irregular, scattered points. These data are not given an algorithm tag.
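The tagging rule of Step 2 can be sketched as a lookup that returns an algorithm name when a collected point coincides with a tagged point, and nothing otherwise. The table `ALGORITHMS`, the function `tag_point`, and the tolerance used for the floating-point comparison are assumptions made for this illustration.

```python
import math
from typing import Optional

# Hypothetical registry: algorithm name -> (domain of x, function of x).
ALGORITHMS = {
    "f1": (range(0, 31), lambda x: math.sin(x) + 1),
    "f2": (range(45, 51), lambda x: math.sin(x) + 2),
}


def tag_point(x: int, y: float, tol: float = 1e-9) -> Optional[str]:
    """Return the algorithm tag for (x, y), or None for an untagged point."""
    for name, (domain, func) in ALGORITHMS.items():
        if x in domain and math.isclose(y, func(x), abs_tol=tol):
            return name
    return None
```

With this rule, a point such as (10, 4) lies inside the interval of Algorithm 1 but does not match its value, so it stays untagged.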
Step 3: collect and store the data.
Continuing the example above, suppose the device collects data for 60 seconds, as shown in Fig. 3. The blue "-*-" points are those tagged f1(x, y), the green "-*-" points are those tagged f2(x, y), and the pink points are the collected points. At the collected point x = 10, the collected value is y = 4.
Data storage, starting from x = 0:
1. After the first data point, (0, 1), is received, check it against the tags from Step 2 to determine whether it is a point tagged with an algorithm. If it is not, store it in the normal way; if it is, store it as an algorithm-tagged data point.
In this example the data point is tagged with Algorithm 1, so the point (0, 1) is stored as an algorithm-tagged data point. The search for a storage node of the Algorithm 1 storage type begins. No storage node is found for this data point, so the first storage node of Algorithm 1, f1(x, y), is created, V is initialized to V = {0}, and U is initialized to U = sin(0)+1, i.e. U = 1. The point at x = 0, i.e. (0, 1), is now stored.
2. Next, the second point is received and checked as in step 1, which confirms that it is a point tagged with Algorithm 1 (as shown in Fig. 3). The storage node of Algorithm 1 is therefore looked up. That node was created when the first point was stored, so only V needs to be refreshed, to V = {1}. If there were no storage node for Algorithm 1, f1(x, y), one would be created as in step 1.
3. Each subsequent point up to x = 9 is stored in the same way; by the end of this step V has been refreshed to V = {9}.
4. At x = 10, the collected point has y = 4, whereas the tagged point is (10, sin(10)+1). This point therefore does not satisfy the storage algorithm rule and is stored as an ordinary point.
5. At x = 11, the point is confirmed to be a data point tagged with Algorithm 1, so the last storage node of Algorithm 1 is looked up. In this example it is the storage node created in step 1, which was last written in step 3, so at this moment U = 1 and V = {9}. However, refreshing V to V = {10} according to the algorithm yields the function value sin(10)+1, whereas the original value of the point is (11, sin(11)+1). The two values are not equal, so this point cannot be stored by simply refreshing V in the first storage node. A new storage node of f1(x, y), i.e. the second storage node, is therefore created and initialized with V = {11} and U = sin(11)+1. Algorithm 1 now has two storage nodes.
6. At x = 12, following the same procedure as at x = 11, the point is confirmed to be a data point tagged with Algorithm 1. The last storage node is then looked up; it is the storage node created in step 5. The point is stored in this node and V is refreshed to V = {12}.
7. Each point up to x = 30 is stored in turn, and V is refreshed accordingly to V = {30};
8. None of the data points collected between x = 31 and x = 44 is tagged with an algorithm, so they are stored as ordinary points.
9. The data points collected between x = 45 and x = 50 are tagged with Algorithm 2. They are stored in a manner similar to that used for Algorithm 1, f1(x, y).
10. All points after x = 50 are ordinary data points and are stored as ordinary data.
With this storage scheme:
Algorithm 1, f1(x, y), has two storage nodes: the first is recorded as U = sin(0)+1, V = {9}, and the second as U = sin(11)+1, V = {30}.
Algorithm 2, f2(x, y), has one storage node, recorded as U = sin(45)+2, V = {50}.
All other points are stored as ordinary points.
Matters not described in detail in the present invention are well known to those skilled in the art.
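The outcome of the embodiment can be reproduced with a short simulation. This is a sketch under assumptions, not the patented implementation: the helper names (`ALGORITHMS`, `tag`, `collect`, `compress`) are hypothetical, the collected signal is synthesized from expressions (1) and (2) with the single deviation y = 4 at x = 10, the values outside both intervals are arbitrary placeholders, and collection is assumed to cover x = 0 to 59.

```python
import math

# Hypothetical registry: algorithm name -> (domain of x, function of x).
ALGORITHMS = {
    "f1": (range(0, 31), lambda x: math.sin(x) + 1),
    "f2": (range(45, 51), lambda x: math.sin(x) + 2),
}


def tag(x, y):
    """Step 2: return the algorithm tag of (x, y), or None if untagged."""
    for name, (domain, func) in ALGORITHMS.items():
        if x in domain and math.isclose(y, func(x), abs_tol=1e-9):
            return name
    return None


def collect(x):
    """Synthetic signal for the sketch: follows f1/f2 except y = 4 at x = 10."""
    if x == 10:
        return 4.0
    for domain, func in ALGORITHMS.values():
        if x in domain:
            return func(x)
    return 0.5 * x  # arbitrary placeholder for the untagged scattered points


def compress(points):
    """Step 3: extend the last node when the next position matches, else add a node."""
    nodes = {name: [] for name in ALGORITHMS}
    raw = []
    for x, y in points:
        name = tag(x, y)
        if name is None:
            raw.append((x, y))  # ordinary point, stored as-is
            continue
        func = ALGORITHMS[name][1]
        chain = nodes[name]
        if chain:
            nxt = chain[-1]["V"] + 1  # advance the last node by one position
            if nxt == x and math.isclose(func(nxt), y, abs_tol=1e-9):
                chain[-1]["V"] = nxt  # only the serial number V is refreshed
                continue
        chain.append({"start": x, "U": func(x), "V": x})  # new storage node
    return nodes, raw


points = [(x, collect(x)) for x in range(60)]
nodes, raw = compress(points)
summary = {name: [(n["start"], n["V"]) for n in chain] for name, chain in nodes.items()}
print(summary)  # {'f1': [(0, 9), (11, 30)], 'f2': [(45, 50)]}
```

Each pair in the printed summary is (starting x of the node, final V), matching the two nodes of Algorithm 1 and the single node of Algorithm 2 listed above; the remaining points end up in `raw`.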

Claims (1)

1. A data compression storage method, characterized by comprising: a data algorithm generation step, a data definition storage step, and a data storage step;
Data algorithm generation step: define the data algorithm Z = F(U, V), with U = f(x, y), where x is the input data set, y is the calculated result value associated with x, U is the initial value determined by the function f(x, y), Z is the node value determined by the function F(U, V), and V is an incrementing real number or a time data identifier;
Data definition storage step: identify the collected data and judge whether the identified data M satisfies a data algorithm definition; if it does, tag data M with the algorithm, otherwise leave it untagged;
Data storage step:
(1) Begin data storage;
(2) For input data N, look up whether an algorithm tag exists for N. If no algorithm tag exists, store the data directly; if an algorithm tag exists in F(U, V), query the last storage node;
(3) If the last storage node exists, judge whether the current value satisfies the algorithm of the storage node; if the last storage node does not exist, create a data storage node that records the algorithm, the initial value, and the serial number;
(4) If the current value satisfies the algorithm of the storage node, retrieve the algorithm F(U, V) and the initial value U of the last storage node, advance the position of the input data set in the initial value U (that is, advance the position in the set x in U = f(x, y)), and recompute the current value S according to the algorithm F(U, V). If the current value S and the original node value N are not equal, create a new data storage node that records the algorithm, the initial value, and the serial number; otherwise increase the serial number of the data storage node;
(5) Increment the serial number V of the current storage node, then store the data;
(6) Compute the initial value U from U = f(x, y) and the current x value of the set, create a storage node with node value N, algorithm F(U, V), initial value U, and data identifier V, then store the data;
(7) End data storage.
CN201310223387.2A 2013-06-06 2013-06-06 A kind of data compression storage method Active CN103412864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310223387.2A CN103412864B (en) 2013-06-06 2013-06-06 A kind of data compression storage method

Publications (2)

Publication Number Publication Date
CN103412864A CN103412864A (en) 2013-11-27
CN103412864B true CN103412864B (en) 2017-04-05

Family

ID=49605876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310223387.2A Active CN103412864B (en) 2013-06-06 2013-06-06 A kind of data compression storage method

Country Status (1)

Country Link
CN (1) CN103412864B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112260694B (en) * 2020-09-21 2022-01-11 广州中望龙腾软件股份有限公司 Data compression method of simulation file
CN112783056B (en) * 2021-01-04 2022-09-23 潍柴动力股份有限公司 Data programming method, device and equipment of ECU and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1908932A (en) * 2005-08-05 2007-02-07 北京人大金仓信息技术有限公司 Huge amount of data compacting storage method and implementation apparatus therefor
CN101882141A (en) * 2009-05-08 2010-11-10 北京众志和达信息技术有限公司 Method and system for implementing repeated data deletion
CN102831222A (en) * 2012-08-24 2012-12-19 华中科技大学 Differential compression method based on data de-duplication

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8407193B2 (en) * 2010-01-27 2013-03-26 International Business Machines Corporation Data deduplication for streaming sequential data storage applications

Also Published As

Publication number Publication date
CN103412864A (en) 2013-11-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100081 Shenzhou building, South Avenue, Haidian District, Beijing, 402, Zhongguancun

Applicant after: Leinas Technology (Beijing) Limited by Share Ltd

Address before: 100081 Shenzhou building, South Avenue, Haidian District, Beijing, 402, Zhongguancun

Applicant before: China Spacesat Co., Ltd.

COR Change of bibliographic data
GR01 Patent grant