CN111506781A - Method, system, terminal device and readable storage medium for greatly compressing volume of database - Google Patents

Info

Publication number
CN111506781A
Authority
CN
China
Prior art keywords
database
word frequency
volume
optimal tree
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010318025.1A
Other languages
Chinese (zh)
Inventor
胡建伟
颜锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Innogence Technology Co Ltd
Original Assignee
Sichuan Innogence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Innogence Technology Co Ltd
Priority to CN202010318025.1A
Publication of CN111506781A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system, a terminal device and a readable storage medium for greatly compressing the volume of a database. The method comprises the following steps: S1, traversing all trees and extracting all node names and leaf node values; S2, counting the frequency of the node names to obtain a word frequency table; S3, sorting the word frequency table by word frequency from high to low; S4, establishing an optimal tree according to the word frequency table in descending order; and S5, storing the word frequency table, the optimal tree and the optimal tree paths of all the nodes. The node name fields that appear repeatedly in the original data are sorted, an optimal tree is established from the sorted word frequency table, and the original data are stored in the database in the optimal tree structure together with the word frequency table, so that the database volume is greatly compressed. Compared with the prior art, the method adopts optimal coding with the node name fields arranged in descending order of occurrence frequency, which can reduce the volume of the whole database to the maximum extent.

Description

Method, system, terminal device and readable storage medium for greatly compressing volume of database
Technical Field
The invention relates to the field of databases, and in particular to a method, a system, a terminal device and a readable storage medium for greatly compressing the volume of a database.
Background
In embedded software products, cost constraints, limited storage space and high performance requirements mean that a database is needed to guarantee data read and write performance, while the database itself must occupy as little space as possible. A method for compressing the database volume that balances software execution efficiency against storage savings is therefore needed.
When a traditional database stores data, it faces a trade-off. If the original data is left uncompressed to preserve efficiency, the database occupies a large amount of space when the data volume is large. If the original data is compressed before storage to save space, the occupied space is reduced, but the compression processing is time-consuming and resource-intensive, so the overall efficiency of the database drops considerably.
The problem with existing methods is therefore that, when tree-structured data is imported into a database, database volume and read-write performance cannot both be satisfied.
Disclosure of Invention
The present invention aims to provide a method, a system, a terminal device and a readable storage medium for greatly compressing a database volume.
The aim of the invention is achieved by the following technical solution: a method of substantially compressing a database volume, comprising the steps of:
S1: traversing all the trees, and extracting all the node names and the values of the leaf nodes;
S2: counting the frequency of the node names to obtain a word frequency table;
S3: sorting the word frequency table by word frequency from high to low;
S4: establishing an optimal tree according to the word frequency table in descending order;
S5: storing the word frequency table, the optimal tree and the optimal tree paths of all the nodes.
In step S3, the node names are sorted by word frequency from high to low, and each node name is then encoded.
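Steps S1 to S4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the input is assumed to be a list of slash-separated path/value records, the sample data and the function names `build_code_table` and `encode_records` are hypothetical, and the integer codes are simply assigned by frequency rank, which is how the worked examples in the description appear to assign them.

```python
from collections import Counter

def build_code_table(records):
    """S2 and S3: count node-name frequencies, then rank them from high to low."""
    counts = Counter()
    for path, _value in records:
        counts.update(path.split("/"))
    # sorted() is stable, so names with equal frequency keep first-seen order.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return {name: code for code, (name, _count) in enumerate(ranked)}

def encode_records(records, codes):
    """S4: rewrite every path with the integer code of each node name."""
    return [
        ("/".join(str(codes[name]) for name in path.split("/")), value)
        for path, value in records
    ]

# S1 is assumed to have produced these (path, leaf value) records already.
records = [
    ("Device/Wifi/Ssid", "home"),
    ("Device/Wifi/Channel", "6"),
    ("Device/Lan/Ip", "10.0.0.1"),
]
codes = build_code_table(records)
encoded = encode_records(records, codes)
```

With this sample input, the most frequent name (`Device`) receives code 0, so the long name is stored once in the code table and each record carries only short numeric paths.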
A system for substantially compressing a database volume, comprising:
a compression module for compressing the data by sorting and encoding the node names and establishing an optimal tree; wherein the compression module further comprises:
a counting module for counting the word frequency of each node name and sorting the word frequencies from high to low; and
a coding module for encoding the node names according to the word-frequency ordering produced by the counting module and creating the optimal tree.
The system also comprises an acquisition module and a storage module. The acquisition module acquires the original data and extracts the node names by traversing the data; the storage module stores the compressed data from the compression module into a database. The compressed data comprises the word frequency table, the optimal tree and the optimal tree paths of all nodes produced by the coding module.
A terminal device for substantially compressing a volume of a database, said terminal device comprising: one or more processors; storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of substantially compressing a database volume as described above.
A computer readable storage medium for substantially compressing a database volume, the computer readable storage medium having stored thereon instructions that, when executed by a processor, implement a method of substantially compressing a database volume as described above.
The invention has the following beneficial effects:
(1) it solves the problem of excessive space occupation when tree-structured data is imported into a database;
(2) because no compression technique with a large performance impact is used, and the data is instead compressed through a mapping relation using the optimal coding, benefit (1) is achieved while the read and write efficiency of the database is preserved;
(3) the method is simple, has low research, development and production costs, and is convenient to develop, implement and produce at scale.
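Benefit (2) rests on the encoding being a plain one-to-one mapping, so reading a value back needs only table lookups rather than a decompression pass. The sketch below illustrates that idea under assumptions: the code table and the stored rows are modelled as in-memory dictionaries, and all names and sample values are hypothetical.

```python
def encode_path(path, codes):
    """Translate a path of node names into its numeric form."""
    return "/".join(str(codes[name]) for name in path.split("/"))

def decode_path(encoded, codes):
    """Translate a numeric path back into node names."""
    names = {code: name for name, code in codes.items()}
    return "/".join(names[int(part)] for part in encoded.split("/"))

# Hypothetical code table and stored rows.
codes = {"Device": 0, "Wifi": 1, "Ssid": 2}
stored = {"0/1/2": "home"}

# A read is one encode step plus one keyed lookup.
value = stored[encode_path("Device/Wifi/Ssid", codes)]
original = decode_path("0/1/2", codes)
```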
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
To make the technical features, objects and effects of the present invention more clearly understood, embodiments of the present invention are described below with reference to the accompanying drawing.
As shown in FIG. 1, a method for substantially compressing a database volume comprises the following steps:
S1: traversing all the trees, and extracting all the node names and the values of the leaf nodes;
S2: counting the frequency of the node names to obtain a word frequency table;
S3: sorting the word frequency table by word frequency from high to low;
S4: establishing an optimal tree according to the word frequency table in descending order;
S5: storing the word frequency table, the optimal tree and the optimal tree paths of all the nodes.
In step S3, the node names are sorted by word frequency from high to low, and each node name is then encoded.
The specific method principle is as follows:
Example 1 is a tree structure of great depth, in which the following records are stored:
Aaa/Bbb/Ccc=5
Aaa/Bbb/Ccc/Ddd/Eee=1
Aaa/Bbb/Ccc/Ddd/Eee/Fff=string
If fields such as Aaa and Bbb are all long English words, they are processed as follows:
1. Scan the original data and extract all node-name fields, such as Aaa and Bbb;
2. Sort the node names by word frequency, from high to low;
3. After sorting, encode the node names. For example, Aaa has the highest frequency, Bbb the next highest, and so on, as in Table 1:
Node name    Encoding
Aaa          0
Bbb          1
Ccc          2
Ddd          3
Eee          4
Fff          5
Table 1 word frequency ordering code table 1
4. Construct an optimal tree from the word frequency ordering code table, and create an optimal tree structure table, as shown in Table 2:
Node field     Value
0/1/2          5
0/1/2/3/4      1
0/1/2/3/4/5    string
Table 2 optimal tree structure table 1
5. Create an SQL file from the word frequency ordering code table and the optimal tree structure table, and import the created SQL file into a database.
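Step 5 of Example 1 can be sketched as follows. The patent does not name a database engine or a schema, so everything specific here is an assumption made for illustration: SQLite is used as the target, the table names `word_freq_code` and `optimal_tree` and their columns are hypothetical, and the code table is the one implied by the numeric paths in Table 2.

```python
import sqlite3

def sql_literal(text):
    """Quote a string as an SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"

def build_sql(codes, rows):
    """Render the code table and the optimal tree structure table as SQL text."""
    lines = [
        "CREATE TABLE word_freq_code (code INTEGER PRIMARY KEY, name TEXT NOT NULL);",
        "CREATE TABLE optimal_tree (path TEXT PRIMARY KEY, value TEXT);",
    ]
    for name, code in sorted(codes.items(), key=lambda item: item[1]):
        lines.append(
            f"INSERT INTO word_freq_code (code, name) VALUES ({code}, {sql_literal(name)});"
        )
    for path, value in rows:
        lines.append(
            f"INSERT INTO optimal_tree (path, value) VALUES ({sql_literal(path)}, {sql_literal(value)});"
        )
    return "\n".join(lines) + "\n"

# Code table implied by the paths in Table 2, and the rows of Table 2 itself.
codes = {"Aaa": 0, "Bbb": 1, "Ccc": 2, "Ddd": 3, "Eee": 4, "Fff": 5}
rows = [("0/1/2", "5"), ("0/1/2/3/4", "1"), ("0/1/2/3/4/5", "string")]
sql_text = build_sql(codes, rows)

# Import the generated script into an in-memory database and read one row back.
conn = sqlite3.connect(":memory:")
conn.executescript(sql_text)
loaded = conn.execute(
    "SELECT value FROM optimal_tree WHERE path = '0/1/2/3/4'"
).fetchone()[0]
```

In practice `sql_text` would be written to a file and imported with the database's own tooling; the in-memory import above only checks that the generated statements are valid.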
Example 2 is data in XML format, as follows:
[Figure: XML source data for Example 2]
1. Scan the original data and extract all node-name fields, such as FAP and PerfMgmt;
2. Sort the node names by word frequency, from high to low;
3. After sorting, encode the node names. When several node names have the same frequency, each is still given a different code, and their relative order does not affect the construction of the optimal tree. For example, FAP has the highest frequency, followed by PerfMgmt, Config, ConfigNumberOfEntity …, and so on, as in Table 3:
Node name               Encoding
FAP                     0
PerfMgmt                1
Config                  2
ConfigNumberOfEntity    3
Table 3 word frequency ordering code table 2
4. Construct an optimal tree from the word frequency ordering code table, and create an optimal tree structure table, as shown in Table 4:
Node field    Value
0/1/3         1
0/1/2/4       http://1.1.1.1
0/1/2/5       abc
Table 4 optimal tree structure table 2
5. Create an SQL file from the word frequency ordering code table and the optimal tree structure table, and import the created SQL file into a database.
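The XML case can be sketched with Python's standard `xml.etree.ElementTree` module. The XML text below is a hypothetical stand-in for the source data: the element names FAP, PerfMgmt, Config and ConfigNumberOfEntity come from Table 3, while the leaf names `URL` and `Tag` are invented for illustration because the names behind codes 4 and 5 are not listed. Frequencies are assumed to be counted once per leaf path, and ties are broken by first appearance, which step 3 says is immaterial.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input; only the first four element names appear in Table 3.
XML_TEXT = """<FAP>
  <PerfMgmt>
    <ConfigNumberOfEntity>1</ConfigNumberOfEntity>
    <Config>
      <URL>http://1.1.1.1</URL>
      <Tag>abc</Tag>
    </Config>
  </PerfMgmt>
</FAP>"""

def leaf_paths(element, prefix=""):
    """Yield (path, text) for every leaf element, depth first."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        yield path, (element.text or "").strip()
    for child in children:
        yield from leaf_paths(child, path)

records = list(leaf_paths(ET.fromstring(XML_TEXT)))

# Count each node name once per leaf path, then rank from high to low.
counts = Counter(name for path, _value in records for name in path.split("/"))
ranked = sorted(counts.items(), key=lambda item: -item[1])  # stable for ties
codes = {name: code for code, (name, _count) in enumerate(ranked)}

encoded = [
    ("/".join(str(codes[name]) for name in path.split("/")), value)
    for path, value in records
]
```

Under these assumptions the resulting codes and encoded paths agree with Tables 3 and 4.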
The foregoing has described the general principles, principal features and advantages of the invention. Persons skilled in the art will appreciate that the invention is not limited by the embodiments described above, which merely illustrate its principles; various changes and modifications can be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A method of substantially compressing a volume of a database, comprising the steps of:
S1: traversing all trees, and extracting all node names and leaf node values;
S2: counting the frequency of the node names to obtain a word frequency table;
S3: sorting the word frequency table by word frequency from high to low;
S4: creating an optimal tree according to the word frequency table in descending order;
S5: storing the word frequency table, the optimal tree and the optimal tree paths of all the nodes.
2. The method of claim 1, wherein in step S3 the node names are sorted by frequency of occurrence from high to low, and each node name is then encoded.
3. A system for substantially compressing a volume of a database, comprising:
a compression module for compressing the data by sorting and encoding the node names and establishing an optimal tree; wherein the compression module further comprises:
a counting module for counting the word frequency of each node name and sorting the word frequencies from high to low; and
a coding module for encoding the node names according to the word-frequency ordering produced by the counting module and creating the optimal tree.
4. The system for substantially compressing a database volume of claim 3, further comprising an acquisition module and a storage module, wherein the acquisition module acquires raw data and extracts node names by traversing the data; and the storage module stores the compressed data from the compression module into a database.
5. The system of claim 4, wherein said compressed data comprises the word frequency table, the optimal tree, and the optimal tree paths of all nodes produced by the coding module.
6. A terminal device for substantially compressing a volume of a database, the terminal device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of substantially compressing a database volume as recited in any of claims 1-2.
7. A computer readable storage medium for substantially compressing a database volume, the computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement a method for substantially compressing a database volume as recited in any of claims 1-2.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010318025.1A CN111506781A (en) 2020-04-21 2020-04-21 Method, system, terminal device and readable storage medium for greatly compressing volume of database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010318025.1A CN111506781A (en) 2020-04-21 2020-04-21 Method, system, terminal device and readable storage medium for greatly compressing volume of database

Publications (1)

Publication Number Publication Date
CN111506781A 2020-08-07

Family

ID=71869773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010318025.1A Pending CN111506781A (en) 2020-04-21 2020-04-21 Method, system, terminal device and readable storage medium for greatly compressing volume of database

Country Status (1)

Country Link
CN (1) CN111506781A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200807