CN116680269A - Time sequence data coding and compressing method, system, equipment and medium - Google Patents

Time sequence data coding and compressing method, system, equipment and medium

Info

Publication number
CN116680269A
Authority
CN
China
Prior art keywords
time
compression
coding
mode
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310684878.0A
Other languages
Chinese (zh)
Inventor
肖学文
郑强
张晓辉
彭燕华
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CISDI Chongqing Information Technology Co Ltd
Original Assignee
CISDI Chongqing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CISDI Chongqing Information Technology Co Ltd
Priority to CN202310684878.0A
Publication of CN116680269A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0611 Improving I/O performance in relation to response time
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application provides a method, a system, a device and a medium for encoding and compressing time series data. While time series data is being stored in a database, a decision maker determines the coding mode and the compression mode used for storage. Once the amount of time series data stored in the database reaches a preset amount, the time series data is analyzed periodically: a plurality of coding compression schemes are generated by cross-combining the coding modes and compression modes supported by the database, and the coding mode and compression mode in the decision maker are updated with those of the coding compression scheme that has the highest score value. By periodically analyzing the time series data of the real-time database and selecting a suitable coding and compression method according to the characteristics of that data, the application determines and updates the coding and compression strategy in a timely manner, which saves storage space and improves query speed.

Description

Time sequence data coding and compressing method, system, equipment and medium
Technical Field
The present application relates to the field of database technologies, and in particular, to a method, a system, an apparatus, and a medium for encoding and compressing time-series data.
Background
With the formal commercial launch of 5G (5th Generation Mobile Communication Technology), China has entered the 5G era, and the Internet of Things (IoT) is undergoing a major transformation. 5G technology makes the transmission of time series data in the IoT more efficient. An important technology for storing this data is the time series database: the large amount of data generated by IoT devices must be processed and stored by an efficient time series database. As the volume of time series data keeps growing, compressing it has become an important technical challenge. The coding and compression of time series data determine the storage space and the read/write efficiency of the data, and are therefore of great significance for optimizing a time series database.
When data is stored in a real-time database, different data types, data magnitudes and data distribution characteristics call for different coding modes and compression modes, which in turn yield different space occupancy and different query efficiency. Over time the data characteristics may change, so that the original coding and compression scheme may no longer be suitable. In the prior art, the coding mode and compression mode used for data storage in a real-time database are fixed. For time series data that grows rapidly every day, the original coding mode and compression mode cannot adapt to the characteristics of the data, so storage space cannot be saved to the greatest extent and storage cost cannot be reduced.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present application is to provide a method, a system, a device, and a medium for encoding and compressing time series data, which are used for solving the problem that the fixed coding mode and compression mode of the prior art cannot adapt to changing time series data characteristics.
To achieve the above and other related objects, the present application provides a method for encoding and compressing time series data, comprising the steps of:
in the process of storing time sequence data into a database, using a first coding mode and a first compression mode which are preset in a decision maker to perform initial coding and initial compression on the time sequence data;
periodically analyzing the time sequence data, and generating a plurality of coding compression schemes by cross combination based on a coding mode and a compression mode supported by the database after the time sequence data stored by the database is greater than or equal to a preset data volume; each coding compression scheme comprises a coding mode and a compression mode;
counting the coding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value;
screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and marking them as a second coding mode and a second compression mode, respectively;
replacing the first coding mode in the decision maker with the second coding mode, and replacing the first compression mode in the decision maker with the second compression mode, so as to update the coding mode and the compression mode in the decision maker;
and coding and compressing the time series data stored in the database based on the decision maker with the updated coding mode and compression mode.
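The cross combination described in the steps above can be sketched as a Cartesian product of the two lists. This is a minimal illustration only: the mode names below are hypothetical placeholders, and the function name `cross_combine` is not taken from the application; a real system would query the database for the modes it supports.

```python
from itertools import product

# Hypothetical labels; the real lists would come from the database's capabilities.
ENCODINGS = ["PLAIN", "DELTA", "RLE"]
COMPRESSIONS = ["NONE", "ZLIB", "LZMA"]


def cross_combine(encodings, compressions):
    """Return every (coding mode, compression mode) pair as one candidate scheme."""
    return list(product(encodings, compressions))


schemes = cross_combine(ENCODINGS, COMPRESSIONS)
```

With three coding modes and three compression modes this yields nine candidate schemes, each consisting of exactly one coding mode and one compression mode.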
In an embodiment of the present application, the process of counting the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each encoding compression scheme, and calculating the score value of each encoding compression scheme based on the corresponding weight value includes:
$$\mathrm{Score}_i = W_1(1 - TE_i) + W_2(1 - TD_i) + W_3(1 - TC_i) + W_4(1 - TDe_i) + W_5(1 - TW_i) + W_6(1 - TR_i) + W_7(1 - CPU_i) + W_8(1 - RA_i);$$
wherein $TE_i$ represents the encoding time of the i-th coding compression scheme;
$TD_i$ represents the decoding time of the i-th coding compression scheme;
$TC_i$ represents the compression time of the i-th coding compression scheme;
$TDe_i$ represents the decompression time of the i-th coding compression scheme;
$TW_i$ represents the disk writing time of the i-th coding compression scheme;
$TR_i$ represents the disk reading time of the i-th coding compression scheme;
$CPU_i$ represents the processor utilization of the i-th coding compression scheme;
$RA_i$ represents the compression rate of the i-th coding compression scheme;
$W_1, W_2, W_3, W_4, W_5, W_6, W_7, W_8$ respectively represent the weight values corresponding to the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization, and compression rate, and $W_1 + W_2 + W_3 + W_4 + W_5 + W_6 + W_7 + W_8 = 1$.
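A minimal sketch of the score calculation follows. It assumes every metric has already been normalized to the range [0, 1], so that each `(1 - x)` term rewards smaller values; the dictionary layout, the name `score`, and the sample numbers are illustrative assumptions rather than part of the application.

```python
import math

# Metric keys follow the symbols used in the score formula.
METRICS = ("TE", "TD", "TC", "TDe", "TW", "TR", "CPU", "RA")


def score(metrics, weights):
    """Weighted score of one scheme; metrics are assumed normalized to [0, 1]."""
    if not math.isclose(sum(weights[m] for m in METRICS), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[m] * (1.0 - metrics[m]) for m in METRICS)


# Made-up example: equal weights, one uniformly cheap and one uniformly costly scheme.
equal_weights = {m: 1 / 8 for m in METRICS}
fast = {m: 0.2 for m in METRICS}
slow = {m: 0.8 for m in METRICS}
```

Under equal weights the cheaper scheme scores higher, which is the ordering the screening step relies on. Shifting weight toward the compression rate or toward the read-path terms changes which scheme wins.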
In an embodiment of the present application, before calculating the score value of each coding compression scheme based on the corresponding weight value, the method further includes:
carrying out normalization processing on the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each coding compression scheme by adopting a linear normalization formula; wherein the linear normalization formula is:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
wherein $X'$ represents the normalized value of a given parameter of the coding compression scheme;
$X$ represents the original value of that parameter before normalization;
$X_{\min}$ represents the minimum value of that parameter;
$X_{\max}$ represents the maximum value of that parameter.
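Linear normalization of one parameter can be sketched as below. The sketch assumes the minimum and maximum are taken over the measured values of that parameter across all candidate schemes, and it returns 0 for every scheme when all values are equal; both choices are assumptions made for illustration, as is the function name.

```python
def min_max_normalize(values):
    """Linearly rescale values to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: when all schemes tie on a metric, treat it as 0 for all.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


encoding_times_ms = [12.0, 30.0, 48.0]  # made-up example measurements
normalized = min_max_normalize(encoding_times_ms)
```

After this step every parameter lies in [0, 1] regardless of its original unit, so times in milliseconds, utilization percentages and compression rates can be combined in one weighted sum.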
In an embodiment of the present application, when calculating the score value of each coding compression scheme based on the corresponding weight value, the method further includes:
associating the encoding time, the compression time and the disk writing time to generate a first variable time; and
associating the decoding time, the decompression time and the disk reading time to generate a second variable time;
and counting the first variable time, the second variable time, the processor utilization and the compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight values.
In one embodiment of the present application, the process of associating the encoding time, the compression time and the writing disk time to generate the first variable time includes:
CC=TE+TC+TW;
the process of associating the decoding time, the decompression time and the disk reading time to generate the second variable time includes:
DD=TR+TDe+TD;
wherein CC represents a first variable time;
TE represents the encoding time;
TC represents compression time;
TW represents disk write time;
DD represents a second variable time;
TR represents a disk read time;
TDe represents decompression time;
TD represents the decoding time.
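The two variable times can be sketched as simple sums over the write path and the read path. The text does not spell out the score formula for this four-input variant, so `combined_score` below assumes the same weighted `(1 - x)` form as the eight-term score, applied to normalized inputs; that form, the function names and the sample values are assumptions.

```python
def variable_times(te, tc, tw, tr, tde, td):
    """CC = TE + TC + TW (write path); DD = TR + TDe + TD (read path)."""
    return te + tc + tw, tr + tde + td


def combined_score(cc, dd, cpu, ra, weights):
    """Assumed form: weighted (1 - x) sum over four normalized inputs."""
    w_cc, w_dd, w_cpu, w_ra = weights
    return w_cc * (1 - cc) + w_dd * (1 - dd) + w_cpu * (1 - cpu) + w_ra * (1 - ra)


# Made-up timings in milliseconds.
cc, dd = variable_times(te=2.0, tc=3.0, tw=5.0, tr=4.0, tde=1.0, td=1.0)
```

Grouping the six timings into CC and DD reduces the number of weights from eight to four, so write cost and read cost can each be tuned with a single weight.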
In an embodiment of the present application, after storing the time series data in the database, the method further includes:
acquiring time sequence data in a preset time period in the database;
taking the coding compression scheme corresponding to the coding mode and the compression mode in the decision maker as an abscissa and the score value corresponding to the coding compression scheme as an ordinate; a decision maker score histogram is generated based on the abscissa and the ordinate.
In an embodiment of the application, the time series data includes a data column of the same indicator recorded in chronological order; the data in the same data column are measured on the same basis and are comparable.
The application also provides a time sequence data coding and compressing system, which comprises:
the initial coding and compressing module is used for carrying out initial coding and initial compression on the time sequence data by utilizing a first coding mode and a first compressing mode which are preset in the decision maker in the process of storing the time sequence data into the database;
the period analysis module is used for periodically analyzing the time sequence data, and generating a plurality of coding compression schemes by cross combination based on a coding mode and a compression mode supported by the database after the time sequence data stored in the database is greater than or equal to a preset data volume; each coding compression scheme comprises a coding mode and a compression mode;
The score calculation module is used for counting the coding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value;
the screening module is used for screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and the coding mode and the compression mode are respectively marked as a second coding mode and a second compression mode;
an updating and replacing module, configured to replace a first coding mode in the decision maker with the second coding mode, and replace a first compression mode in the decision maker with the second compression mode, and update the coding mode and the compression mode in the decision maker;
and the dynamic coding and compressing module is used for coding and compressing the time series data stored in the database according to the decision maker with the updated coding mode and compression mode.
The application also provides a time sequence data coding and compressing device, which comprises:
a processor; and
a computer readable medium storing instructions that, when executed by the processor, cause the apparatus to perform the method of encoding and compressing time series data as described in any one of the above.
The present application also provides a computer readable medium having instructions stored thereon, the instructions being loaded by a processor and performing the method of encoding and compressing time series data as described in any one of the above.
As described above, the present application provides a method, a system, a device, and a medium for encoding and compressing time series data, which have the following beneficial effects:
In the process of storing time series data into a database, the application performs initial coding and initial compression on the time series data using a first coding mode and a first compression mode preset in a decision maker. The time series data is analyzed periodically; once the time series data stored in the database is greater than or equal to a preset data volume, a plurality of coding compression schemes are generated by cross-combining the coding modes and compression modes supported by the database, each coding compression scheme comprising one coding mode and one compression mode. The encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each coding compression scheme are counted, and the score value of each coding compression scheme is calculated based on the corresponding weight values. The coding mode and the compression mode of the coding compression scheme with the highest score value are screened out and marked as a second coding mode and a second compression mode, respectively, which then replace the first coding mode and the first compression mode in the decision maker. Finally, the time series data stored in the database is coded and compressed based on the decision maker with the updated coding mode and compression mode.
By periodically analyzing the time series data of the real-time database, the application makes effective use of the characteristics of that data to determine and update the coding and compression strategy in a timely manner, thereby saving storage space and improving query speed.
Drawings
FIG. 1 is a schematic diagram of an exemplary system architecture to which the teachings of one or more embodiments of the present application may be applied;
FIG. 2 is a flow chart of a method for encoding and compressing time-series data according to an embodiment of the application;
FIG. 3 is a flow chart illustrating a method for encoding and compressing time-series data according to another embodiment of the present application;
FIG. 4 is a flow chart of a decision maker for a method of encoding and compressing time series data according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a method for encoding and compressing time-series data according to an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating data file storage according to an embodiment of the present application;
FIG. 7 is an exemplary radar chart of index information according to the present application;
FIG. 8 is a histogram of decision maker scores provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a hardware configuration of a time-series data encoding and compression system according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a hardware configuration of a time-series data encoding and compression apparatus suitable for implementing one or more embodiments of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or applied through other, different embodiments, and the details in this description may be modified or varied without departing from the spirit and scope of the present application. It should be noted that the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present application by way of illustration, and only the components related to the present application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which the teachings of one or more embodiments of the present application may be applied. As shown in fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. Terminal device 110 may include various electronic devices such as smart phones, tablet computers, notebook computers, desktop computers, and the like. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 120 may be a communication medium of various connection types capable of providing a communication link between terminal device 110 and server 130, and may be, for example, a wired communication link or a wireless communication link.
The system architecture in embodiments of the present application may have any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by the terminal device 110 and the server 130 together, which is not limited in particular.
In an embodiment of the present application, the terminal device 110 or the server 130 may, in the process of storing time series data into a database, perform initial coding and initial compression on the time series data using a first coding mode and a first compression mode preset in the decision maker. The time series data is analyzed periodically; once the time series data stored in the database is greater than or equal to a preset data volume, a plurality of coding compression schemes are generated by cross-combining the coding modes and compression modes supported by the database, each coding compression scheme comprising one coding mode and one compression mode. The encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each coding compression scheme are counted, and the score value of each coding compression scheme is calculated based on the corresponding weight values. The coding mode and the compression mode of the coding compression scheme with the highest score value are screened out and marked as a second coding mode and a second compression mode, respectively, which then replace the first coding mode and the first compression mode in the decision maker. Finally, the time series data stored in the database is coded and compressed based on the decision maker with the updated coding mode and compression mode.
By executing the method for encoding and compressing time series data on the terminal device 110 or the server 130, the time series data of the real-time database can be analyzed periodically and a suitable coding and compression method can be selected according to the characteristics of that data. The coding and compression strategy is thus determined and updated in a timely manner, which saves storage space and improves query speed.
The foregoing describes the contents of an exemplary system architecture to which the technical solution of the present application is applied, and the following describes the method for encoding and compressing time-series data of the present application.
Fig. 2 is a schematic flow chart of a method for encoding and compressing time-series data according to an embodiment of the application. Specifically, in an exemplary embodiment, as shown in fig. 2, the present embodiment provides a method for encoding and compressing time-series data, which includes the following steps:
S210, in the process of storing time series data into a database, performing initial coding and initial compression on the time series data using a first coding mode and a first compression mode preset in a decision maker;
S220, periodically analyzing the time series data, and after the time series data stored in the database is greater than or equal to a preset data volume, generating a plurality of coding compression schemes by cross-combining the coding modes and compression modes supported by the database; each coding compression scheme comprises one coding mode and one compression mode;
S230, counting the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight values;
S240, screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and marking them as a second coding mode and a second compression mode, respectively;
S250, replacing the first coding mode in the decision maker with the second coding mode, and replacing the first compression mode in the decision maker with the second compression mode, so as to update the coding mode and the compression mode in the decision maker;
S260, coding and compressing the time series data stored in the database based on the decision maker with the updated coding mode and compression mode.
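The screening and replacement steps (S240 to S260), gated by the data-volume condition of S220, can be sketched as a small state holder. This is an illustrative sketch, not the application's implementation: the class layout, the `update` signature, the mode names, the threshold and the score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DecisionMaker:
    encoding: str      # current coding mode
    compression: str   # current compression mode

    def update(self, scores, stored_points, threshold):
        """Adopt the top-scoring scheme once enough data has been stored.

        scores maps (coding mode, compression mode) -> score value.
        Returns True if the modes were re-evaluated, False if skipped.
        """
        if stored_points < threshold or not scores:
            return False
        self.encoding, self.compression = max(scores, key=scores.get)
        return True


# Hypothetical presets standing in for the first coding and compression modes.
dm = DecisionMaker(encoding="PLAIN", compression="NONE")
scores = {("PLAIN", "NONE"): 0.41, ("DELTA", "ZLIB"): 0.77, ("RLE", "LZMA"): 0.63}

skipped = dm.update(scores, stored_points=500, threshold=1000)
before = (dm.encoding, dm.compression)
ran = dm.update(scores, stored_points=1500, threshold=1000)
```

Below the threshold the preset modes stay in force; once the stored volume reaches it, the decision maker switches to the highest-scoring pair and subsequent writes would use those modes.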
Therefore, according to the embodiment, the time sequence data of the real-time database is analyzed periodically, and the proper coding and compression method is selected intelligently, so that the purposes of saving the storage space and improving the query efficiency can be achieved. The embodiment can effectively utilize the time sequence data characteristics of the real-time database, and timely determine and update the coding and compression strategies of the time sequence data through intelligent analysis, so that the storage space of the data can be saved, and the query speed of the data can be improved. The time sequence data in the embodiment comprises data generated by a sensor, a PLC, a DCS, a robot and the like; it can be seen that the time series data described in this embodiment can be derived from various software and hardware data sources for data storage, analysis and intelligent application. The coding and compressing method of the time sequence data described in the embodiment can be applied to the scene of the internet of things, namely, the data generated by the equipment and the software of the internet of things are coded and compressed and then stored in a database; the method can also be applied to production control scenes, namely, the sensor data of the production control system are encoded and compressed and then stored in a database.
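For sensor-style data, part of the measurement in S230 could be collected roughly as follows. The sketch substitutes delta encoding and Python's standard `zlib` module for whatever coding and compression modes the database actually supports, measures only the encoding time, compression time and compression rate, and assumes the compression rate is compressed size divided by original size; all of these are assumptions for illustration.

```python
import struct
import time
import zlib


def delta_encode(values):
    """Encode integers as the first value followed by successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def measure(values):
    """Time one encode and compress pass and report the compression rate."""
    fmt = f"<{len(values)}q"  # little-endian signed 64-bit integers
    raw = struct.pack(fmt, *values)
    t0 = time.perf_counter()
    encoded = struct.pack(fmt, *delta_encode(values))
    t1 = time.perf_counter()
    compressed = zlib.compress(encoded)
    t2 = time.perf_counter()
    decoded = delta_decode(struct.unpack(fmt, zlib.decompress(compressed)))
    return {
        "TE": t1 - t0,
        "TC": t2 - t1,
        "RA": len(compressed) / len(raw),  # assumed: compressed size / original size
        "round_trip_ok": decoded == values,
    }


timestamps = list(range(1_700_000_000, 1_700_000_000 + 1000))  # made-up 1 Hz samples
result = measure(timestamps)
```

Regularly spaced timestamps turn into a run of identical deltas, which a general-purpose compressor shrinks well; running the same measurement for each candidate scheme gives the raw values that are then normalized and scored.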
In an exemplary embodiment, the process of counting the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight values includes:
$$\mathrm{Score}_i = W_1(1 - TE_i) + W_2(1 - TD_i) + W_3(1 - TC_i) + W_4(1 - TDe_i) + W_5(1 - TW_i) + W_6(1 - TR_i) + W_7(1 - CPU_i) + W_8(1 - RA_i)$$
wherein $TE_i$ represents the encoding time of the i-th coding compression scheme; $TD_i$ represents the decoding time of the i-th coding compression scheme; $TC_i$ represents the compression time of the i-th coding compression scheme; $TDe_i$ represents the decompression time of the i-th coding compression scheme; $TW_i$ represents the disk writing time of the i-th coding compression scheme; $TR_i$ represents the disk reading time of the i-th coding compression scheme; $CPU_i$ represents the processor utilization of the i-th coding compression scheme; $RA_i$ represents the compression rate of the i-th coding compression scheme; $W_1, W_2, W_3, W_4, W_5, W_6, W_7, W_8$ respectively represent the weight values corresponding to the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization, and compression rate, and $W_1 + W_2 + W_3 + W_4 + W_5 + W_6 + W_7 + W_8 = 1$.
In an exemplary embodiment, before calculating the score value of each encoding compression scheme based on the corresponding weight values, the present embodiment may further include: normalizing the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each encoding compression scheme with a linear normalization formula, namely:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X'$ represents the normalized value of a given parameter of the encoding compression scheme; $X$ represents the original value of that parameter before normalization; $X_{\min}$ represents the minimum value of that parameter; and $X_{\max}$ represents the maximum value of that parameter. The parameters to be normalized are: encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate.
In an exemplary embodiment, when calculating the score value of each encoding compression scheme based on the corresponding weight values, the embodiment may further include: associating the encoding time, the compression time and the disk writing time to generate a first variable time; associating the decoding time, the decompression time and the disk reading time to generate a second variable time; and counting the first variable time, the second variable time, the processor utilization and the compression rate of each encoding compression scheme, and calculating the score value of each encoding compression scheme based on the corresponding weight values. Specifically, the first variable time is generated as CC = TE + TC + TW, and the second variable time is generated as DD = TR + TDe + TD, where CC represents the first variable time; DD represents the second variable time; TE represents the encoding time; TC represents the compression time; TW represents the disk writing time; TR represents the disk reading time; TDe represents the decompression time; and TD represents the decoding time.
In an exemplary embodiment, after storing the time series data in the database, the present embodiment may further include: acquiring time sequence data in a preset time period in a database; taking the coding compression scheme corresponding to the coding mode and the compression mode in the decision maker as an abscissa and the score value corresponding to the coding compression scheme as an ordinate; a decision maker score histogram is generated based on the abscissa and the ordinate. As an example, a decision maker score histogram is shown in fig. 8.
According to the above description, in an embodiment, as shown in fig. 3 to 8, the embodiment provides a method for encoding and compressing time series data, and applicable scenarios include, but are not limited to: industrial sensors, robots, DCS data storage, internet of things data storage, and equipment data storage for production control systems. The method comprises the following steps:
In the process of storing the real-time database data, the coding and compression modes used for time sequence data storage are determined by a decision maker, and the decision maker has a preset coding and compression mode.
The decision maker periodically analyzes the time sequence data: when the sequence data amount acquired by the real-time database reaches X, the decision maker analyzes the original data. All coding and compression methods supported by the real-time database are enumerated and cross-combined to form a plurality of coding and compression schemes, and each scheme is used to code and compress the sequence data.
And counting the encoding, compressing, decoding and decompressing time and compression ratio index of the various schemes, and selecting an optimal scheme according to index information.
The decision maker recodes and compresses the data according to the optimal scheme, and changes the preset coding and compression modes into the optimal scheme.
In this embodiment, time series data storage means the following: time sequence data consists of times and values; to improve read and write efficiency, data of the same type are kept together, and the times and the values are each stored by column.
Encoding sequence data refers to: the time sequence data is converted into a binary code stream by an encoder, one piece of time sequence data consists of time and value, and the encoder encodes the time and the value respectively. Compression, similar to encoding, refers to compressing sequence data: and compressing the encoded data by a compressor according to a specified compression method, and finally outputting the compressed data and storing the compressed data as a time sequence data file in a specific format.
The decision maker refers to: the dynamic coding compression method decision device can dynamically find the coding and compression method which is most suitable for a certain time sequence according to the coding, compression, decoding and decompression time, compression ratio index and other data in the data storage process, so as to decide which coding and compression method should be used for storing the data in a future period of time.
The dynamic selection of the optimal coding and compression method is as follows: as time passes, the sequence data keeps growing and its characteristics change; the decision maker periodically analyzes the data stored in the database, encodes and compresses the data in different modes, collects the related statistics, and uses them to select as candidates the methods with the lowest encoding, decoding, compression, decompression and CPU time consumption, thereby realizing dynamic selection of the coding and compression modes and improving overall performance.
The preset coding and compression modes are as follows: the coding and compression modes are preset in the decision maker and are divided into two stages, and the coding and compression modes refer to the coding mode and the compression mode, so that the sequence data can be coded and compressed according to the coding mode and the compression mode preset in the decision maker.
The sequence data amount reaching X means that: the decision maker periodically analyzes the time sequence, and when the data volume of a certain sequence is larger than X, the decision maker analyzes the original data; x is an increment value used to instruct the decision maker to periodically analyze the raw data over time.
The cross combination means: each coding mode supported by the real-time database is combined with each compression mode supported by the real-time database, forming all coding and compression combination schemes supported by the real-time database.
The optimal scheme is as follows: when the whole time consumption of the index information is shortest, the efficiency of reading and storing data reaches the highest, and the coding and compression modes represented by the index information are the optimal scheme for storing the current sequence data.
The process for selecting the optimal scheme comprises the following steps: during execution, the decision maker assigns a weight to each statistical index, linearly normalizes the index values, computes a weighted score for each scheme, and selects the scheme with the highest score as the optimal scheme for the time sequence data.
The coding and compression process comprises the following steps: a pipeline controller is used for parallel processing of encoding and compression, an encoder and a compressor are divided into two stages of a pipeline by utilizing a pipeline technology, and the utilization rate of a CPU and a memory and IO read-write rate are improved in the mode, so that the whole encoding and compression process is optimized.
According to the above description, specifically, the file format and storage manner of the time series data are as shown in fig. 6. In fig. 6, d1 is a device (DeviceId), similar to the concept of a table. s1 and s2 are two measuring points (MeasurementId); one device can have multiple measuring points, similar to the concept of columns in a table. d1.s1 is a time series path (Path), defined by the device and the measuring point. Schema is the measuring point description information (MeasurementSchema); each time series corresponds to one piece of description information, which includes a data type, a coding mode and a compression mode. Each time series has two columns, a time column and a value column, and the times and values are stored by column.
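The storage layout described above can be pictured with a small data model. The following Python sketch is illustrative only: the class and field names (`MeasurementSchema`, `TimeSeries`, `data_type`, and so on) are assumptions modeled on the terms in fig. 6, not the actual CISDigital-TimeS structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeasurementSchema:
    """Description information of one time series (names are assumptions)."""
    data_type: str    # e.g. "DOUBLE"
    encoding: str     # e.g. "GORILLA"
    compression: str  # e.g. "SNAPPY"


@dataclass
class TimeSeries:
    """One time series: a time column and a value column, stored separately."""
    device_id: str
    measurement_id: str
    schema: MeasurementSchema
    times: List[int] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    @property
    def path(self) -> str:
        # The path is defined by the device and the measuring point, e.g. "d1.s1".
        return f"{self.device_id}.{self.measurement_id}"

    def append(self, timestamp: int, value: float) -> None:
        # Times and values go into their own columns.
        self.times.append(timestamp)
        self.values.append(value)


series = TimeSeries("d1", "s1", MeasurementSchema("DOUBLE", "GORILLA", "SNAPPY"))
series.append(1000, 20.5)
series.append(1010, 20.7)
```

Keeping the two columns in separate lists mirrors the column-wise storage, so each column can later be handed to its own encoder.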
The decision maker provides a preset coding mode and a preset compression mode for coding and compressing time sequence data. While the amount of sequence data is still small, that is, before it reaches X, the encoder and the compressor respectively encode and compress the sequence data according to the coding mode and the compression mode provided by the decision maker, thereby completing the storage of the sequence data. After the sequence data amount reaches X, the decision maker periodically analyzes the original data, enumerates the coding and compression methods supported by the real-time database, cross-combines them into a plurality of coding and compression schemes, and uses each scheme to code and compress the sequence data; it then counts the encoding, compression, decoding and decompression times and the compression ratio index of each scheme, selects an optimal scheme according to the index information, and changes the preset coding and compression modes to the optimal scheme. The encoder and the compressor then respectively encode and compress the time sequence data according to the optimal scheme provided by the decision maker, and the time sequence data is stored.
The period of the decision maker is determined by the size of the sequence data volume: when the sequence data volume reaches X, the decision maker starts to analyze the original data, form schemes, collect the statistical indexes and select the optimal scheme. X is an incremental value that increases as the amount of sequence data increases. $X_0$ is the initial value of X; after the decision maker has analyzed the original data i times, X equals $X_i$, whose value is $X_i = (i+1) \times X_0$. The sequence data amount X is used to direct the decision maker to analyze the original data again after a period of time, so that the characteristics of the sequence data are analyzed dynamically and an optimal scheme is selected.
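The growing threshold can be sketched in a few lines of Python. This is a minimal illustration assuming the trigger condition is "data amount greater than or equal to X"; the class name `AnalysisTrigger` and its methods are hypothetical.

```python
class AnalysisTrigger:
    """Tracks the threshold X_i = (i + 1) * X_0 after i completed analyses."""

    def __init__(self, x0: int) -> None:
        if x0 <= 0:
            raise ValueError("x0 must be positive")
        self.x0 = x0
        self.analyses_done = 0  # i in the formula

    @property
    def threshold(self) -> int:
        return (self.analyses_done + 1) * self.x0

    def should_analyze(self, sequence_size: int) -> bool:
        """Return True (and advance the threshold) once the data amount reaches X."""
        if sequence_size >= self.threshold:
            self.analyses_done += 1
            return True
        return False


trigger = AnalysisTrigger(x0=1000)
```

With `x0=1000`, the first analysis fires at 1000 points, the next at 2000, then 3000, so the decision maker re-examines the series each time another $X_0$ points have accumulated.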
In actual production, the same compression algorithm achieves different compression effects and different time efficiency on data of different types, magnitudes and distributions. Over time, the data characteristics change, and the original coding and compression scheme may no longer be suitable. Therefore, the data are analyzed periodically and reprocessed with every coding and compression mode built into the system, so that a suitable coding and compression scheme can be found by dynamic decision according to the actual conditions. By way of example, CISDigital-Times, developed by a certain company, supports 6 data types: BOOLEAN, INT32, INT64, FLOAT, DOUBLE and TEXT; 8 coding modes: PLAIN, TS_ DIFF, RLE, GORILLA, DICTIONARY, FREQ, ZIGZAG and CHIMP; and 4 compression modes: UNCOMPRESSED, SNAPPY, LZ and GZIP. Different coding and compression methods are applicable to different data types, as shown in table 1:
Table 1: relationship between data type, coding, compression
Each coding mode is cross-combined with each compression mode to form all coding and compression combination schemes supported by the real-time database. Taking CISDigital-Times developed by a certain company as an example, there are 48 coding and compression combination schemes for CISDigital-Times.
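Cross combination is a Cartesian product of the two lists of modes. The Python sketch below shows the idea on a deliberately small, illustrative subset of mode names; the function name `cross_combine` is hypothetical, and a real system would presumably also filter the pairs by data type according to table 1.

```python
from itertools import product
from typing import List, Tuple


def cross_combine(encodings: List[str], compressions: List[str]) -> List[Tuple[str, str]]:
    """Pair every coding mode with every compression mode."""
    return list(product(encodings, compressions))


# Illustrative subset only, not the full list of supported modes.
example_encodings = ["PLAIN", "RLE", "GORILLA"]
example_compressions = ["UNCOMPRESSED", "SNAPPY", "GZIP"]
schemes = cross_combine(example_encodings, example_compressions)
```

Each resulting tuple is one candidate coding and compression scheme for the decision maker to evaluate.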
The pipeline controller is used for parallel processing of coding and compression, and utilizes pipeline technology to divide the coder and the compressor into two stages of a pipeline, firstly, the utilization rate of a CPU and a memory and IO read-write rate can be improved in the mode, so that the whole coding and compression process is optimized, and secondly, the coding and compression of time sequence data can be carried out in stages according to coding and compression schemes provided by the decision maker.
In the execution process of the decision maker, the decision maker counts indexes, weights each index according to actual needs, normalizes linearly, calculates the scores of the coding and compression schemes, and selects the optimal scheme according to the size of the scores to finish decision. In the decision maker executing process, the statistical index data are shown in table 2:
The weighted score is calculated from the indexes as shown in the following equation:
$$\mathrm{Score}_i = W_1(1-TE_i) + W_2(1-TD_i) + W_3(1-TC_i) + W_4(1-TDe_i) + W_5(1-TW_i) + W_6(1-TR_i) + W_7(1-CPU_i) + W_8(1-RA_i)$$
where $TE_i$ represents the encoding time of the i-th encoding compression scheme; $TD_i$ represents the decoding time of the i-th encoding compression scheme; $TC_i$ represents the compression time of the i-th encoding compression scheme; $TDe_i$ represents the decompression time of the i-th encoding compression scheme; $TW_i$ represents the disk writing time of the i-th encoding compression scheme; $TR_i$ represents the disk reading time of the i-th encoding compression scheme; $CPU_i$ represents the processor utilization of the i-th encoding compression scheme (the higher the complexity, the more CPU time is consumed); $RA_i$ represents the compression rate of the i-th encoding compression scheme (the smaller the compression rate, the better); $W_1$, $W_2$, $W_3$, $W_4$, $W_5$, $W_6$, $W_7$ and $W_8$ respectively represent the weight values corresponding to the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate, and $W_1+W_2+W_3+W_4+W_5+W_6+W_7+W_8=1$.
In order to keep the Score value within [0,1], the parameters $TE_i$, $TD_i$, $TC_i$, $TDe_i$, $TW_i$, $TR_i$, $CPU_i$ and $RA_i$ need to be normalized. Linear normalization is adopted, which preserves the linear relationship of the data to the greatest extent; the formula is:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X'$ represents the normalized value of a given parameter of the encoding compression scheme; $X$ represents the original value of that parameter before normalization; $X_{\min}$ represents the minimum value of that parameter; and $X_{\max}$ represents the maximum value of that parameter. The parameters to be normalized are: encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate.
As is clear from the above formula, the greater the Score value, the higher the efficiency of the coding compression scheme; the scheme with the greatest Score value is therefore adopted to code and compress the sequence in question.
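The scoring procedure can be illustrated with a short Python sketch. The measurements, the equal weights and the function names below are hypothetical values chosen for illustration, not data from the embodiment; mapping a constant column to 0.0 when all schemes share the same value is also an assumption, since the source does not cover that case.

```python
from typing import Dict, List

# Metric names follow the symbols used in the score formula.
METRICS = ["TE", "TD", "TC", "TDe", "TW", "TR", "CPU", "RA"]


def min_max_normalize(values: List[float]) -> List[float]:
    """Linearly rescale values to [0, 1]; a constant column maps to 0.0 (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_schemes(raw: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Score_i = sum over metrics k of W_k * (1 - normalized metric k of scheme i)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(raw)
    scores = {name: 0.0 for name in names}
    for metric in METRICS:
        normalized = min_max_normalize([raw[name][metric] for name in names])
        for name, value in zip(names, normalized):
            scores[name] += weights[metric] * (1.0 - value)
    return scores


# Hypothetical measurements for three schemes (not taken from the source).
raw_metrics = {
    "PLAIN+UNCOMPRESSED": dict(TE=1.0, TD=1.0, TC=0.0, TDe=0.0, TW=9.0, TR=8.0, CPU=0.10, RA=1.00),
    "RLE+SNAPPY": dict(TE=2.0, TD=2.0, TC=1.0, TDe=1.0, TW=4.0, TR=4.0, CPU=0.20, RA=0.40),
    "GORILLA+GZIP": dict(TE=3.0, TD=3.0, TC=4.0, TDe=2.0, TW=3.0, TR=3.0, CPU=0.50, RA=0.25),
}
equal_weights = {m: 1.0 / len(METRICS) for m in METRICS}
scores = score_schemes(raw_metrics, equal_weights)
best_scheme = max(scores, key=scores.get)
```

Normalization is done per metric across all candidate schemes, so each term (1 - X') rewards the scheme that is cheapest on that metric; changing the weights shifts the balance between, say, write speed and storage saving.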
In practice, the inventors found that the relationship among the encoding time, the compression time and the disk writing time is complex. For example, long encoding and compression times do not imply a long disk writing time, and the same holds for the decompression and decoding times. Experiments showed that encoding, compression and disk writing form one whole, and only the scheme whose overall efficiency is highest can be regarded as optimal. Likewise, disk reading, decompression and decoding form one whole. On this basis, encoding, compression and disk writing are represented by one variable CC (code and compress), with the formula CC = TE + TC + TW; disk reading, decompression and decoding are represented by one variable DD (decode and decompress), with the formula DD = TR + TDe + TD.
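The two aggregate variables are simple sums over the write path and the read path. The Python sketch below shows them as derived properties; the class name `SchemeTimings` and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeTimings:
    """Per-scheme timings in seconds (field names mirror TE, TC, TW, TR, TDe, TD)."""
    te: float   # encoding time
    tc: float   # compression time
    tw: float   # disk writing time
    tr: float   # disk reading time
    tde: float  # decompression time
    td: float   # decoding time

    @property
    def cc(self) -> float:
        # CC = TE + TC + TW: the whole write path.
        return self.te + self.tc + self.tw

    @property
    def dd(self) -> float:
        # DD = TR + TDe + TD: the whole read path.
        return self.tr + self.tde + self.td


timings = SchemeTimings(te=2.0, tc=1.0, tw=4.0, tr=4.0, tde=1.0, td=2.0)
```

Comparing schemes on `cc` and `dd` rather than on the six separate timings reflects the observation that a slow encoder may still yield the fastest overall write if it shrinks the data that reaches the disk.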
For time sequence data in a period of time, the decision maker makes a decision, performs normalization processing on each index information, and draws a radar chart as follows: CC represents the time taken to encode, compress, and write to disk; DD represents the time spent reading the disk, decompressing, and decoding; CPU represents CPU utilization; RA represents the compression rate; each corner of the radar chart represents a coding, compression scheme. The index information radar chart is shown in fig. 7.
Referring to fig. 3, fig. 3 is a flowchart of a statistical-based dynamic intelligent encoding and compression method for time series data according to an embodiment; the method may comprise the steps of:
In the process of storing the real-time database data, the coding and compression modes used for time sequence data storage are determined by a decision maker, and the decision maker has a preset coding and compression method.
The decision maker periodically analyzes the time sequence data: when the sequence data amount acquired by the real-time database reaches X, the decision maker analyzes the original data. All coding and compression methods supported by the real-time database are enumerated and cross-combined to form a plurality of coding and compression schemes, and each scheme is used to code and compress the sequence data.
And counting the encoding, compressing, decoding and decompressing time and compression ratio index of various schemes, and selecting an optimal scheme according to index information.
The decision maker recodes and compresses the data according to the optimal scheme, and changes the preset coding and compression modes into the optimal scheme.
FIG. 4 is a flow chart of the decision maker of a statistics-based dynamic intelligent encoding and compression method for time series data according to an embodiment.
The decision maker periodically analyzes the time sequence, and when the sequence data amount is larger than X, the decision maker analyzes the original data. X is an incremental value that increases as the data amount increases. The coding and compression methods supported by the system are enumerated to form a plurality of schemes, and each scheme is used to code and compress the sequence. The encoding, compression, decoding and decompression times and the compression ratio index of each scheme are counted, and an optimal scheme is selected according to the index information. The data are then recoded and recompressed, which reduces the storage space of the data and improves the query speed of the data.
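The statistics-gathering step of this flow can be sketched as a function that runs one scheme over a sample and times each stage. In the Python sketch below, `measure_scheme`, `identity` and the sample data are hypothetical; `zlib` stands in for a real compression mode; the compression rate is assumed to be compressed size divided by original size; and disk and CPU measurements are omitted for brevity.

```python
import time
import zlib
from typing import Callable, Dict

Transform = Callable[[bytes], bytes]


def measure_scheme(raw: bytes, encode: Transform, compress: Transform,
                   decompress: Transform, decode: Transform) -> Dict[str, float]:
    """Time each stage of one scheme and report its compression rate."""
    if not raw:
        raise ValueError("raw sample must not be empty")
    t0 = time.perf_counter()
    encoded = encode(raw)
    t1 = time.perf_counter()
    compressed = compress(encoded)
    t2 = time.perf_counter()
    restored = decompress(compressed)
    t3 = time.perf_counter()
    decoded = decode(restored)
    t4 = time.perf_counter()
    if decoded != raw:
        raise ValueError("scheme is not lossless on this sample")
    return {
        "TE": t1 - t0,
        "TC": t2 - t1,
        "TDe": t3 - t2,
        "TD": t4 - t3,
        "RA": len(compressed) / len(raw),  # smaller is better (assumed definition)
    }


def identity(data: bytes) -> bytes:
    """Stand-in for a plain encoder or an 'uncompressed' compression mode."""
    return data


sample = bytes(range(256)) * 64  # hypothetical raw column bytes
stats_deflate = measure_scheme(sample, identity, zlib.compress, zlib.decompress, identity)
stats_uncompressed = measure_scheme(sample, identity, identity, identity, identity)
```

Running this over every candidate scheme yields the per-scheme index data from which the optimal scheme is then chosen.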
FIG. 5 is a schematic diagram of the decision maker of a statistics-based dynamic intelligent encoding and compression method for time series data according to an embodiment.
Taking the CISDigital-TimeS real-time database self-developed by a certain company as an example, the whole coding and compression subsystem comprises an encoder, a compressor, a pipeline controller and a coding and compression analysis decision maker. The system input is time series data, which is composed of times and values; in the figure, T represents the time and S represents the time series value. In order to store data of the same type together and thereby improve read-write efficiency, T and S are stored in columns.
The encoder converts time sequence data into a binary code stream, wherein one piece of time sequence data consists of time T and a value S, and the encoder encodes the time T and the value S respectively; the encoder supports the expansion of the coding modes through a plug-in architecture, and 8 coding modes are supported as shown in fig. 5; the compressor is similar to the encoder, and compresses the encoded data according to a specified compression method; and finally outputting the compressed data and storing the compressed data as a time sequence data file in a specific format.
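As an illustration of encoding the time and value columns separately and then compressing them, the following Python sketch uses a simple delta encoding for timestamps, a plain encoding for values, and `zlib` as a stand-in compressor. All function names are hypothetical, and these are not the encoders of the real-time database itself.

```python
import struct
import zlib
from typing import List, Tuple


def encode_times_delta(times: List[int]) -> bytes:
    """Store the first timestamp, then successive differences, as signed 64-bit ints."""
    if not times:
        return b""
    deltas = [times[0]] + [b - a for a, b in zip(times, times[1:])]
    return struct.pack(f"<{len(deltas)}q", *deltas)


def decode_times_delta(blob: bytes) -> List[int]:
    deltas = struct.unpack(f"<{len(blob) // 8}q", blob)
    times, current = [], 0
    for delta in deltas:
        current += delta  # running sum restores the original timestamps
        times.append(current)
    return times


def encode_values_plain(values: List[float]) -> bytes:
    """Plain encoding: each value as a little-endian 64-bit float."""
    return struct.pack(f"<{len(values)}d", *values)


def decode_values_plain(blob: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def store_columns(times: List[int], values: List[float]) -> Tuple[bytes, bytes]:
    """Encode each column with its own encoder, then compress the encoded bytes."""
    return (zlib.compress(encode_times_delta(times)),
            zlib.compress(encode_values_plain(values)))


def load_columns(time_blob: bytes, value_blob: bytes) -> Tuple[List[int], List[float]]:
    return (decode_times_delta(zlib.decompress(time_blob)),
            decode_values_plain(zlib.decompress(value_blob)))


time_blob, value_blob = store_columns([1000, 1010, 1020, 1030], [1.5, 1.5, 2.0, 2.5])
restored_times, restored_values = load_columns(time_blob, value_blob)
```

Regularly spaced timestamps turn into a run of identical deltas, which is the kind of redundancy a subsequent compressor can exploit.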
The pipeline controller is used for encoding and compression parallel processing, and the encoder and the compressor are divided into two stages of a pipeline through pipeline technology, so that the utilization rate of a CPU and a memory and the IO read-write rate are improved.
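A two-stage pipeline of this kind can be sketched with two threads connected by a bounded queue, so that the compressor works on one block while the encoder prepares the next. The Python sketch below is an assumption-laden illustration: `run_pipeline`, the fixed-width integer encoding and the use of `zlib` are all hypothetical stand-ins, not the pipeline controller of the embodiment.

```python
import queue
import threading
import zlib
from typing import Iterable, List

_SENTINEL = object()  # marks the end of the encoded stream


def run_pipeline(blocks: Iterable[List[int]]) -> List[bytes]:
    """Stage 1 encodes blocks; stage 2 compresses them; both run concurrently."""
    encoded_q: "queue.Queue" = queue.Queue(maxsize=4)  # bounded hand-off buffer
    results: List[bytes] = []

    def encoder_stage() -> None:
        for block in blocks:
            # Hypothetical plain encoding: each integer as 8 little-endian bytes.
            encoded_q.put(b"".join(v.to_bytes(8, "little", signed=True) for v in block))
        encoded_q.put(_SENTINEL)

    def compressor_stage() -> None:
        while True:
            item = encoded_q.get()
            if item is _SENTINEL:
                break
            results.append(zlib.compress(item))

    threads = [threading.Thread(target=encoder_stage),
               threading.Thread(target=compressor_stage)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


compressed_blocks = run_pipeline([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because there is a single producer and a single consumer on a FIFO queue, the compressed blocks come out in the same order as the input blocks.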
The decision maker is the core module of the system. During system operation it dynamically discovers the coding and compression method most suitable for a given time sequence according to the historical data on coding and compression rate, so as to decide which coding and compression method the system should use in a future period of time. The decision maker periodically analyzes the data stored in the database, encodes and compresses the data in different modes, collects the related statistics, uses these statistics to select the method with the lowest encoding, decoding, compression, decompression and CPU time consumption as the optimal scheme, recodes and recompresses the data according to the optimal scheme, and completes the storage of the time sequence data files in a specific format.
From this, the present embodiment discloses a statistics-based dynamic intelligent coding and compression method for time series data, which mainly comprises the following steps. In the process of storing the real-time database data, the coding and compression modes used for time sequence data storage are determined by a decision maker, and the decision maker has a preset coding and compression method. The decision maker periodically analyzes the time sequence data: when the sequence data amount acquired by the real-time database reaches X, the decision maker analyzes the original data. All coding and compression methods supported by the real-time database are enumerated and cross-combined to form a plurality of coding and compression schemes, and each scheme is used to code and compress the sequence data. The encoding, compression, decoding and decompression times and the compression ratio index of each scheme are counted, and an optimal scheme is selected according to the index information. The decision maker recodes and recompresses the data according to the optimal scheme, and changes the preset coding and compression modes to the optimal scheme. The embodiment can effectively utilize the time sequence data characteristics of the real-time database, and timely determine and update the coding and compression strategies of the time sequence data through intelligent analysis, so that the storage space of the data is saved, and the query speed of the data is improved.
In summary, the present application provides a method for encoding and compressing time series data, which uses a first encoding mode and a first compressing mode preset in a decision maker to perform initial encoding and initial compression on the time series data in the process of storing the time series data into a database; the time sequence data is periodically analyzed, and after the time sequence data stored in the database is greater than or equal to the preset data quantity, a plurality of coding compression schemes are generated based on the coding mode and the compression mode supported by the database in a cross combination mode; each coding compression scheme comprises a coding mode and a compression mode; counting the coding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value; screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and respectively marking the coding mode and the compression mode as a second coding mode and a second compression mode; then the second coding mode is used for replacing the first coding mode in the decision maker, and the second compression mode is used for replacing the first compression mode in the decision maker, so that the coding mode and the compression mode in the decision maker are updated; and finally, based on a decision maker for updating the coding mode and the compression mode, coding and compressing the time sequence data stored in the database. Therefore, the method intelligently selects proper coding and compression methods by periodically analyzing the time sequence data of the real-time database, thereby achieving the purposes of saving storage space and improving query efficiency. 
The method can effectively utilize the time sequence data characteristics of the real-time database, and can determine and update the coding and compression strategies of the time sequence data in time through intelligent analysis, so that the storage space of the data can be saved, and the query speed of the data can be improved.
As shown in fig. 9, the present application further provides a system for encoding and compressing time-series data, the system comprising:
the initial encoding and compression module 910 is configured to perform initial encoding and initial compression on the time-series data by using a first encoding mode and a first compression mode preset in the decision maker in the process of storing the time-series data in the database;
the period analysis module 920 is configured to perform periodic analysis on the time-series data, and generate multiple coding compression schemes based on a coding mode and a compression mode supported by the database in a cross combination manner after the time-series data stored in the database is greater than or equal to a preset data amount; each coding compression scheme comprises a coding mode and a compression mode;
the score calculating module 930 is configured to count an encoding time, a decoding time, a compression time, a decompression time, a disk writing time, a disk reading time, a processor utilization rate and a compression rate of each encoding compression scheme, and calculate a score value of each encoding compression scheme based on the corresponding weight value;
the screening module 940 is configured to screen a coding mode and a compression mode corresponding to the coding compression scheme with the highest score value, and record the coding mode and the compression mode as a second coding mode and a second compression mode respectively;
the updating and replacing module 950 is configured to replace the first coding mode in the decision maker with the second coding mode, and replace the first compression mode in the decision maker with the second compression mode, so as to update the coding mode and the compression mode in the decision maker;
the dynamic encoding and compression module 960 is configured to encode and compress the time-series data stored in the database according to the decision maker that updates the encoding mode and the compression mode.
Therefore, according to the embodiment, the time sequence data of the real-time database is analyzed periodically, and the proper coding and compression method is selected intelligently, so that the purposes of saving the storage space and improving the query efficiency can be achieved. The embodiment can effectively utilize the time sequence data characteristics of the real-time database, and timely determine and update the coding and compression strategies of the time sequence data through intelligent analysis, so that the storage space of the data can be saved, and the query speed of the data can be improved. The time sequence data in the embodiment comprises data generated by a sensor, a PLC, a DCS, a robot and the like; it can be seen that the time series data described in this embodiment can be derived from various software and hardware data sources for data storage, analysis and intelligent application. The coding and compressing method of the time sequence data described in the embodiment can be applied to the scene of the internet of things, namely, the data generated by the equipment and the software of the internet of things are coded and compressed and then stored in a database; the method can also be applied to production control scenes, namely, the sensor data of the production control system are encoded and compressed and then stored in a database.
In an exemplary embodiment, the process of counting the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each encoding compression scheme, and calculating the score value of each encoding compression scheme based on the corresponding weight values, includes:
$$\mathrm{Score}_i = W_1(1-TE_i) + W_2(1-TD_i) + W_3(1-TC_i) + W_4(1-TDe_i) + W_5(1-TW_i) + W_6(1-TR_i) + W_7(1-CPU_i) + W_8(1-RA_i)$$
where $TE_i$ represents the encoding time of the i-th encoding compression scheme; $TD_i$ represents the decoding time of the i-th encoding compression scheme; $TC_i$ represents the compression time of the i-th encoding compression scheme; $TDe_i$ represents the decompression time of the i-th encoding compression scheme; $TW_i$ represents the disk writing time of the i-th encoding compression scheme; $TR_i$ represents the disk reading time of the i-th encoding compression scheme; $CPU_i$ represents the processor utilization of the i-th encoding compression scheme; $RA_i$ represents the compression rate of the i-th encoding compression scheme; $W_1$, $W_2$, $W_3$, $W_4$, $W_5$, $W_6$, $W_7$ and $W_8$ respectively represent the weight values corresponding to the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate, and $W_1+W_2+W_3+W_4+W_5+W_6+W_7+W_8=1$.
In an exemplary embodiment, before calculating the score value of each encoding compression scheme based on the corresponding weight values, the present embodiment may further include: normalizing the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each encoding compression scheme with a linear normalization formula, namely:
$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X'$ represents the normalized value of a given parameter of the encoding compression scheme; $X$ represents the original value of that parameter before normalization; $X_{\min}$ represents the minimum value of that parameter; and $X_{\max}$ represents the maximum value of that parameter. The parameters to be normalized are: encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate.
In an exemplary embodiment, when calculating the score value of each encoding compression scheme based on the corresponding weight values, the embodiment may further include: associating the encoding time, the compression time and the disk writing time to generate a first variable time; associating the decoding time, the decompression time and the disk reading time to generate a second variable time; and counting the first variable time, the second variable time, the processor utilization and the compression rate of each encoding compression scheme, and calculating the score value of each encoding compression scheme based on the corresponding weight values. Specifically, the first variable time is generated as CC = TE + TC + TW, and the second variable time is generated as DD = TR + TDe + TD, where CC represents the first variable time; DD represents the second variable time; TE represents the encoding time; TC represents the compression time; TW represents the disk writing time; TR represents the disk reading time; TDe represents the decompression time; and TD represents the decoding time.
In an exemplary embodiment, after storing the time series data in the database, the present embodiment may further include: acquiring time sequence data in a preset time period in a database; taking the coding compression scheme corresponding to the coding mode and the compression mode in the decision maker as an abscissa and the score value corresponding to the coding compression scheme as an ordinate; a decision maker score histogram is generated based on the abscissa and the ordinate. As an example, a decision maker score histogram is shown in fig. 8.
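As a rough illustration of such a chart, the sketch below renders one text bar per coding compression scheme, with the scheme on one axis and its score on the other. It is not the rendering used for fig. 8: the function name, the text-based output and the scheme labels are all hypothetical, and scores are assumed to lie in [0, 1].

```python
from typing import Dict


def render_score_histogram(scores: Dict[str, float], width: int = 40) -> str:
    """Return one text bar per scheme; bar length is proportional to score."""
    if not scores:
        return ""
    label_width = max(len(name) for name in scores)
    lines = []
    for name, score in scores.items():
        bar = "#" * round(score * width)
        lines.append(f"{name.ljust(label_width)} | {bar} {score:.2f}")
    return "\n".join(lines)


# Hypothetical scheme labels of the form "<coding mode>+<compression mode>".
example_scores = {"enc_a+comp_x": 0.75, "enc_b+comp_y": 0.5}
chart = render_score_histogram(example_scores, width=20)
```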
In summary, the present application provides a system for encoding and compressing time series data. In the process of storing the time series data into a database, the system performs initial encoding and initial compression on the time series data using a first encoding mode and a first compression mode preset in a decision maker. The system periodically analyzes the time series data and, once the time series data stored in the database is greater than or equal to a preset data volume, generates a plurality of coding compression schemes by cross-combining the coding modes and compression modes supported by the database, each coding compression scheme comprising one coding mode and one compression mode. It then counts the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculates the score value of each coding compression scheme based on the corresponding weight values. The coding mode and the compression mode corresponding to the coding compression scheme with the highest score value are screened out and marked as a second coding mode and a second compression mode, respectively. The second coding mode then replaces the first coding mode in the decision maker, and the second compression mode replaces the first compression mode, so that the coding mode and the compression mode in the decision maker are updated. Finally, the time series data stored in the database is encoded and compressed based on the updated decision maker. By periodically analyzing the time series data of the real-time database, the system thus intelligently selects suitable encoding and compression methods, saving storage space and improving query efficiency.
The system can effectively utilize the time sequence data characteristics of the real-time database, and timely determine and update the coding and compression strategies of the time sequence data through intelligent analysis, so that the storage space of the data can be saved, and the query speed of the data can be improved.
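The selection loop summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: `measure` is a hypothetical callback that returns the eight already-normalized parameters for a scheme, the class `DecisionMaker` and the function `reselect` are hypothetical names, and the candidate lists are assumed to be non-empty. The score follows the eight-term weighted form given in claim 2.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Scheme = Tuple[str, str]  # (coding mode, compression mode)
METRICS = ("TE", "TD", "TC", "TDe", "TW", "TR", "CPU", "RA")


def score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of (1 - normalized parameter); weights sum to 1."""
    return sum(weights[m] * (1.0 - metrics[m]) for m in METRICS)


class DecisionMaker:
    """Holds the coding mode and compression mode currently in use."""

    def __init__(self, encoding: str, compression: str) -> None:
        self.encoding = encoding
        self.compression = compression

    def update(self, scheme: Scheme) -> None:
        self.encoding, self.compression = scheme


def reselect(decision_maker: DecisionMaker,
             encodings: Sequence[str],
             compressions: Sequence[str],
             measure: Callable[[Scheme], Dict[str, float]],
             weights: Dict[str, float]) -> float:
    """Cross-combine modes, score each scheme, and adopt the best one."""
    candidates = list(product(encodings, compressions))
    scores = {scheme: score(measure(scheme), weights) for scheme in candidates}
    best = max(scores, key=scores.get)
    decision_maker.update(best)
    return scores[best]
```

In this sketch the periodic trigger and the data-volume threshold are left to the caller, which would invoke `reselect` only once the stored data reaches the preset volume.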
It should be noted that, the encoding and compression system of the time series data provided by the above embodiment and the encoding and compression method of the time series data provided by the above embodiment belong to the same concept, and the specific manner in which each module and unit perform the operation has been described in detail in the method embodiment, which is not repeated here. In practical application, the encoding and compression system for time series data provided in the above embodiment may allocate the functions to different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to complete all or part of the functions described above, which is not limited herein.
The embodiment of the application also provides a device for encoding and compressing time sequence data, which can comprise: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the encoding and compression method of time series data described in fig. 2. Fig. 10 shows a schematic structure of a time-series data encoding and compressing apparatus 1000. Referring to fig. 10, the encoding and compression apparatus 1000 of time-series data includes: processor 1010, memory 1020, power supply 1030, display unit 1040, and input unit 1060.
The processor 1010 is the control center of the time series data encoding and compression apparatus 1000. It connects the respective components using various interfaces and lines, and performs the various functions of the apparatus 1000 by running or executing software programs and/or data stored in the memory 1020, thereby monitoring the apparatus 1000 as a whole. In an embodiment of the present application, the processor 1010 executes the method for encoding and compressing time series data described in fig. 2 when it invokes a computer program stored in the memory 1020. Optionally, the processor 1010 may include one or more processing units; preferably, the processor 1010 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, etc., with a modem processor, which primarily handles wireless communications. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
The memory 1020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, various applications, etc., and the data storage area may store data created through use of the time series data encoding and compression apparatus 1000, etc. In addition, the memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state memory device.
The encoding and compression device 1000 for time series data further includes a power source 1030 (e.g., a battery) for powering the various components, which may be logically connected to the processor 1010 via a power management system so as to perform functions such as managing charge, discharge, and power consumption via the power management system.
The display unit 1040 may be used for displaying information input by a user or information provided to the user, various menus of the encoding and compression apparatus 1000 for time series data, and the like, and is mainly used for displaying a display interface of each application in the encoding and compression apparatus 1000 for time series data and objects such as texts and pictures displayed in the display interface in the embodiment of the present application. The display unit 1040 may include a display panel 1050. The display panel 1050 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The input unit 1060 may be used to receive information such as numbers or characters input by a user. The input unit 1060 may include a touch panel 1070 and other input devices 1080. Wherein the touch panel 1070, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 1070 or thereabout by using any suitable object or accessory such as a finger, a stylus, etc.).
Specifically, the touch panel 1070 may detect a touch operation by a user, detect signals resulting from the touch operation, convert the signals into coordinates of contacts, send the coordinates to the processor 1010, and receive and execute commands sent from the processor 1010. In addition, the touch panel 1070 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. Other input devices 1080 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, power on and off keys, etc.), a trackball, mouse, joystick, etc.
Of course, the touch panel 1070 may overlay the display panel 1050, and when a touch operation is detected on or near the touch panel 1070, the touch operation is transmitted to the processor 1010 to determine the type of touch event, and then the processor 1010 provides a corresponding visual output on the display panel 1050 according to the type of touch event. Although in fig. 10, the touch panel 1070 and the display panel 1050 are used as two independent components to implement the input and output functions of the time series data encoding and compressing apparatus 1000, in some embodiments, the touch panel 1070 and the display panel 1050 may be integrated to implement the input and output functions of the time series data encoding and compressing apparatus 1000.
The encoding and compression apparatus 1000 of the time series data may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the above-described time series data encoding and compression device 1000 may also include other components such as cameras, as desired in a particular application.
Embodiments of the present application also provide a computer readable storage medium having instructions stored therein, which when executed by one or more processors, enable the apparatus to perform the method for encoding and compressing time series data according to the present application as described in fig. 2.
It will be appreciated by those skilled in the art that fig. 10 is merely an example of a time series data encoding and compression apparatus and does not limit the apparatus; the apparatus may include more or fewer components than illustrated, may combine some components, or may use different components. For convenience of description, the above parts are described as being functionally divided into modules (or units). Of course, in implementing the present application, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware.
It will be appreciated by those skilled in the art that the application can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It should be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that although the terms first, second, third, etc. may be used to describe the preset ranges, etc. in the embodiments of the present application, these preset ranges should not be limited to these terms. These terms are only used to distinguish one preset range from another. For example, a first preset range may also be referred to as a second preset range, and similarly, a second preset range may also be referred to as a first preset range without departing from the scope of embodiments of the present application.
The above embodiments merely illustrate the principles of the present application and its effects, and are not intended to limit the application. Those skilled in the art may modify or vary the above-described embodiments without departing from the spirit and scope of the application. Accordingly, all equivalent modifications and variations made by persons of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present application.

Claims (10)

1. A method of encoding and compressing time series data, the method comprising the steps of:
in the process of storing time sequence data into a database, using a first coding mode and a first compression mode which are preset in a decision maker to perform initial coding and initial compression on the time sequence data;
periodically analyzing the time sequence data, and generating a plurality of coding compression schemes by cross combination based on a coding mode and a compression mode supported by the database after the time sequence data stored by the database is greater than or equal to a preset data volume; each coding compression scheme comprises a coding mode and a compression mode;
counting the coding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value;
screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and marking them respectively as a second coding mode and a second compression mode;
replacing a first coding mode in the decision maker by using the second coding mode, and replacing the first compression mode in the decision maker by using the second compression mode to update the coding mode and the compression mode in the decision maker;
and based on a decision maker for updating the coding mode and the compression mode, coding and compressing the time sequence data stored in the database.
2. The method of encoding and compressing time series data as recited in claim 1, wherein the process of counting the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization and compression rate of each encoding and compression scheme and calculating the score value of each encoding and compression scheme based on the corresponding weight value comprises:
Score_i = W1*(1-TE_i) + W2*(1-TD_i) + W3*(1-TC_i) + W4*(1-TDe_i) + W5*(1-TW_i) + W6*(1-TR_i) + W7*(1-CPU_i) + W8*(1-RA_i);
wherein TE_i represents the encoding time of the i-th coding compression scheme;
TD_i represents the decoding time of the i-th coding compression scheme;
TC_i represents the compression time of the i-th coding compression scheme;
TDe_i represents the decompression time of the i-th coding compression scheme;
TW_i represents the disk writing time of the i-th coding compression scheme;
TR_i represents the disk reading time of the i-th coding compression scheme;
CPU_i represents the processor utilization rate of the i-th coding compression scheme;
RA_i represents the compression rate of the i-th coding compression scheme;
W1, W2, W3, W4, W5, W6, W7, W8 respectively represent the weight values corresponding to the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate, and W1+W2+W3+W4+W5+W6+W7+W8=1.
3. The method of encoding and compressing time series data according to claim 1, wherein before calculating the score value of each encoding compression scheme based on the corresponding weight value, the method further comprises:
carrying out normalization processing on the encoding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each encoding compression scheme by adopting a linear normalization formula; wherein the linear normalization formula is:
X' = (X - X_min) / (X_max - X_min);
wherein X' represents a normalized value of a certain parameter in the coding compression scheme;
X represents an original value before normalization of the certain parameter in the coding compression scheme;
X_min represents the minimum value of the certain parameter in the coding compression scheme;
X_max represents the maximum value of the certain parameter in the coding compression scheme.
4. A method of encoding and compressing time series data according to any one of claims 1 to 3, wherein when calculating the score value of each encoding compression scheme based on the corresponding weight value, the method further comprises:
correlating the encoding time, the compression time and the disk writing time to generate a first variable time; and
correlating the decoding time, the decompressing time and the disk reading time to generate a second variable time;
and counting the first variable time, the second variable time, the processor utilization rate and the compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value.
5. The method of encoding and compressing time series data as recited in claim 4, wherein the process of associating the encoding time, the compressing time and the writing disk time to generate the first variable time includes:
CC=TE+TC+TW;
the process of associating the decoding time, the decompressing time and the disk reading time to generate the second variable time includes:
DD=TR+TDe+TD;
wherein CC represents a first variable time;
TE represents the encoding time;
TC represents compression time;
TW represents disk write time;
DD represents a second variable time;
TR represents a disk read time;
TDe represents decompression time;
TD represents the decoding time.
6. A method of encoding and compressing time series data according to any one of claims 1 to 3, wherein after storing the time series data in a database, the method further comprises:
acquiring time sequence data in a preset time period in the database;
taking the coding compression scheme corresponding to the coding mode and the compression mode in the decision maker as an abscissa and the score value corresponding to the coding compression scheme as an ordinate; a decision maker score histogram is generated based on the abscissa and the ordinate.
7. A method of encoding and compressing time series data according to any one of claims 1 to 3, wherein the time series data includes data columns recorded in time order for the same index; the data in the same data column are measured on the same basis (the same statistical caliber) and are comparable.
8. A system for encoding and compressing time series data, said system comprising:
the initial coding and compressing module is used for carrying out initial coding and initial compression on the time sequence data by utilizing a first coding mode and a first compressing mode which are preset in the decision maker in the process of storing the time sequence data into the database;
the period analysis module is used for periodically analyzing the time sequence data, and generating a plurality of coding compression schemes by cross combination based on a coding mode and a compression mode supported by the database after the time sequence data stored in the database is greater than or equal to a preset data volume; each coding compression scheme comprises a coding mode and a compression mode;
the score calculation module is used for counting the coding time, decoding time, compression time, decompression time, disk writing time, disk reading time, processor utilization rate and compression rate of each coding compression scheme, and calculating the score value of each coding compression scheme based on the corresponding weight value;
the screening module is used for screening the coding mode and the compression mode corresponding to the coding compression scheme with the highest score value, and the coding mode and the compression mode are respectively marked as a second coding mode and a second compression mode;
an updating and replacing module, configured to replace the first coding mode in the decision maker with the second coding mode, and replace the first compression mode in the decision maker with the second compression mode, so as to update the coding mode and the compression mode in the decision maker;
and the dynamic coding and compressing module is used for coding and compressing the time sequence data stored in the database according to the decision maker for updating the coding mode and the compressing mode.
9. An apparatus for encoding and compressing time series data, comprising:
a processor; and
a computer readable medium storing instructions that, when executed by the processor, cause the apparatus to perform the method of encoding and compressing time series data as claimed in any one of claims 1 to 7.
10. A computer readable medium having instructions stored thereon, the instructions being loaded by a processor and performing the method of encoding and compressing time series data according to any one of claims 1 to 7.
CN202310684878.0A 2023-06-09 2023-06-09 Time sequence data coding and compressing method, system, equipment and medium Pending CN116680269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310684878.0A CN116680269A (en) 2023-06-09 2023-06-09 Time sequence data coding and compressing method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN116680269A true CN116680269A (en) 2023-09-01

Family

ID=87778818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310684878.0A Pending CN116680269A (en) 2023-06-09 2023-06-09 Time sequence data coding and compressing method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116680269A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117278046A (en) * 2023-09-18 2023-12-22 武汉含秋数据科技有限公司 Time sequence data compression storage method and device, electronic equipment and storage medium
CN117278046B (en) * 2023-09-18 2024-06-11 武汉含秋数据科技有限公司 Time sequence data compression storage method and device, electronic equipment and storage medium
CN117555494A (en) * 2024-01-12 2024-02-13 南京荧火泰讯信息科技有限公司 Coding management system for signal processing board
CN117555494B (en) * 2024-01-12 2024-03-22 南京荧火泰讯信息科技有限公司 Coding management system for signal processing board

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination