CN110868221B

CN110868221B - Multi-mode data automatic compression method

Info

Publication number: CN110868221B
Application number: CN201911054526.7A
Authority: CN
Inventors: 张可; 柴毅; 叶胜强
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2019-10-31
Filing date: 2019-10-31
Publication date: 2021-06-25
Anticipated expiration: 2039-10-31
Also published as: CN110868221A

Abstract

A multi-modal data automatic compression method comprises the following steps: collecting data, dividing the data into a plurality of modalities according to data type, and looking up the result corresponding to each modality's data in a data compression standard library; if the data contain a modality type that is not stored in the library, or the result found for any modality is still the preset compression ratio in the data compression standard library, modeling and calculating the optimal compression ratio of each modality, writing the calculated result into the corresponding entry of the data compression standard library to replace the stored optimal or preset compression ratio, and executing compression. The method can provide an efficient compression scheme for multi-modal data.

Description

Multi-mode data automatic compression method

Technical Field

The invention relates to a data compression method, in particular to a multi-mode data automatic compression method.

Background

Currently, multi-terminal distributed engineering systems are widely applied in various fields. Such systems usually have a plurality of terminals located on site for basic work. Each terminal interacts with a remote control center through a distributed or centralized network; the function of each terminal is relatively simple, and the terminals accept remote supervision while operating with a high degree of self-organization. Because the various types of operation data generated by each terminal during operation directly reflect the overall state of the system, the data usually need to be recorded in detail, organized in an orderly way, and transmitted to a remote decision and control center periodically or on demand to support overall decision-making; given their importance and timeliness, they should be transmitted in as close to real time as possible.

However, the data storage and remote transmission capacities of field terminals are limited (especially in decentralized networks), which hinders remote overall decision-making supported by large-scale data. The difficulties are mainly twofold:

1. The data scale is too large. Each terminal function point currently has many evaluation indexes, including measurement values produced by inspection, signals recorded by instruments, unstructured image and video data, report results presented as text, narrative data, metadata text, and the like. The volume of operation monitoring data collected by sensors per unit time has grown greatly, so data covering the working condition of one operation period consume a large amount of storage space and are inconvenient to transmit remotely in real time;

2. The data modalities are diverse. Because of functional diversity, each terminal also contains a plurality of lower-level subsystems, so the data that monitor terminal operation have many modalities and also an uncertain dimension (the monitoring data differ in semantics). The data types include continuous, discrete, enumerated, Boolean, structure, ordered key-value set, text file, binary file, class, and the like, and such data are difficult to organize according to a given specification for decision-making.

Data compression is a technique that reduces data scale and supports orderly organization; applying a compression algorithm to reduce the volume of mass data offers a way to save storage space. Compression efficiency is closely tied to the compression ratio. However, because both the modality and the dimension of the monitored data are uncertain, a high compression ratio may not satisfy requirements such as equal-duration data, equal-size scale, and periodic division, while compressing at a predefined fixed compression ratio may clearly waste computing and transmission resources and also hinders the orderly organization of the data.

Disclosure of Invention

The invention aims to provide an automatic multi-modal data compression method, which can provide an efficient compression scheme for multi-modal data.

The object of the invention is achieved by the following technical solution, which comprises these steps:

collecting data, dividing the data into a plurality of modes according to data types, and searching a result corresponding to each mode data in a data compression standard library;

if every modality's data belong to a data type stored in the data compression standard library and the result corresponding to each is an updated optimal compression ratio, the optimal compression ratio corresponding to each modality's data is retrieved and compression is executed;

if the data contain a modality type that is not stored in the library, or the result corresponding to any modality's data is still the preset compression ratio in the data compression standard library, the optimal compression ratio of each modality is calculated by modeling, the calculated result is written into the corresponding entry of the data compression standard library to replace the stored optimal or preset compression ratio, and compression is executed.
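The lookup-then-decide flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `LibraryEntry`, `plan_compression`, and the `optimize` callback are hypothetical, the library is modeled as a plain dictionary keyed by modality name, and the efficiency-index comparison that precedes an update is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LibraryEntry:
    """One record of the compression library (hypothetical layout)."""
    algorithm: str   # name of the lossless algorithm for this modality
    ratio: float     # stored compression ratio
    optimized: bool  # True once the ratio has been replaced by an optimized value


def plan_compression(
    modalities: List[str],
    library: Dict[str, LibraryEntry],
    optimize: Callable[[List[str]], Dict[str, float]],
) -> Dict[str, float]:
    """Return the compression ratio to use for each modality in the batch."""
    needs_model = any(
        m not in library or not library[m].optimized for m in modalities
    )
    if not needs_model:
        # Every modality is stored with an updated optimal ratio: reuse it.
        return {m: library[m].ratio for m in modalities}

    # An unknown modality or a preset ratio is present: model all modalities.
    ratios = optimize(modalities)
    for m, p in ratios.items():
        algorithm = library[m].algorithm if m in library else "unspecified"
        library[m] = LibraryEntry(algorithm, p, optimized=True)
    return ratios
```

In this sketch the optimizer is called at most once per new situation; a later batch with the same modalities is served directly from the library.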

Further, collecting data means acquiring the multi-modal data generated in one period and detecting its size $D$; the data are classified into $n$ data modalities whose corresponding data sizes are $D_i$ ($i = 1, 2, \ldots, n$).

Further, the data compression specification library gives, for each modality type, the corresponding most effective lossless compression algorithm, the expected compression ratio $P_i^{(0)}$, and the unit-data compression period function $T_i = f_i(p_i)$ ($i = 1, 2, \ldots, n$) of that compression algorithm, where $i$ denotes the $i$-th modality and $p_i$ denotes the compression ratio corresponding to the $i$-th modality.

Further, the specific method for calculating the optimal compression ratio of each mode by modeling comprises the following steps:

setting a compression specification expected target, and setting the evaluation index parameters of that target; selecting the compression efficiency index $Q_i$ as the expected target of the compression specification, with the expected compression time consumption $t$ and the expected data compression ratio $\eta$ as evaluation index parameters, where $i$ denotes the $i$-th modality, $p_i$ the compression ratio corresponding to the $i$-th modality, $D_i$ the data size of the $i$-th modality, and $T_i = f_i(p_i)$ the unit-data compression period of the corresponding compression algorithm;

introducing dynamically adjustable parameters into a compression algorithm to control the overall compression efficiency of data, wherein the dynamically adjustable parameters are compression ratios;

establishing an optimization control model to meet the expected target of the compression specification, whose objective is the overall compression efficiency index $Q$, formed as the sum of the per-modality indexes $Q_i$ ($i = 1, 2, \ldots, n$);

in the model, $T_i = f_i(p_i)$ represents the unit-data compression period function corresponding to the compression algorithm;

the compression ratio of each modality's data is used as the decision variable to feedback-control the overall compression efficiency of the data, and is optimized and adjusted until it meets the expected compression target; the compression ratio $p_i$ is confined to the range between the minimum compression ratio $P_{min}$ and the maximum compression ratio $P_{max}$, the acceptable compression time consumption is set to $t$ and the compressed data ratio to $\eta$, and the constraint conditions of the data compression process are established,

wherein $t_{compress}$ is the time consumed by compressing the data as a whole; $D_i$ is the data size of the $i$-th modality; $T_i$ is the unit-data compression period of the $i$-th modality; and $Q_i$ is the compression efficiency index of the $i$-th modality's data;

and controlling the overall compression efficiency index by taking the compression ratio as a decision variable according to the control optimization model and the constraint condition, and solving the optimal compression ratio and the minimum overall compression index according to an optimization target.

Further, according to the constraint conditions, solving the control optimization model by adopting a genetic algorithm to obtain the optimal compression ratio, and the specific steps are as follows:

S1, encoding: the compression ratio range of each modality is $[P_{min}, P_{max}]$; binary codes of length $k$ are used, giving $2^k$ different codes in total, with a uniform spacing between adjacent codes;

S2, generation of an initial population: randomly generating $N$ string-structured data items as the initial population from which evolution starts, namely generating $N$ binary-coded initial compression ratio vectors $p$ ($p = [p_1, p_2, \ldots, p_n]$) as the initial population;

S3, fitness evaluation: selecting $[Q, t_{compress}, D_{compress}]$ as the fitness function and calculating the fitness function value of each individual in the initial population;

S4, natural selection: individuals whose fitness values $[t_{compress}, D_{compress}]$ meet the constraint conditions and whose $Q$ is smaller than the population average are retained as well-adapted individuals and added to the new population;

S5, crossover and mutation: crossover exchanges part of the codes between individuals, and mutation randomly selects an individual in the population and, with a small probability, randomly changes a character of its code; the purpose of crossover and mutation is to obtain new individuals to add to the new population;

S6, whether to stop evolution: the termination condition is that the number of generations reaches the set number or the variance of the population fitness $Q$ falls below the set variance; if the termination condition is met, evolution stops; otherwise the updated population is returned to step S3;

S7, decoding: the optimal individual remaining after selection is converted back into the original parameter by decoding, which gives the optimal compression ratio together with the optimal fitness.
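The binary encoding of S1 and the decoding of S7 can be illustrated with a short Python sketch. It assumes a uniform grid in which the $2^k$ codes cover $[P_{min}, P_{max}]$ including both endpoints, so that adjacent codes differ by $(P_{max} - P_{min})/(2^k - 1)$; this spacing and the function names `code_spacing`, `encode`, and `decode` are assumptions made for illustration.

```python
def code_spacing(p_min: float, p_max: float, k: int) -> float:
    """Step between adjacent codes when 2**k codes span [p_min, p_max] (assumed uniform grid)."""
    return (p_max - p_min) / (2 ** k - 1)


def encode(p: float, p_min: float, p_max: float, k: int) -> str:
    """Map a compression ratio to the nearest k-bit binary code."""
    index = round((p - p_min) / code_spacing(p_min, p_max, k))
    index = max(0, min(2 ** k - 1, index))  # clamp into the valid code range
    return format(index, f"0{k}b")


def decode(code: str, p_min: float, p_max: float) -> float:
    """Map a k-bit binary code back to a compression ratio."""
    k = len(code)
    return p_min + int(code, 2) * code_spacing(p_min, p_max, k)
```

With this layout, a longer code length $k$ gives a finer grid of candidate compression ratios at the cost of a larger search space, and a decoded value differs from the encoded ratio by at most half a step.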

Further, the specific method for updating the calculated result to the corresponding result in the data compression specification library to replace the optimal compression ratio or the preset compression ratio comprises the following steps:

the optimal compression ratio $p_i$ obtained for each modality is compared with the initial compression ratio $P_i^{(0)}$ in the database by means of the compression efficiency index; if the optimal compression ratio gives the better compression efficiency index, the initial expected compression ratio in the specification library is updated and marked $P_i^{(1)}$, and the data are compressed with the optimal compression ratio; otherwise, the original or preset compression ratio is kept and marked $P_i^{(0)}$, and the data are compressed with the original or preset compression ratio.

Due to the adoption of the technical scheme, the invention has the following advantages:

The invention provides a compression scheme for multi-modal data by means of a database, and calculates the optimal compression ratio through its own modeling formula and calculation procedure, so that the optimal compression ratio stored in the database corresponds to the most efficient compression scheme. Introducing a dynamic compression ratio improves compression efficiency: for different data modalities and scales, compression time consumption depends on the degree of compression, and a compression ratio suited to one data modality is not adequate for the others. A dynamic compression ratio also gives strong support to subsequent information processing, because in a large-scale data production system the output data are monitored and fed back to a control center, which provides the decision layer with real-time data feedback and online data support.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof.

Drawings

The drawings of the present invention are described below.

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of the optimization process using the genetic algorithm according to the present invention.

Detailed Description

The invention is further illustrated by the following figures and examples.

A method for automatically compressing multi-modal data, as shown in FIG. 1, includes the following steps:

S1: establishing a predefined data compression standard library adapted to the security information system according to prior knowledge, such as the modalities of the monitoring data generated by the system;

S2: detecting the modality and scale of the monitoring data generated by the security information system;

S3: selecting the corresponding applicable lossless compression algorithm from the predefined data compression standard library;

S4: introducing dynamically adjustable parameters into the compression algorithm to control the overall compression efficiency of the data;

S5: establishing an optimization model, with optimal overall compression efficiency as its target, together with its constraint conditions;

S6: using the compression ratio as the decision variable to control the overall data compression efficiency, and solving for the optimal compression ratio;

S7: feeding the compression method and compression ratio corresponding to each modality's data back to the predefined data compression specification library and updating it;

S8: updating and marking all the predefined compression ratios in the predefined data compression specification library;

S9: compressing data of the same modality with the updated compression ratio until a new type of data appears.

Establishing the predefined data compression standard library suited to the security information system means building it according to the data types and data characteristics output by the system; the database includes, for each data modality, the most effective lossless compression algorithm, the expected compression ratio $P_i^{(0)}$, the unit-data compression period function $T_i = f_i(p_i)$ ($i = 1, 2, \ldots, n$) of the compression algorithm, and the like, where $i$ denotes the $i$-th modality and $p_i$ denotes the compression ratio corresponding to the $i$-th modality. The information form of the database is shown in the following table:

Information storage form of the data compression standard library

Data modality	Optimal compression algorithm	Compression period function T	Expected compression ratio
Numerical value	Compression algorithm 1	T₁ = f₁(p)	P₁⁽⁰⁾
Switching value	Compression algorithm 2	T₂ = f₂(p)	P₂⁽⁰⁾
……	……	……	……

Detecting the type and scale of the data means that, for the multi-modal monitoring data output by the security monitoring system, the $n$ data types and their corresponding data sizes $D_i$ ($i = 1, 2, \ldots, n$) are obtained.

Selecting the applicable lossless compression algorithm means that, from the predefined data compression specification library created in S1, the lossless compression algorithm matching each data modality and its compression time function $T_i = f_i(p_i)$ ($i = 1, 2, \ldots, n$) are selected.

Dynamic compression is introduced as follows. Conventionally, compressed data storage is optimized mainly by raising the compression ratio of the compression algorithm; for a given type of compression algorithm, however, raising the compression ratio inevitably increases compression time consumption. The compression ratios applied to the data of different modalities therefore need to be adjusted so that the compression efficiency of the multi-modal data is maximized, which is done by introducing a dynamically adjustable compression ratio into the compression algorithm.

The optimization model is established as follows. Once a compression algorithm is introduced, a higher degree of compression of the monitoring data is not always better: a high compression ratio inevitably causes a sharp increase in compression time, and the compression time consumption differs between data types. An optimization model that takes the overall compression efficiency index as its optimization target therefore needs to be established. The following objective function can be obtained:

$$\min Q = \sum_{i=1}^{n} Q_i$$

wherein $i = 1, 2, \ldots, n$, and $Q_i$ is the compression efficiency index of the $i$-th modality's data, which depends on $p_i$, $D_i$, and $T_i = f_i(p_i)$.

Thus, with the data size $D_i$ of each modality fixed, the optimization model has only one decision variable, $p_i$;

The compression ratio $p_i$ is confined to the range $[1.5, 50]$, the acceptable compression time consumption is set to $t$ and the compressed data ratio to $\eta$, and the total size $D_{total}$ of the multi-modal data is fixed, so the following constraints exist in the data compression process:

wherein $t_{compress}$ is the time consumed by compressing the data as a whole; $D_{compress}$ is the size of the data after overall compression; $D_i$ is the data size of the $i$-th modality; $T_i$ is the unit-data compression period of the $i$-th modality; and $Q_i$ is the compression efficiency index of the $i$-th modality's data.
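As a rough illustration of how constraints of this kind might be checked for a candidate set of compression ratios, the Python sketch below assumes that the total compression time is the sum of $D_i \cdot T_i$, that the compressed size is the sum of $D_i / p_i$, and that the compressed data ratio is $D_{compress} / D_{total}$. These three formulas, the function name `check_constraints`, and the sample period functions in the usage example are assumptions made for illustration and are not taken from the patent.

```python
from typing import Callable, Sequence, Tuple


def check_constraints(
    p: Sequence[float],
    d: Sequence[float],
    f: Sequence[Callable[[float], float]],
    t_max: float,
    eta_max: float,
    p_min: float = 1.5,
    p_max: float = 50.0,
) -> Tuple[bool, float, float]:
    """Return (feasible, t_compress, d_compress) for a candidate ratio vector p."""
    # Assumed: time = data size x unit-data compression period T_i = f_i(p_i).
    t_compress = sum(d_i * f_i(p_i) for p_i, d_i, f_i in zip(p, d, f))
    # Assumed: compressed size = original size / compression ratio.
    d_compress = sum(d_i / p_i for p_i, d_i in zip(p, d))
    d_total = sum(d)
    in_range = all(p_min <= p_i <= p_max for p_i in p)
    feasible = in_range and t_compress <= t_max and d_compress / d_total <= eta_max
    return feasible, t_compress, d_compress


# Usage with two modalities and made-up linear period functions.
ok, t_c, d_c = check_constraints(
    p=[2.0, 4.0],
    d=[100.0, 200.0],
    f=[lambda x: 0.01 * x, lambda x: 0.02 * x],
    t_max=20.0,
    eta_max=0.5,
)
```

Returning the two aggregate quantities alongside the feasibility flag lets the same function supply the $t_{compress}$ and $D_{compress}$ components of a fitness evaluation.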

And solving the optimal compression ratio of the optimization model, namely controlling the overall compression efficiency index by taking the compression ratio as a decision variable according to the established optimization model and the constraint condition, wherein when the target, namely the overall compression efficiency index, reaches the minimum, the compression ratios corresponding to various modes are the optimal compression ratio. In order to solve the optimal compression ratio and the minimum overall compression index more accurately and effectively, the invention adopts an evolutionary algorithm, namely a genetic algorithm, to solve the nonlinear programming problem.

The flow chart of the genetic algorithm is shown in FIG. 2, and the specific steps are as follows:

S61: encoding: the compression ratio range of each modality is $[1.5, 50]$; binary codes of length $k$ are used, giving $2^k$ different codes in total, with a uniform spacing between adjacent codes;

S62: generation of an initial population: randomly generating $N$ string-structured data items as the initial population from which evolution starts, namely generating $N$ binary-coded initial compression ratio vectors $p$ ($p = [p_1, p_2, \ldots, p_n]$) as the initial population;

S63: fitness evaluation: selecting $[Q, t_{compress}, D_{compress}]$ as the fitness function and calculating the fitness function value of each individual in the initial population;

S64: selection: individuals whose fitness values $[t_{compress}, D_{compress}]$ meet the constraint conditions and whose $Q$ is smaller than the population average are retained as well-adapted individuals and added to the new population;

S65: crossover and mutation: crossover exchanges part of the codes between individuals, and mutation randomly selects an individual in the population and, with a small probability, randomly changes a character of its code. The purpose of crossover and mutation is to obtain new individuals to add to the new population;

S66: whether to stop evolution: the termination condition is that the number of generations reaches the set number or the variance of the population fitness $Q$ falls below the set variance. If the termination condition is met, evolution stops; otherwise the updated population is returned to step S63;

S67: decoding: the optimal individual remaining after selection is converted back into the original parameter by decoding, which gives the optimal compression ratio together with the optimal fitness.
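Steps S61 to S67 can be sketched as a compact genetic search in Python. This is a simplified illustration, not the patented procedure: the objective `q` and the constraint test `feasible` are supplied by the caller because their exact formulas are not fixed here, termination uses only a fixed generation count, mutation is applied per bit to each child, and individuals with $Q$ equal to the population mean are also kept so that a uniform population does not die out. The name `genetic_search` and all parameter defaults are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Individual = List[str]  # one k-bit code per modality


def decode(code: str, p_min: float, p_max: float) -> float:
    """Map a binary code onto [p_min, p_max] (assumed uniform grid)."""
    return p_min + int(code, 2) * (p_max - p_min) / (2 ** len(code) - 1)


def genetic_search(
    q: Callable[[Sequence[float]], float],
    feasible: Callable[[Sequence[float]], bool],
    n: int,
    k: int = 8,
    pop_size: int = 30,
    generations: int = 50,
    p_min: float = 1.5,
    p_max: float = 50.0,
    mutation_rate: float = 0.02,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Return (best ratios, best Q) found among feasible individuals."""
    rng = random.Random(seed)

    def random_individual() -> Individual:
        return ["".join(rng.choice("01") for _ in range(k)) for _ in range(n)]

    def ratios(ind: Individual) -> List[float]:
        return [decode(c, p_min, p_max) for c in ind]

    population = [random_individual() for _ in range(pop_size)]  # S62
    best: Optional[Individual] = None
    best_q = float("inf")
    for _ in range(generations):  # S66, fixed generation count only
        scored = [(q(ratios(ind)), ind) for ind in population]  # S63
        ok = [(s, ind) for s, ind in scored if feasible(ratios(ind))]
        for s, ind in ok:
            if s < best_q:
                best, best_q = ind, s
        mean_q = sum(s for s, _ in scored) / len(scored)
        survivors = [ind for s, ind in ok if s <= mean_q]  # S64
        if not survivors:
            survivors = [random_individual()]
        children: List[Individual] = []
        while len(survivors) + len(children) < pop_size:  # S65
            a, b = rng.choice(survivors), rng.choice(survivors)
            child = []
            for ca, cb in zip(a, b):
                cut = rng.randrange(1, k)  # single-point crossover
                bits = ca[:cut] + cb[cut:]
                bits = "".join(
                    ("1" if ch == "0" else "0") if rng.random() < mutation_rate else ch
                    for ch in bits
                )
                child.append(bits)
            children.append(child)
        population = survivors + children
    if best is None:
        raise ValueError("no feasible individual found")
    return ratios(best), best_q  # S67
```

Seeding the random generator makes a run reproducible, which is convenient when the result is to be compared against a ratio already stored in the specification library.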

Further, the predefined data compression specification library is updated: the optimal compression ratio $p_i$ solved in S6 is compared with the initial expected compression ratio $P_i^{(0)}$ in the specification library by means of the compression efficiency index, so as to judge whether to update the data compression ratio.

The method comprises the following specific steps:

S71: the optimal compression ratio $p_i$ of each modality solved in S6 is compared with the initial compression ratio $P_i^{(0)}$ in the database by means of the compression efficiency index;

S72: if the optimal compression ratio gives the better compression efficiency index, the data are compressed with the optimal compression ratio; otherwise, the original initial expected compression ratio is retained and marked $P_i^{(0)}$, and the data are compressed with the initial expected compression ratio;

S73: the expected compression ratio in the data compression specification library is updated according to the judgment obtained in S72, providing decision support for the next compression control.
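The comparison in S71 to S73 can be sketched as a small Python helper. The description treats the optimum as the point where the overall efficiency index is minimal, while claim 1 phrases the update condition as the index being greater, so the sketch leaves the direction as a parameter rather than fixing it. The function name `update_ratio`, the callable `q_i`, and the string marks are illustrative assumptions.

```python
from typing import Callable, Tuple


def update_ratio(
    p_opt: float,
    p_stored: float,
    q_i: Callable[[float], float],
    lower_is_better: bool = True,
) -> Tuple[float, str]:
    """Compare two ratios by efficiency index and return (ratio to use, mark)."""
    q_opt, q_stored = q_i(p_opt), q_i(p_stored)
    improved = q_opt < q_stored if lower_is_better else q_opt > q_stored
    if improved:
        return p_opt, "P(1)"   # library entry replaced by the optimized ratio
    return p_stored, "P(0)"    # initial expected ratio retained
```

The returned mark can be stored next to the ratio so that later batches know whether a modality's entry has already been optimized.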

Further, updating and marking the expected compression ratios in the predefined data compression specification library means that the optimal compression ratio of every data modality in the predefined specification library of the security information system is solved through the optimization model, the compression ratios are updated into the specification library, and the mark from S7 is written into the corresponding position; this continues until the optimal compression ratio of all modal data has been solved through the optimization model.

Further, compressing data of the same modality with the updated compression ratio means that, for multi-modal data to be compressed and stored in the security system, the data are compressed with the compression algorithm and compression ratio updated in the specification library, matched by modality, without recalculating the optimal compression ratio; this greatly simplifies the compression decision step and improves the compression and storage efficiency of mass data in the security system.

The method comprises the following specific steps:

S91: obtaining the number $n$ of modality types of the multi-modal data set to be compressed and stored, and the data size $D_i$ ($i = 1, 2, \ldots, n$) of each modality;

S92: if all $n$ modalities in the data set are marked in the data compression standard library, compressing the data set with the matching compression algorithm and compression ratio from the library;

S93: if the data set contains both $m$ data modalities that are marked in the data compression specification library and $(n - m)$ data modalities that are not, substituting the whole data set into the mathematical model established in step S5 to obtain the optimal compression ratio $p_i$ ($i = 1, 2, \ldots, n$) of each modality;

S94: writing the compression ratios $p_i$ ($i = m+1, m+2, \ldots, n$) of the unmarked data modalities into the data compression specification library; at the same time, comparing the compression ratios $p_i$ ($i = 1, 2, \ldots, m$) of the marked data modalities with the corresponding compression ratios in the database by means of the compression efficiency index, and updating the corresponding compression ratios in the database according to the comparison result.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A multi-modal data automatic compression method is characterized by comprising the following specific steps:

if the modal data belong to the stored data types in the data compression standard library and the result corresponding to the modal data is the updated optimal compression ratio, executing compression according to the optimal compression ratio corresponding to the modal data;

if the data contain a modality type that is not stored, or the result corresponding to any modality's data is the preset compression ratio in the data compression standard library, modeling and calculating the optimal compression ratio of each modality, comparing the calculated result with the preset compression ratio by means of the compression efficiency index, and, when the compression efficiency index of the optimal compression ratio is greater than the compression efficiency index of the preset compression ratio, updating the optimal compression ratio of each modality's data into the corresponding result in the data compression standard library to replace the preset compression ratio, and executing compression;

the data compression standard library gives, according to the modality type, the most effective lossless compression algorithm, the preset compression ratio $P_i^{(0)}$, and the unit-data compression period function $T_i = f_i(p_i)$, $i = 1, 2, \ldots, n$, corresponding to the compression algorithm, where $i$ denotes the $i$-th modality and $p_i$ denotes the compression ratio corresponding to the $i$-th modality;

the specific method for calculating the optimal compression ratio of each mode by modeling comprises the following steps:

the compression efficiency index $Q_i$ is taken as the expected target of the compression specification, and the expected compression time consumption $t$ and the expected data compression ratio $\eta$ are used as evaluation index parameters, where $i$ denotes the $i$-th modality, $p_i$ the compression ratio corresponding to the $i$-th modality, $D_i$ the data size of the $i$-th modality, and $T_i = f_i(p_i)$ the unit-data compression period corresponding to the compression algorithm;

in the formula, $T_i = f_i(p_i)$ represents the unit-data compression period function corresponding to the compression algorithm, and $Q$ represents the compression efficiency index of the data as a whole, which is formally the sum of the data compression efficiency indexes corresponding to all data modalities;

the compression ratio of each modality's data is used as the decision variable to feedback-control the overall compression efficiency of the data, and is optimized and adjusted until it meets the expected compression target; the compression ratio $p_i$ is confined to the range between the minimum compression ratio $P_{min}$ and the maximum compression ratio $P_{max}$, the acceptable compression time is set to $t$ and the expected data compression ratio to $\eta$, and the constraint conditions of the data compression process are established:

according to the control optimization model and the constraint conditions, the compression ratio is used as a decision variable to control the overall compression efficiency index, and the optimal compression ratio and the minimum overall compression index are solved according to the optimization target;

according to the constraint conditions, solving the control optimization model by adopting a genetic algorithm to obtain the optimal compression ratio, and the method comprises the following specific steps:

2. The method of claim 1, wherein collecting data means collecting the multi-modal data generated in one cycle and detecting its size $D$; the data are classified into $n$ data modalities whose corresponding data sizes are $D_i$, $i = 1, 2, \ldots, n$, where $i$ denotes the $i$-th modality and $n$ denotes the total number of data modality types.

3. The method according to claim 1, wherein the step of updating the calculated result into the corresponding result in the data compression specification library to replace the optimal compression ratio or the preset compression ratio comprises the following steps:

the optimal compression ratio $p_i$ obtained for each modality is compared with the preset compression ratio $P_i^{(0)}$ in the database by means of the compression efficiency index; if the compression efficiency index of the optimal compression ratio is greater than that of the preset compression ratio, the initial preset compression ratio in the specification library is updated and its value replaced with $P_i^{(1)}$, and the data are compressed with the optimal compression ratio; otherwise, the original initial preset compression ratio is kept and the data are compressed with the initial preset compression ratio; $P_i^{(0)}$ denotes the preset compression ratio corresponding to the $i$-th modality, and $P_i^{(1)}$ denotes the optimal compression ratio of the $i$-th modality that needs to be updated in the specification library.