CN110688385A - Data processing method and electronic equipment - Google Patents
- Publication number
- CN110688385A CN201910933439.2A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Fuzzy Systems (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The application provides a data processing method and an electronic device. The method comprises: determining target time-series data and dividing the target time-series data into at least two data blocks; determining data characteristics of each data block; determining, among a plurality of processing modes, a target processing mode that matches the data characteristics of the data block; and processing the data block using the target processing mode. Because the target time-series data is divided into data blocks, each data block can be assigned a processing mode that matches its own characteristics; that is, the same target time-series data can be handled with several processing modes, so that time-series data whose characteristics vary over time is processed effectively and processing flexibility is improved.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method and an electronic device.
Background
Time-series data is usually massive in volume and has to be processed using some processing mode; at present, many processing modes are available.
In current database implementations, a single processing algorithm is generally selected automatically for a time series; that is, only one fixed processing mode can be applied to the time-series data of an object. However, different processing modes perform differently on the same time-series data, so this approach does not suit time-series data that changes over time. For example, when the same time-series data has different data characteristics in different time periods, such variable time-series data cannot be processed effectively.
Disclosure of Invention
In view of the above, the present application provides a data processing method and an electronic device to solve the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
a data processing method, comprising:
determining target time series data, and dividing the target time series data into at least two data blocks;
determining data characteristics of the data block;
determining a target processing mode matched with the data characteristics of the data block in a plurality of processing modes;
and processing the data block by adopting the target processing mode.
Preferably, the dividing the target time-series data into at least two data blocks includes:
dividing the target time-series data into data blocks according to a preset time period, one data block per time period;
or dividing the target time-series data into data blocks of a preset data size.
Preferably, the method further comprises the following steps:
determining a data type of the target time series data;
correspondingly, the determining a target processing manner matched with the data characteristics of the data block includes:
and determining a target processing mode matched with the data type and the data characteristics of the data block.
Preferably, the method further comprises the following steps:
and adding processing information representing the target processing mode to the data block.
Preferably, the processing the data block by using the target processing manner includes: compressing the data block by the target processing mode;
correspondingly, the method further comprises the following steps:
and during decompression, determining the target processing mode based on the processing information, and decompressing the data block by adopting the target processing mode.
Preferably, the determining a target processing manner matched with the data type and the data characteristics of the data block includes:
when the data type is determined to be integer time sequence data, determining a target processing mode matched with the data characteristics of the data block by adopting a first matching mode;
and when the data type is determined to be the floating-point time series data, determining a target processing mode matched with the data characteristics of the data block by adopting a second matching mode.
An electronic device, comprising:
a memory;
a processor configured to determine target time-series data, divide the target time-series data into at least two data blocks, store the data blocks in the memory, determine data characteristics of the data blocks, determine, among a plurality of processing modes, a target processing mode that matches the data characteristics of the data blocks, and process the data blocks using the target processing mode.
Preferably, the electronic device further comprises:
a timer for timing;
the processor is specifically configured to obtain, based on the time kept by the timer, a data block for each preset time period from the target time-series data.
Preferably, the processor is specifically configured to obtain data blocks of a preset data size from the target time-series data.
An electronic device, comprising:
a first determination unit configured to determine target time-series data, the target time-series data being divided into at least two data blocks;
a second determining unit, configured to determine a data characteristic of the data block;
a third determining unit, configured to determine, among multiple processing manners, a target processing manner that matches with the data feature of the data block;
and the first processing unit is used for processing the data block by adopting the target processing mode.
According to the above technical solution, the data processing method determines target time-series data and divides it into at least two data blocks, determines the data characteristics of each data block, determines, among a plurality of processing modes, a target processing mode that matches the data characteristics of the data block, and processes the data block using the target processing mode. Because the target time-series data is divided into data blocks, each data block can be assigned a processing mode that matches its own characteristics; that is, the same target time-series data can be handled with several processing modes, so that time-series data whose characteristics vary over time is processed effectively and processing flexibility is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a data processing method according to a second embodiment of the present application;
Fig. 3 is a schematic flow chart of a data processing method according to a third embodiment of the present application;
Fig. 4 is a structural diagram of a data block according to the third embodiment of the present application;
Fig. 5 is a schematic flow chart of a data processing method according to a fourth embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides a data processing method. As shown in Fig. 1, the method includes the following steps:
Step 101: determining target time-series data, and dividing the target time-series data into at least two data blocks;
Time-series data is data collected at different times; it reflects how an object or phenomenon changes over time. The target time-series data is data collected at different times for the same target object, which may be any object.
In this application, when the target time-series data is divided into at least two data blocks, the data may first be buffered into data blocks, and the data blocks may then be stored in a memory, such as a disk.
There are various ways to divide the target time series data into at least two data blocks, including at least the following two ways:
1. Dividing the target time-series data into data blocks according to a preset time period, one data block per time period.
In a specific implementation, timing may start when collection of the target time-series data starts, and the collected data is buffered. When the preset time period elapses, the buffered data block is stored in the memory, and the target time-series data collected afterwards continues to be buffered period by period. The subsequent steps 102 to 104 may be performed on the data block stored in the memory, or they may be performed directly on the buffered data block without first storing it in the memory.
The preset time period is not limited in this application; for example, it may be 2 hours.
2. Dividing the target time-series data into data blocks of a preset data size.
In a specific implementation, the collected target time-series data may be buffered. When the amount of buffered data reaches the preset data size, the buffered data is stored in the memory as a data block, and the buffer continues to be monitored for the next block. The subsequent steps 102 to 104 may be performed on the data block stored in the memory, or directly on the buffered data block without first storing it in the memory.
The preset data size is not limited in this application; for example, it may be 10000 data points.
The present application is not limited to the above two ways of dividing data blocks; the data blocks may also be divided directly in the memory.
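The two division modes described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the `(timestamp, value)` point layout, and the choice to time periods from the first collected point are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, value); assumed layout


def split_by_period(points: Iterable[Point], period_s: float) -> Iterator[List[Point]]:
    """Yield one data block per preset time period, timed from the first point."""
    block: List[Point] = []
    window_start = None
    for ts, value in points:
        if window_start is None:
            window_start = ts  # timing starts when collection starts
        # Flush the buffer once a point falls past the current period.
        while ts >= window_start + period_s:
            if block:
                yield block
                block = []
            window_start += period_s
        block.append((ts, value))
    if block:
        yield block  # last, possibly partial, block


def split_by_size(points: Iterable[Point], max_points: int) -> Iterator[List[Point]]:
    """Yield data blocks holding at most max_points data points each."""
    block: List[Point] = []
    for point in points:
        block.append(point)
        if len(block) == max_points:
            yield block
            block = []
    if block:
        yield block  # last, possibly partial, block
```

With the example values from the text, the calls would be `split_by_period(points, 2 * 3600)` for 2-hour blocks and `split_by_size(points, 10000)` for 10000-point blocks.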
Step 102: determining data characteristics of the data block;
The data characteristics of a data block describe how the data changes within the collection time of that block. Various quantities can serve as data characteristics, such as the variance, the continuity of data points, and the monotonicity of the difference array.
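As an illustration of the characteristics named above, the sketch below computes them for the values of one data block. The text does not define the repetition degrees precisely, so the definitions used here (share of adjacent equal pairs, and share of points that repeat an earlier, non-adjacent value) are assumptions, as are the function name and dictionary keys.

```python
from statistics import pvariance
from typing import Dict, List, Union

Number = Union[int, float]


def block_features(values: List[Number]) -> Dict[str, object]:
    """Compute example data characteristics for one non-empty data block."""
    n = len(values)
    # Share of adjacent pairs holding the same value (assumed definition).
    adjacent_equal = sum(1 for a, b in zip(values, values[1:]) if a == b)
    consecutive_repetition = adjacent_equal / (n - 1) if n > 1 else 0.0
    # Share of points repeating an earlier value that is not the
    # immediately preceding one (assumed definition).
    seen = set()
    non_adjacent_repeats = 0
    for i, v in enumerate(values):
        if v in seen and values[i - 1] != v:
            non_adjacent_repeats += 1
        seen.add(v)
    # Difference array between neighbouring points and its monotonicity.
    diffs = [b - a for a, b in zip(values, values[1:])]
    non_decreasing = all(x <= y for x, y in zip(diffs, diffs[1:]))
    non_increasing = all(x >= y for x, y in zip(diffs, diffs[1:]))
    return {
        "variance": pvariance(values),
        "max": max(values),
        "min": min(values),
        "positives": sum(1 for v in values if v > 0),
        "negatives": sum(1 for v in values if v < 0),
        "consecutive_repetition": consecutive_repetition,
        "nonconsecutive_repetition": non_adjacent_repeats / n,
        "diffs_monotonic": len(diffs) > 1 and (non_decreasing or non_increasing),
    }
```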
Step 103: determining a target processing mode matched with the data characteristics of the data block in a plurality of processing modes;
Specifically, the matching relationships between data characteristics and processing modes may be stored in advance, so that the target processing mode matching the data characteristics of the data block is determined from these matching relationships.
Step 104: and processing the data block by adopting the target processing mode.
In this embodiment, for the target time series data, a matched processing mode can be allocated to each data block according to the characteristics of the data block in a manner of dividing the data block, that is, the same target time series data can adopt multiple processing modes, so that processing of variable time series data is effectively realized, and the processing flexibility is improved.
In the course of implementing the present application, the applicant found through research that time-series data comes in different data types, and that the same processing mode performs differently on different data types. To further improve the processing effect, a second embodiment of the present application provides a data processing method which, as shown in Fig. 2, includes the following steps:
Step 201: determining target time-series data, and dividing the target time-series data into at least two data blocks;
Step 202: determining data characteristics of the data block;
Step 203: determining a data type of the target time-series data;
It should be noted that no execution order is imposed among step 203, the division of the target time-series data into at least two data blocks in step 201, and step 202. In another embodiment, after the target time-series data is determined, its data type may be determined first, after which the target time-series data is divided into at least two data blocks and the data characteristics of the data blocks are determined; alternatively, these operations may be executed in parallel.
The data type of the target time-series data includes at least floating-point time-series data and integer time-series data. The data type may be determined from the parameter values of the target time-series data, or looked up in a table created for the target time-series data.
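Both ways of determining the data type can be sketched briefly. The function names, the type labels, and the shape of the schema table are hypothetical; the text only states that the type may be inferred from the values or looked up in a table.

```python
from typing import Dict, Iterable, Union

Number = Union[int, float]


def type_from_values(values: Iterable[Number]) -> str:
    """Infer the series type from its values (assumed rule: all ints -> integer)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    return "floating-point"


def type_from_table(series_name: str, schema: Dict[str, str]) -> str:
    """Look the series type up in a table created for the series (hypothetical layout)."""
    return schema[series_name]
```

For example, `type_from_values([20, 21, 21])` gives `"integer"`, while a series containing `20.5` is classified as `"floating-point"`.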
Step 204: determining a target processing mode matched with the data type and the data characteristics of the data block;
In the present application, the target time-series data may be of various data types; in this embodiment, they include at least integer time-series data and floating-point time-series data. Different data types use different matching modes. Specifically:
1. When the data type is determined to be integer time-series data, a first matching mode is used to determine the target processing mode matching the data characteristics of the data block.
2. When the data type is determined to be floating-point time-series data, a second matching mode is used to determine the target processing mode matching the data characteristics of the data block.
The data characteristics and processing modes used in the first matching mode and the second matching mode are not exactly the same. For ease of understanding, the two matching modes are described in detail below by way of example.
When the data type is integer time-series data, the data characteristics include at least one or more of: the variance, the maximum value, and the minimum value of the data block, the number of positive values, the number of negative values, the degree of repetition of consecutive data points, and the degree of repetition of non-consecutive data points.
Here, repetition of consecutive data points means that adjacent data points have the same value; repetition of non-consecutive data points means that a value recurs at data points that are not adjacent.
The processing modes include at least one or more of: Differential coding, Varint variable-length coding, Zigzag coding, RLE (run-length) coding, and Dictionary coding.
(1) When the variance of the data block is determined to be smaller than a first threshold (indicating that the data block fluctuates little), the target processing mode is determined to be Differential coding.
(2) When the variance of the data block is greater than the first threshold, the maximum value is smaller than a second threshold (indicating that the largest value in the data block is small), and the number of positive values exceeds the number of negative values by more than a third threshold (indicating that most data points in the data block are positive), the target processing mode is determined to be Varint variable-length coding. When the maximum value is smaller than the second threshold, the absolute value of the minimum value is smaller than the second threshold, and the difference between the number of positive values and the number of negative values is smaller than the third threshold (indicating that the data block contains similar numbers of positive and negative data points), the target processing mode is determined to be Zigzag coding.
(3) When the degree of consecutive repetition is determined to be greater than a fourth threshold (indicating that the data block contains many consecutively repeated values), the target processing mode is determined to be RLE run-length coding.
(4) When the degree of non-consecutive repetition is determined to be greater than a fifth threshold (indicating that the data block contains many repeated values that are not consecutive), the target processing mode is determined to be Differential coding.
The above conditions may be checked in order, that is, (1), (2), (3), and then (4): when condition (1) is not satisfied, condition (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics meet within an error range may be selected. Alternatively, the processing mode may be selected based on the differences between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a larger difference giving a lower score and a smaller difference giving a higher score, and the processing mode with the highest score is selected.
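The ordered checks (1) to (4) for integer time-series data can be sketched as a single function. This is only an illustration under stated assumptions: the feature dictionary keys, the `IntThresholds` container, and the mode labels are hypothetical, no threshold values are given in the text, and rule (4) returns Differential coding because that is the mode the text names for it. The fallback for blocks matching no rule is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class IntThresholds:
    variance: float            # "first threshold"
    magnitude: float           # "second threshold"
    sign_gap: float            # "third threshold"
    consecutive_rep: float     # "fourth threshold"
    nonconsecutive_rep: float  # "fifth threshold"


def match_integer_mode(f: Dict[str, float], t: IntThresholds) -> Optional[str]:
    """Check rules (1) to (4) in order and return the first matching mode."""
    sign_gap = f["positives"] - f["negatives"]
    # (1) small fluctuation
    if f["variance"] < t.variance:
        return "differential"
    # (2) small, mostly positive values
    if f["variance"] > t.variance and f["max"] < t.magnitude and sign_gap > t.sign_gap:
        return "varint"
    # (2, second part) small magnitudes with a similar number of both signs
    if f["max"] < t.magnitude and abs(f["min"]) < t.magnitude and abs(sign_gap) < t.sign_gap:
        return "zigzag"
    # (3) many consecutively repeated values
    if f["consecutive_repetition"] > t.consecutive_rep:
        return "rle"
    # (4) many repeated but non-consecutive values (mode as named in the text)
    if f["nonconsecutive_repetition"] > t.nonconsecutive_rep:
        return "differential"
    return None  # no rule matched; a scoring fallback could be applied here
```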
When the data type is floating-point time-series data, the data characteristics include at least one or more of: the variance, the degree of repetition of consecutive data points, and the difference array.
Here, repetition of consecutive data points means that adjacent data points have the same value.
The processing modes include at least one or more of: Gorilla coding, RLE (run-length) coding, and Differential coding.
(1) When the variance is determined to be smaller than the first threshold, the target processing mode is determined to be Gorilla coding.
(2) When the variance is determined to be greater than the first threshold and the degree of consecutive repetition is determined to be greater than a fourth threshold (indicating that the data block contains many consecutively repeated values), the target processing mode is determined to be RLE run-length coding.
(3) When the difference array is determined to be monotonically increasing or monotonically decreasing, the target processing mode is determined to be Differential coding.
The above conditions may be checked in order, that is, (1), (2), and then (3): when condition (1) is not satisfied, condition (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics meet within an error range may be selected. Alternatively, the processing mode may be selected based on the differences between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a larger difference giving a lower score and a smaller difference giving a higher score, and the processing mode with the highest score is selected.
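For floating-point data, the three ordered rules can also be expressed as a pre-stored table of (condition, mode) pairs that is scanned in order. The sketch below assumes hypothetical feature keys and mode labels; the threshold values are parameters because the text does not specify them.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, object]
Rule = Tuple[Callable[[Features], bool], str]


def float_rules(variance_threshold: float, repetition_threshold: float) -> List[Rule]:
    """Build the ordered rule table for floating-point blocks (rules (1) to (3))."""
    return [
        # (1) small variance -> Gorilla coding
        (lambda f: f["variance"] < variance_threshold, "gorilla"),
        # (2) larger variance with many consecutive repeats -> RLE
        (lambda f: f["variance"] > variance_threshold
         and f["consecutive_repetition"] > repetition_threshold, "rle"),
        # (3) monotonic difference array -> Differential coding
        (lambda f: bool(f["diffs_monotonic"]), "differential"),
    ]


def match_mode(features: Features, rules: List[Rule]) -> Optional[str]:
    """Return the mode of the first rule whose condition holds, else None."""
    for condition, mode in rules:
        if condition(features):
            return mode
    return None
```

Keeping the rules in a list makes the checking order explicit and lets a deployment swap in a different table without touching the matching loop.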
Step 205: and processing the data block by adopting the target processing mode.
In this embodiment, for the target time series data, a matched processing mode can be allocated to each data block according to the characteristics of the data block and the data type of the target time series data by adopting a data block dividing mode, that is, the same target time series data can adopt multiple processing modes, so that the processing of variable time series data is effectively realized, and the processing flexibility is improved.
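The embodiment names Zigzag and Varint coding without describing them. For orientation, the sketch below shows the commonly used definitions of both (as found, for example, in the Protocol Buffers wire format); whether the application uses exactly these variants is an assumption, as is the 64-bit integer width.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, lowest group first."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(data: bytes) -> int:
    """Decode one varint from the start of data."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    return result
```

Varint spends fewer bytes on small non-negative values, and Zigzag first folds small negative values onto small unsigned ones, which is consistent with the sign-based conditions used to choose between the two modes.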
A third embodiment of the method of the present application provides a data processing method which, as shown in Fig. 3, includes the following steps:
Step 301: determining target time-series data, and dividing the target time-series data into at least two data blocks;
Step 302: determining data characteristics of the data block;
Step 303: determining a target processing mode matched with the data characteristics of the data block in a plurality of processing modes;
Step 304: adding processing information representing the target processing mode to the data block;
The processing information characterizes the target processing mode and may take the form of encoded metadata. For example, the processing information may be added to the head of the data block, as shown in Fig. 4. It may also be added to the tail of the data block; the position is not limited in this application.
The processing information makes it easier to look up the processing mode of the data block later.
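One possible way to attach processing information to the head of a data block is a small fixed-size header. The layout below (a 1-byte mode ID followed by a 4-byte payload length) and the ID values are assumptions chosen for illustration; the application does not specify an encoding.

```python
import struct
from typing import Tuple

# Hypothetical numeric IDs for the processing modes named in the text.
MODE_IDS = {"differential": 1, "varint": 2, "zigzag": 3, "rle": 4, "dictionary": 5, "gorilla": 6}
MODE_NAMES = {mode_id: name for name, mode_id in MODE_IDS.items()}

# Assumed header layout: mode ID (1 byte) + payload length (4 bytes), little-endian.
HEADER = struct.Struct("<BI")


def add_processing_info(mode: str, payload: bytes) -> bytes:
    """Prefix the block payload with a header naming its processing mode."""
    return HEADER.pack(MODE_IDS[mode], len(payload)) + payload


def read_processing_info(block: bytes) -> Tuple[str, bytes]:
    """Read the processing mode and payload back from a tagged block."""
    mode_id, length = HEADER.unpack_from(block, 0)
    payload = block[HEADER.size:HEADER.size + length]
    return MODE_NAMES[mode_id], payload
```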
Step 305: and processing the data block by adopting the target processing mode.
In this embodiment, for the target time series data, a matched processing mode can be allocated to each data block according to the characteristics of the data block by adopting a data block dividing mode, that is, the same target time series data can adopt multiple processing modes, so that processing of variable time series data is effectively realized, and the processing flexibility is improved; in addition, processing information representing a target processing mode can be added to the divided data blocks, so that convenience of searching the processing mode of the data blocks is improved.
A fourth embodiment of the method of the present application provides a data processing method which, as shown in Fig. 5, includes the following steps:
Step 501: determining target time-series data, and dividing the target time-series data into at least two data blocks;
Step 502: determining data characteristics of the data block;
Step 503: determining a target processing mode matched with the data characteristics of the data block in a plurality of processing modes;
Step 504: adding processing information representing the target processing mode to the data block;
The processing information characterizes the target processing mode and may take the form of encoded metadata. For example, the processing information may be added to the head of the data block, as shown in Fig. 4.
It may also be added to the tail of the data block; the position is not limited in this application.
In this embodiment, the processing information is used to determine the processing mode of the data block when the data block is subsequently decompressed.
Step 505: compressing the data block by adopting the target processing mode;
Step 506: during decompression, determining the target processing mode based on the processing information, and decompressing the data block by adopting the target processing mode.
When the processing information is added to the head of the data block, the processing information may be acquired at the head of the data block, thereby determining the target processing manner based on the processing information.
When the processing information is added at the end of the data block, the processing information may be acquired at the end of the data block, thereby determining the target processing manner based on the processing information.
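The compress and decompress steps can be sketched end to end with two simple codecs. In this illustration the processing information is a single leading byte, the mode IDs are hypothetical, and JSON serves only as a stand-in serialization of the encoded values; a real implementation would use a compact binary format.

```python
import json
from typing import List

def rle_encode(values: List[int]) -> List[List[int]]:
    """Run-length encode into [value, run length] pairs."""
    runs: List[List[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs: List[List[int]]) -> List[int]:
    return [v for v, count in runs for _ in range(count)]

def delta_encode(values: List[int]) -> List[int]:
    """Differential coding: keep the first value, then successive differences."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    out: List[int] = []
    for d in deltas:
        out.append(d if not out else out[-1] + d)
    return out

# Hypothetical mode IDs mapped to (encoder, decoder) pairs.
CODECS = {1: (delta_encode, delta_decode), 2: (rle_encode, rle_decode)}

def compress_block(values: List[int], mode_id: int) -> bytes:
    """Encode the block and put the mode ID at its head as processing information."""
    encode, _ = CODECS[mode_id]
    return bytes([mode_id]) + json.dumps(encode(values)).encode("utf-8")

def decompress_block(block: bytes) -> List[int]:
    """Read the mode ID from the head of the block and apply the matching decoder."""
    _, decode = CODECS[block[0]]
    return decode(json.loads(block[1:].decode("utf-8")))
```

Because each block carries its own mode ID, blocks of the same series that were compressed with different modes can be decompressed without any external bookkeeping.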
In this embodiment, for the target time series data, a matched processing mode can be allocated to each data block according to the characteristics of the data block by adopting a data block dividing mode, that is, the same target time series data can adopt multiple processing modes, so that processing of variable time series data is effectively realized, and the processing flexibility is improved; in addition, when the target processing mode is used for compression, the target processing mode can be determined through the processing information added to the data block during decompression, and the convenience of decompression is improved.
Corresponding to the above data processing method, the embodiment of the apparatus of the present application further provides an electronic device, which is described below with several embodiments.
An embodiment of the apparatus of the present application provides an electronic device, as shown in fig. 6, where the electronic device includes: memory 110, processor 120;
The memory 110 is used for storing data blocks.
The processor 120 is configured to determine target time series data, divide the target time series data into at least two data blocks, store the data blocks in the memory, determine data characteristics of the data blocks, determine a target processing manner matching the data characteristics of the data blocks among multiple processing manners, and process the data blocks by using the target processing manner.
Time-series data is data collected at different times; it reflects how an object or phenomenon changes over time. The target time-series data is data collected at different times for the same target object, which may be any object.
In this application, when the target time-series data is divided into at least two data blocks, the data may first be buffered into data blocks, and the data blocks may then be stored in a memory, such as a disk.
There are various ways to divide the target time series data into at least two data blocks, including at least the following two ways:
1. the electronic device further comprises a timer for timing. The processor 120 is specifically configured to obtain the data blocks in each time period according to a preset time period from the target time series data based on the timing time of the timer.
In a specific implementation process, timing can be started when target time series data are collected, data caching is carried out, cached data blocks are stored into a memory when a preset time period is reached, the collected target time series data are continuously cached according to the preset time period, and the data blocks stored into the memory can be subjected to subsequent processing or can not be stored into the memory, so that the subsequent processing is directly carried out.
The parameter of the preset time period is not limited in this application, for example, the preset time period is 2 hours.
2. The processor 120 is specifically configured to divide the target time-series data according to a preset data size, so as to obtain data blocks having the preset data size.
In a specific implementation, the collected target time-series data may be buffered. When the amount of buffered data reaches the preset data size, the buffered data is stored in the memory as a data block, and monitoring continues for whether the newly buffered target time-series data reaches the preset data size. Subsequent processing may be performed on the data block stored in the memory; alternatively, the data block may be processed directly without being stored in the memory.
The value of the preset data size is not limited in this application; for example, the preset data size may be 10000 data points.
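The size-based division can be sketched in the same spirit. The function name `split_by_size` is hypothetical, and emitting a final block that is smaller than the preset size is an assumption; the application does not say how a trailing partial buffer is handled.

```python
from typing import Iterable, Iterator, List


def split_by_size(values: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Buffer incoming points and emit a block whenever the buffer is full."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    buffer: List[float] = []
    for value in values:
        buffer.append(value)
        if len(buffer) == block_size:  # preset data size reached
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # assumption: the remainder is emitted as a short block
```

For example, 25 data points with a preset size of 10 yield blocks of 10, 10, and 5 points.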
The present application is not limited to the above two ways of dividing the data blocks; the data blocks may also be divided directly in the memory.
In the present application, the data characteristics of a data block characterize how the data changes within the collection time of the data block. The data characteristics can be expressed in various ways, such as the variance, the continuity of data points, and the monotonicity of the difference array.
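A minimal sketch of how such data characteristics might be computed is given below. The application does not define the formulas, so the following are assumptions: the population variance is used, the consecutive repetition degree is taken as the fraction of adjacent pairs with equal values, and the difference array counts as monotonic when it is non-decreasing or non-increasing. The name `block_features` is hypothetical.

```python
from statistics import pvariance
from typing import Dict, Sequence, Union


def block_features(values: Sequence[float]) -> Dict[str, Union[float, bool]]:
    """Compute illustrative data characteristics for one data block."""
    n = len(values)
    if n < 2:
        raise ValueError("a block needs at least two data points")
    # Difference array: each point minus its predecessor.
    diffs = [b - a for a, b in zip(values, values[1:])]
    # Adjacent pairs with identical values (difference of zero).
    repeats = sum(1 for d in diffs if d == 0)
    increasing = all(a <= b for a, b in zip(diffs, diffs[1:]))
    decreasing = all(a >= b for a, b in zip(diffs, diffs[1:]))
    return {
        "variance": pvariance(values),
        "consecutive_repetition": repeats / (n - 1),
        "diffs_monotonic": increasing or decreasing,
    }
```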
In this embodiment, by dividing the target time-series data into data blocks, a matching processing mode can be allocated to each data block according to the characteristics of that data block. That is, multiple processing modes can be applied to the same target time-series data, so that time-series data whose behavior varies is processed effectively and processing flexibility is improved.
In the course of implementing the present application, the applicant found through research that time-series data has different data types, and that the same processing method produces different processing effects on different data types.
The data type of the target time-series data includes at least floating-point time-series data and integer time-series data. When the data type of the target time-series data is determined, whether it is floating-point or integer time-series data may be determined based on the parameter values of the target time-series data, or the data type may be looked up in a table created for the target time-series data.
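The two ways of determining the data type can be illustrated as follows. This is a sketch under stated assumptions: the table is modeled as a plain dictionary keyed by a series name, and the value-based check simply inspects the Python type of each parameter value. The names `detect_data_type`, `series_name`, and `type_table` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Union

Number = Union[int, float]


def detect_data_type(
    values: Sequence[Number],
    series_name: Optional[str] = None,
    type_table: Optional[Dict[str, str]] = None,
) -> str:
    """Return "integer" or "float" for a series."""
    # Way 1: look the type up in a table created for the series.
    if type_table is not None and series_name in type_table:
        return type_table[series_name]
    # Way 2: decide from the parameter values themselves.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "integer"
    return "float"
```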
In the present application, the target time-series data may be of various data types; in this embodiment, they include at least integer time-series data and floating-point time-series data. Different data types use different matching modes. Specifically:
1. When the data type is determined to be integer time-series data, a target processing mode matching the data characteristics of the data block is determined by using a first matching mode.
2. When the data type is determined to be floating-point time-series data, a target processing mode matching the data characteristics of the data block is determined by using a second matching mode.
The data characteristics and processing modes used in the first matching mode and the second matching mode are not exactly the same. For ease of understanding, the two matching modes are described in detail below by way of example.
When the data type is integer time-series data, the data characteristics include at least one or more of: the variance, the maximum value and the minimum value of the data block, the number of positive values, the number of negative values, the repetition degree of consecutive data points, and the repetition degree of non-consecutive data points.
Here, consecutive repeated data points are adjacent data points that have the same value, whereas non-consecutive repeated data points differ from their adjacent data points.
The processing modes include at least one or more of: a differential coding mode, a Varint variable-length coding mode, a Zigzag coding mode, an RLE run-length coding mode, and a Dictionary coding mode.
(1) When the variance of the data block is determined to be smaller than a first threshold (indicating that the data block fluctuates little), the target processing mode is determined to be the differential coding mode.
(2) When the variance of the data block is larger than the first threshold, the maximum value is smaller than a second threshold (indicating that the maximum value in the data block is small), and the difference between the number of positive values and the number of negative values is larger than a third threshold (indicating that most data points in the data block are positive), the target processing mode is determined to be the Varint variable-length coding mode. When the maximum value is smaller than the second threshold, the absolute value of the minimum value is smaller than the second threshold, and the difference between the number of positive values and the number of negative values is smaller than the third threshold (indicating that the numbers of positive and negative data points in the data block do not differ much), the target processing mode is determined to be the Zigzag coding mode.
(3) When the consecutive repetition degree is determined to be larger than a fourth threshold (indicating that the data block contains many consecutive repeated values), the target processing mode is determined to be the RLE run-length coding mode.
(4) When the non-consecutive repetition degree is determined to be larger than a fifth threshold (indicating that the data block contains many repeated values that are not consecutive), the target processing mode is determined to be the differential coding mode.
In the above matching mode, the conditions may be checked in order, that is, in the order (1), (2), (3), (4): when the condition of (1) is not satisfied, (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics miss only within an error range may be selected. Alternatively, the processing mode may be chosen based on the difference between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a large difference giving a low score and a small difference giving a high score, and the processing mode with the highest score is selected.
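The ordered matching for integer time-series data described above can be sketched as a cascade of checks. The class and function names, the field names, and all threshold values are hypothetical; the application does not give concrete thresholds. Interpreting the "difference between the number of positive values and the number of negative values" as a signed difference for Varint and as an absolute difference for Zigzag is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntBlockFeatures:
    variance: float
    maximum: int
    minimum: int
    positives: int          # number of positive data points
    negatives: int          # number of negative data points
    consecutive_rep: float  # repetition degree of consecutive points
    scattered_rep: float    # repetition degree of non-consecutive points


@dataclass
class Thresholds:
    first: float
    second: float
    third: float
    fourth: float
    fifth: float


def match_integer_mode(f: IntBlockFeatures, t: Thresholds) -> Optional[str]:
    """Check conditions (1) to (4) in order; None means no condition matched."""
    if f.variance < t.first:                      # (1) small fluctuation
        return "differential"
    if f.maximum < t.second:                      # (2) small values
        sign_gap = f.positives - f.negatives
        if sign_gap > t.third:                    # mostly positive
            return "varint"
        if abs(f.minimum) < t.second and abs(sign_gap) < t.third:
            return "zigzag"                       # balanced signs
    if f.consecutive_rep > t.fourth:              # (3) long runs
        return "rle"
    if f.scattered_rep > t.fifth:                 # (4) as stated in the text
        return "differential"
    return None
```

Returning `None` leaves room for the fallback selection described in the preceding paragraph.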
When the data type is floating-point time-series data, the data characteristics include at least one or more of: the variance, the repetition degree of consecutive data points, and the difference array.
Here, consecutive repeated data points are adjacent data points that have the same value.
The processing modes include at least one or more of: a Gorilla coding mode, an RLE run-length coding mode, and a Differential coding mode.
(1) When the variance is determined to be smaller than a first threshold (indicating that the data block fluctuates little), the target processing mode is determined to be the Gorilla coding mode.
(2) When the variance is determined to be larger than the first threshold and the consecutive repetition degree is determined to be larger than a fourth threshold (indicating that the data block contains many consecutive repeated values), the target processing mode is determined to be the RLE run-length coding mode.
(3) When the difference array is determined to be monotonically increasing or monotonically decreasing, the target processing mode is determined to be the Differential coding mode.
In the above matching mode, the conditions may be checked in order, that is, in the order (1), (2), (3): when the condition of (1) is not satisfied, (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics miss only within an error range may be selected. Alternatively, the processing mode may be chosen based on the difference between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a large difference giving a low score and a small difference giving a high score, and the processing mode with the highest score is selected.
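The ordered matching for floating-point time-series data can be sketched in the same way. The function name, the parameter names, and the threshold values used in any call are hypothetical; only the order and the conditions follow the description above.

```python
from typing import Optional


def match_float_mode(
    variance: float,
    consecutive_rep: float,
    diffs_monotonic: bool,
    first_threshold: float,
    fourth_threshold: float,
) -> Optional[str]:
    """Check conditions (1) to (3) in order; None means no condition matched."""
    if variance < first_threshold:                       # (1) small fluctuation
        return "gorilla"
    if variance > first_threshold and consecutive_rep > fourth_threshold:
        return "rle"                                     # (2) long runs
    if diffs_monotonic:                                  # (3) monotonic differences
        return "differential"
    return None
```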
In this embodiment, by dividing the target time-series data into data blocks, a matching processing mode can be determined for each data block according to the characteristics of that data block and the data type of the target time-series data. That is, multiple processing modes can be applied to the same target time-series data, so that time-series data whose behavior varies is processed effectively and processing flexibility is improved.
In a third embodiment of the apparatus of the present application, the processor is further configured to add, to the data block, processing information representing the target processing mode.
The processing information characterizes the target processing mode and may take the form of encoded metadata. For example, the processing information may be added to the head of the data block or to the tail of the data block; the position is not limited in the present application.
The processing information helps in subsequently looking up the processing mode of the data block.
In this embodiment, adding processing information representing the target processing mode to each divided data block makes it more convenient to look up the processing mode of the data block.
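Adding the processing information at the head or the tail of a data block can be illustrated with a one-byte tag. The numeric identifiers in `MODE_IDS`, the one-byte layout, and the function names are assumptions made for this sketch; the application only states that the information may exist as encoded metadata at either position.

```python
import struct
from typing import Tuple

# Hypothetical one-byte identifiers for the processing modes.
MODE_IDS = {"differential": 1, "varint": 2, "zigzag": 3,
            "rle": 4, "dictionary": 5, "gorilla": 6}
MODE_NAMES = {ident: name for name, ident in MODE_IDS.items()}


def tag_block(payload: bytes, mode: str, at_head: bool = True) -> bytes:
    """Attach the processing information to the head or tail of a block."""
    tag = struct.pack("B", MODE_IDS[mode])
    return tag + payload if at_head else payload + tag


def read_tag(block: bytes, at_head: bool = True) -> Tuple[str, bytes]:
    """Return (processing mode, payload) from a tagged block."""
    if not block:
        raise ValueError("empty block has no processing information")
    ident = block[0] if at_head else block[-1]
    payload = block[1:] if at_head else block[:-1]
    return MODE_NAMES[ident], payload
```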
In a fourth embodiment of the apparatus of the present application, the processor is further configured to add, to the data block, processing information representing the target processing mode, to compress the data block by using the target processing mode, and, during decompression, to determine the target processing mode based on the processing information and decompress the data block by using the target processing mode.
When the processing information has been added to the head of the data block, the processing information may be read from the head of the data block, and the target processing mode is determined based on it.
When the processing information has been added to the tail of the data block, the processing information may be read from the tail of the data block, and the target processing mode is determined based on it.
In this embodiment, when the data block has been compressed in the target processing mode, the target processing mode can be determined during decompression from the processing information added to the data block, which makes decompression more convenient.
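The round trip described above, in which the decompressor selects the decoder from the processing information stored with the block, can be sketched as follows. Only two simple codecs (run-length and differential) are shown, and the block is serialized as JSON purely for readability; a real implementation would use a compact binary layout. All function names and the `CODECS` registry are hypothetical.

```python
import json
from typing import Callable, Dict, List, Tuple


def rle_encode(values: List[int]) -> List[List[int]]:
    """Collapse runs of equal values into [value, count] pairs."""
    runs: List[List[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs: List[List[int]]) -> List[int]:
    return [v for v, count in runs for _ in range(count)]


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    out: List[int] = []
    for i, d in enumerate(deltas):
        out.append(d if i == 0 else out[-1] + d)
    return out


CODECS: Dict[str, Tuple[Callable, Callable]] = {
    "rle": (rle_encode, rle_decode),
    "differential": (delta_encode, delta_decode),
}


def compress_block(values: List[int], mode: str) -> bytes:
    encode, _ = CODECS[mode]
    # The "mode" field plays the role of the processing information.
    return json.dumps({"mode": mode, "payload": encode(values)}).encode("utf-8")


def decompress_block(block: bytes) -> List[int]:
    record = json.loads(block.decode("utf-8"))
    _, decode = CODECS[record["mode"]]  # mode is recovered from the block itself
    return decode(record["payload"])
```

Because the mode travels with the block, the caller of `decompress_block` does not need to know which processing mode was chosen for each block.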
In a fifth embodiment of the apparatus of the present application, as shown in FIG. 7, an electronic device includes a first determining unit 701, a second determining unit 702, a third determining unit 703, and a first processing unit 704, wherein:
The first determining unit 701 is configured to determine target time-series data and divide the target time-series data into at least two data blocks.
Time-series data is data collected at different times; it reflects how the state or degree of an object, phenomenon, or the like changes over time. The target time-series data is data collected at different times for the same target object, which may be any object.
In this application, when the target time-series data is divided into at least two data blocks, the target time-series data may first be buffered into data blocks, and the data blocks may then be stored in a memory, such as a disk.
The first determining unit 701 may divide the target time-series data into at least two data blocks in various ways. Specifically:
The first determining unit 701 is specifically configured to divide the target time-series data into one data block per preset time period.
In a specific implementation, timing may start when collection of the target time-series data begins, and the collected data is cached. When the preset time period elapses, the cached data block is stored in the memory, and caching of newly collected target time-series data continues for the next preset time period. Subsequent processing may be performed on the data block stored in the memory; alternatively, the data block may be processed directly without being stored in the memory.
The value of the preset time period is not limited in this application; for example, the preset time period may be 2 hours.
Alternatively, the first determining unit 701 is specifically configured to divide the target time-series data according to a preset data size, so as to obtain data blocks having the preset data size.
In a specific implementation, the collected target time-series data may be buffered. When the amount of buffered data reaches the preset data size, the buffered data is stored in the memory as a data block, and monitoring continues for whether the newly buffered target time-series data reaches the preset data size. Subsequent processing may be performed on the data block stored in the memory; alternatively, the data block may be processed directly without being stored in the memory.
The value of the preset data size is not limited in this application; for example, the preset data size may be 10000 data points.
The present application is not limited to the above two ways of dividing the data blocks; the data blocks may also be divided directly in the memory.
The second determining unit 702 is configured to determine data characteristics of the data block.
The data characteristics of a data block characterize how the data changes within the collection time of the data block. The data characteristics can be expressed in various ways, such as the variance, the continuity of data points, and the monotonicity of the difference array, and different data characteristics can be matched with different processing modes.
The third determining unit 703 is configured to determine, among multiple processing modes, a target processing mode that matches the data characteristics of the data block.
Specifically, the matching relationship between different data characteristics and different processing modes may be stored in advance, so that the target processing mode matching the data characteristics of the data block is determined based on the matching relationship.
The first processing unit 704 is configured to process the data block in the target processing mode.
In this embodiment, by dividing the target time-series data into data blocks, a matching processing mode can be allocated to each data block according to the characteristics of that data block. That is, multiple processing modes can be applied to the same target time-series data, so that time-series data whose behavior varies is processed effectively and processing flexibility is improved.
In the course of implementing the present application, the applicant found through research that time-series data has different data types, and that the same processing method produces different processing effects on different data types. To further improve the processing effect, in a sixth embodiment of the apparatus of the present application, the electronic device further includes a fourth determining unit, wherein:
The fourth determining unit is configured to determine a data type of the target time-series data.
The third determining unit is configured to determine, among multiple processing modes, a target processing mode that matches the data type and the data characteristics of the data block.
The data type of the target time-series data includes at least floating-point time-series data and integer time-series data. When the data type of the target time-series data is determined, whether it is floating-point or integer time-series data may be determined based on the parameter values of the target time-series data, or the data type may be looked up in a table created for the target time-series data.
In the present application, the target time-series data may be of various data types; in this embodiment, they include at least integer time-series data and floating-point time-series data. Different data types use different matching modes. Specifically:
The third determining unit is specifically configured to determine, when the data type is determined to be integer time-series data, a target processing mode matching the data characteristics of the data block by using the first matching mode, and to determine, when the data type is determined to be floating-point time-series data, a target processing mode matching the data characteristics of the data block by using the second matching mode.
The data characteristics and processing modes used in the first matching mode and the second matching mode are not exactly the same. For ease of understanding, the two matching modes are described in detail below by way of example.
When the data type is integer time-series data, the data characteristics include at least one or more of: the variance, the maximum value and the minimum value of the data block, the number of positive values, the number of negative values, the repetition degree of consecutive data points, and the repetition degree of non-consecutive data points.
Here, consecutive repeated data points are adjacent data points that have the same value, whereas non-consecutive repeated data points differ from their adjacent data points.
The processing modes include at least one or more of: a differential coding mode, a Varint variable-length coding mode, a Zigzag coding mode, an RLE run-length coding mode, and a Dictionary coding mode.
(1) When the variance of the data block is determined to be smaller than a first threshold (indicating that the data block fluctuates little), the target processing mode is determined to be the differential coding mode.
(2) When the variance of the data block is larger than the first threshold, the maximum value is smaller than a second threshold (indicating that the maximum value in the data block is small), and the difference between the number of positive values and the number of negative values is larger than a third threshold (indicating that most data points in the data block are positive), the target processing mode is determined to be the Varint variable-length coding mode. When the maximum value is smaller than the second threshold, the absolute value of the minimum value is smaller than the second threshold, and the difference between the number of positive values and the number of negative values is smaller than the third threshold (indicating that the numbers of positive and negative data points in the data block do not differ much), the target processing mode is determined to be the Zigzag coding mode.
(3) When the consecutive repetition degree is determined to be larger than a fourth threshold (indicating that the data block contains many consecutive repeated values), the target processing mode is determined to be the RLE run-length coding mode.
(4) When the non-consecutive repetition degree is determined to be larger than a fifth threshold (indicating that the data block contains many repeated values that are not consecutive), the target processing mode is determined to be the differential coding mode.
In the above matching mode, the conditions may be checked in order, that is, in the order (1), (2), (3), (4): when the condition of (1) is not satisfied, (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics miss only within an error range may be selected. Alternatively, the processing mode may be chosen based on the difference between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a large difference giving a low score and a small difference giving a high score, and the processing mode with the highest score is selected.
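Two of the integer coding modes named above, Zigzag and Varint, can be illustrated with the widely used formulations below (the same scheme that Protocol Buffers uses). The application itself does not spell out the bit layout, so treating these as the intended encodings is an assumption, and the function names are chosen for this example.

```python
def zigzag_encode(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode: even codes are non-negative, odd are negative."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def varint_encode(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, low group first."""
    if n < 0:
        raise ValueError("varint encodes non-negative integers; zigzag first")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(data: bytes) -> int:
    """Decode the first varint found at the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")
```

This shows why Varint suits blocks of small, mostly positive values, while Zigzag is applied first when positive and negative values are mixed: a small negative number becomes a small unsigned number that Varint can store in one byte.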
When the data type is floating-point time-series data, the data characteristics include at least one or more of: the variance, the repetition degree of consecutive data points, and the difference array.
Here, consecutive repeated data points are adjacent data points that have the same value.
The processing modes include at least one or more of: a Gorilla coding mode, an RLE run-length coding mode, and a Differential coding mode.
(1) When the variance is determined to be smaller than a first threshold (indicating that the data block fluctuates little), the target processing mode is determined to be the Gorilla coding mode.
(2) When the variance is determined to be larger than the first threshold and the consecutive repetition degree is determined to be larger than a fourth threshold (indicating that the data block contains many consecutive repeated values), the target processing mode is determined to be the RLE run-length coding mode.
(3) When the difference array is determined to be monotonically increasing or monotonically decreasing, the target processing mode is determined to be the Differential coding mode.
In the above matching mode, the conditions may be checked in order, that is, in the order (1), (2), (3): when the condition of (1) is not satisfied, (2) is checked, and so on.
It should be noted that, for a data block that satisfies none of the above conditions, a processing mode whose condition the data characteristics miss only within an error range may be selected. Alternatively, the processing mode may be chosen based on the difference between the data characteristics and the thresholds of the different processing modes. For example, each processing mode may be scored, with a large difference giving a low score and a small difference giving a high score, and the processing mode with the highest score is selected.
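The score-based fallback can be sketched as follows. The application only says that a larger difference gives a lower score; the concrete formula used here, `1 / (1 + relative gap)`, the normalization by the threshold, and the function names are assumptions made for illustration.

```python
from typing import Dict, Tuple


def score_modes(candidates: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Score each mode from its (feature value, threshold) pair.

    The closer the feature is to the threshold it missed, the higher the score.
    """
    scores: Dict[str, float] = {}
    for mode, (feature, threshold) in candidates.items():
        scale = abs(threshold) or 1.0            # avoid dividing by zero
        gap = abs(feature - threshold) / scale   # relative difference
        scores[mode] = 1.0 / (1.0 + gap)
    return scores


def pick_fallback_mode(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Select the processing mode with the highest score."""
    if not candidates:
        raise ValueError("no candidate processing modes")
    scores = score_modes(candidates)
    return max(scores, key=scores.get)
```

For instance, a block whose variance is 12.0 against a first threshold of 10.0, and whose consecutive repetition degree is 0.1 against a fourth threshold of 0.5, misses the Gorilla condition by a smaller relative margin, so the Gorilla coding mode would be selected under these assumptions.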
In this embodiment, by dividing the target time-series data into data blocks, a matching processing mode can be allocated to each data block according to the characteristics of that data block and the data type of the target time-series data. That is, multiple processing modes can be applied to the same target time-series data, so that time-series data whose behavior varies is processed effectively and processing flexibility is improved.
In a seventh embodiment of the apparatus of the present application, the electronic device further includes a first adding unit, configured to add, to the data block, processing information representing the target processing mode.
The processing information characterizes the target processing mode and may take the form of encoded metadata. For example, the processing information may be added to the head of the data block or to the tail of the data block; the position is not limited in the present application.
The processing information helps in subsequently looking up the processing mode of the data block.
In this embodiment, adding processing information representing the target processing mode to each divided data block makes it more convenient to look up the processing mode of the data block.
In an eighth embodiment of the apparatus of the present application, an electronic device includes a first determining unit, a second determining unit, a third determining unit, a first processing unit, a first adding unit, and a first decompression unit, wherein:
The first determining unit is configured to determine target time-series data and divide the target time-series data into at least two data blocks.
The second determining unit is configured to determine data characteristics of the data block.
The third determining unit is configured to determine, among multiple processing modes, a target processing mode that matches the data characteristics of the data block.
The first adding unit is configured to add, to the data block, processing information representing the target processing mode.
The first processing unit is configured to compress the data block by using the target processing mode.
The first decompression unit is configured to determine, during decompression, the target processing mode based on the processing information, and to decompress the data block by using the target processing mode.
In this embodiment, by dividing the target time-series data into data blocks, a matching processing mode can be allocated to each data block according to the characteristics of that data block. That is, multiple processing modes can be applied to the same target time-series data, so that time-series data whose behavior varies is processed effectively and processing flexibility is improved. In addition, when a data block has been compressed in the target processing mode, the target processing mode can be determined during decompression from the processing information added to the data block, which makes decompression more convenient.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data processing method, comprising:
determining target time series data, and dividing the target time series data into at least two data blocks;
determining data characteristics of the data block;
determining a target processing mode matched with the data characteristics of the data block in a plurality of processing modes;
and processing the data block by adopting the target processing mode.
2. The method of claim 1, wherein the dividing the target time series data into at least two data blocks comprises:
dividing the target time-series data into one data block per time period according to a preset time period;
or, obtaining data blocks having a preset data size from the target time-series data according to the preset data size.
3. The method of claim 1, further comprising:
determining a data type of the target time series data;
correspondingly, the determining a target processing manner matched with the data characteristics of the data block includes:
and determining a target processing mode matched with the data type and the data characteristics of the data block.
4. The method of claim 1, further comprising:
and adding processing information representing the target processing mode to the data block.
5. The method of claim 4, wherein the processing the data block in the target processing manner comprises: compressing the data block by adopting the target processing mode;
correspondingly, the method further comprises the following steps:
and during decompression, determining the target processing mode based on the processing information, and decompressing the data block by adopting the target processing mode.
6. The method of claim 3, wherein the determining a target processing mode matched with the data type and the data characteristics of the data block comprises:
when the data type is determined to be integer time sequence data, determining a target processing mode matched with the data characteristics of the data block by adopting a first matching mode;
and when the data type is determined to be the floating-point time series data, determining a target processing mode matched with the data characteristics of the data block by adopting a second matching mode.
7. An electronic device, comprising:
a memory;
a processor, configured to determine target time-series data, divide the target time-series data into at least two data blocks, store the data blocks in the memory, determine the data characteristics of the data blocks, determine, among multiple processing modes, a target processing mode matched with the data characteristics of the data blocks, and process the data blocks by adopting the target processing mode.
8. The electronic device of claim 7, further comprising:
a timer, configured to perform timing;
the processor is specifically configured to acquire the data blocks in each time period from the target time series data according to a preset time period based on the timing time of the timer.
9. The electronic device of claim 7, wherein the processor is specifically configured to obtain the target time-series data according to a preset data size to obtain a data block having the preset data size.
10. An electronic device, comprising:
a first determination unit configured to determine target time-series data, the target time-series data being divided into at least two data blocks;
a second determining unit, configured to determine a data characteristic of the data block;
a third determining unit, configured to determine, among multiple processing manners, a target processing manner that matches with the data feature of the data block;
and the first processing unit is used for processing the data block by adopting the target processing mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910933439.2A CN110688385A (en) | 2019-09-29 | 2019-09-29 | Data processing method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110688385A (en) | 2020-01-14 |
Family
ID=69110931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910933439.2A Pending CN110688385A (en) | 2019-09-29 | 2019-09-29 | Data processing method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110688385A (en) |
- 2019-09-29: CN application CN201910933439.2A (CN110688385A), status: active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11044495B1 (en) | Systems and methods for variable length codeword based data encoding and decoding using dynamic memory allocation | |
CN109716658B (en) | Method and system for deleting repeated data based on similarity | |
CN110799959B (en) | Data compression method, decompression method and related equipment | |
WO2012033498A1 (en) | Systems and methods for data compression | |
CN107465413B (en) | Self-adaptive data compression system and method thereof | |
CN102143039B (en) | Data segmentation method and equipment for data compression | |
CN108616280B (en) | Compression method for real-time acquired data of unsteady-state data | |
CN116170027B (en) | Data management system and processing method for poison detection equipment | |
CN116346289A (en) | Data processing method for computer network center | |
CN112544038A (en) | Method, device and equipment for compressing data of storage system and readable storage medium | |
JP2012506665A (en) | Method and apparatus for compressing and decompressing data records | |
CN114520659A (en) | Method for lossless compression and decoding of data by combining rANS and LZ4 encoding | |
CN113630125A (en) | Data compression method, data encoding method, data decompression method, data encoding device, data decompression device, electronic equipment and storage medium | |
CN108880559B (en) | Data compression method, data decompression method, compression equipment and decompression equipment | |
CN115695564A (en) | Efficient transmission method for data of Internet of things | |
CN109687875B (en) | Time sequence data processing method | |
CN110688385A (en) | Data processing method and electronic equipment | |
CN111865324B (en) | Data compression and decompression method, device, system and storage device | |
CN111061428B (en) | Data compression method and device | |
CN110288666B (en) | Data compression method and device | |
CN107783990B (en) | Data compression method and terminal | |
CN115470186A (en) | Data slicing method, device and system | |
CN109255090B (en) | Index data compression method of web graph | |
CN112054805B (en) | Model data compression method, system and related equipment | |
CN113708772A (en) | Huffman coding method, system, device and readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200114 |