CN113704268B - Data processing method, device, storage medium and equipment - Google Patents
- Publication number: CN113704268B
- Application number: CN202111026654.8A
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
Abstract
The application discloses a data processing method, apparatus, storage medium and device. All full data pre-stored in a master table are acquired and grouped into a plurality of full data groups. From each full data group, the full data with the earliest data date is selected as valid data. The valid data are grouped into a plurality of valid data groups. In each valid data group, the valid data are sorted by data date from earliest to latest to obtain a valid data sequence. For each valid data sequence, attributes (an effective date and an expiration date) are added to each valid data item in the sequence. A zipper table is then constructed based on each valid data item and its effective date and expiration date. Because the scheme stores the data in a zipper table, which keeps one row per version of the data instead of a full copy for every snapshot, the hardware resources consumed by data storage can be effectively reduced.
Description
Technical Field
The present application relates to the field of big data, and in particular, to a data processing method, apparatus, storage medium, and device.
Background
With the widespread use of big data technology, the volume of business data handled by large applications is growing rapidly. In some scenarios, such as financial transaction data and customer data, historical data must be analyzed to extract trends in how the data change, so as to forecast business development or provide early risk warnings. How to store and query historical data with limited storage space and computing power has therefore become a research focus in this field.
At present, a common storage approach is to save a full snapshot of the data at a fixed time granularity (typically daily). However, this approach produces a large amount of redundant data, because records that have not changed are stored again in every snapshot. The massive redundant data occupy a large amount of storage space, which consumes substantial hardware resources and increases hardware cost.
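As a rough, purely illustrative count of that redundancy, the Python sketch below compares the rows kept by daily full snapshots with the rows a zipper table keeps for the same history. The scenario (one record with three versions, snapshotted daily from 2020-01-23 through 2020-05-02) and the helper names `snapshot_rows` and `zipper_rows` are assumptions made for this example; actual savings depend on how often the data change.

```python
from datetime import date


def snapshot_rows(start: date, end: date, keys: int) -> int:
    """Daily full snapshots store every key once per day, changed or not."""
    return ((end - start).days + 1) * keys  # inclusive day count


def zipper_rows(versions_per_key) -> int:
    """A zipper table stores one row per distinct version of each key."""
    return sum(versions_per_key)


# Hypothetical scenario: one record that changes twice (3 versions)
# and is snapshotted daily from 2020-01-23 through 2020-05-02.
daily = snapshot_rows(date(2020, 1, 23), date(2020, 5, 2), keys=1)
zipper = zipper_rows([3])
print(daily, zipper)  # 101 3
```

The gap widens with the length of the retention window and shrinks as the change rate of the data approaches one change per snapshot.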
Disclosure of Invention
The application provides a data processing method, apparatus, storage medium and device, with the aim of reducing the consumption of hardware resources.
In order to achieve the above object, the present application provides the following technical solutions:
A data processing method, comprising:
acquiring all full data pre-stored in a master table, the full data comprising a primary key, a field and a data date;
grouping the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are placed in the same full data group;
selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
grouping the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are placed in the same valid data group;
for each valid data group, sorting the valid data in the group by data date from earliest to latest to obtain a valid data sequence corresponding to the valid data group;
for each valid data sequence, adding attributes to each valid data item in the sequence, wherein the attributes comprise an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data item in the valid data sequence is set to a preset date; for n = 2, 3, ..., m, the expiration date of the (n-1)-th valid data item in the valid data sequence is set to a first date, where m is the number of valid data items in the valid data sequence and the first date is one day earlier than the effective date of the n-th valid data item in the valid data sequence; and
constructing a zipper table based on each valid data item and its effective date and expiration date.
Optionally, selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group comprises:
for each full data group, sorting the full data in the group by data date from earliest to latest to obtain a full data sequence corresponding to the full data group; and
for each full data sequence, selecting the full data in the first position of the sequence as the valid data.
Optionally, the method further comprises:
when it is detected that the master table has generated incremental data, extracting from the zipper table the valid data whose primary key is the same as that of the incremental data, marking the extracted valid data as historical data, and deleting the historical data from the zipper table, the incremental data comprising a primary key, a field and a data date;
writing the incremental data and the historical data into a preset temporary table;
grouping the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are placed in the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
grouping the target data to obtain a plurality of target data groups, wherein target data having the same primary key are placed in the same target data group;
for each target data group, sorting the target data in the group by data date from earliest to latest to obtain a target data sequence corresponding to the target data group;
for each target data sequence, adding attributes to each target data item in the sequence, wherein the effective date of the target data is the same as its data date; the expiration date of the last target data item in the target data sequence is set to the preset date; for s = 2, 3, ..., k, the expiration date of the (s-1)-th target data item in the target data sequence is set to a second date, where k is the number of target data items in the target data sequence and the second date is one day earlier than the effective date of the s-th target data item in the target data sequence; and
writing the target data into the zipper table.
Optionally, the method further comprises:
when a data read instruction sent by a user is received, extracting, from the data in the zipper table, the data whose validity period covers the time point indicated by the data read instruction as reply data, and sending the reply data to the user, wherein the validity period is the period from the effective date of the data to the end of its expiration date.
A data processing apparatus, comprising:
an acquisition unit, configured to acquire all full data pre-stored in a master table, the full data comprising a primary key, a field and a data date;
a first classification unit, configured to group the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are placed in the same full data group;
a selecting unit, configured to select, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
a second classification unit, configured to group the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are placed in the same valid data group;
an ordering unit, configured to, for each valid data group, sort the valid data in the group by data date from earliest to latest to obtain a valid data sequence corresponding to the valid data group;
an adding unit, configured to, for each valid data sequence, add attributes to each valid data item in the sequence, wherein the attributes comprise an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data item in the valid data sequence is set to a preset date; for n = 2, 3, ..., m, the expiration date of the (n-1)-th valid data item in the valid data sequence is set to a first date, where m is the number of valid data items in the valid data sequence and the first date is one day earlier than the effective date of the n-th valid data item in the valid data sequence; and
a construction unit, configured to construct a zipper table based on each valid data item and its effective date and expiration date.
Optionally, the selecting unit is specifically configured to:
for each full data group, sort the full data in the group by data date from earliest to latest to obtain a full data sequence corresponding to the full data group; and
for each full data sequence, select the full data in the first position of the sequence as the valid data.
Optionally, the apparatus further comprises:
an incremental data loading unit, configured to:
when it is detected that the master table has generated incremental data, extract from the zipper table the valid data whose primary key is the same as that of the incremental data, mark the extracted valid data as historical data, and delete the historical data from the zipper table, the incremental data comprising a primary key, a field and a data date;
write the incremental data and the historical data into a preset temporary table;
group the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are placed in the same data group;
select, from the data in each data group, the data with the earliest data date as the target data of that data group;
group the target data to obtain a plurality of target data groups, wherein target data having the same primary key are placed in the same target data group;
for each target data group, sort the target data in the group by data date from earliest to latest to obtain a target data sequence corresponding to the target data group;
for each target data sequence, add attributes to each target data item in the sequence, wherein the effective date of the target data is the same as its data date; the expiration date of the last target data item in the target data sequence is set to the preset date; for s = 2, 3, ..., k, the expiration date of the (s-1)-th target data item in the target data sequence is set to a second date, where k is the number of target data items in the target data sequence and the second date is one day earlier than the effective date of the s-th target data item in the target data sequence; and
write the target data into the zipper table.
Optionally, the apparatus further comprises:
a data reading unit, configured to, when a data read instruction sent by a user is received, extract from the data in the zipper table the data whose validity period covers the time point indicated by the data read instruction as reply data, and send the reply data to the user, wherein the validity period is the period from the effective date of the data to the end of its expiration date.
A computer-readable storage medium, comprising a stored program, wherein the program, when run, performs the data processing method.
A data processing device, comprising a processor, a memory and a bus, the processor being connected to the memory through the bus;
the memory is configured to store a program, and the processor is configured to run the program, wherein the program, when run, performs the data processing method.
According to the technical solution provided by the application, all full data pre-stored in the master table are acquired and grouped into a plurality of full data groups. From each full data group, the full data with the earliest data date is selected as the valid data of that group. The valid data are grouped into a plurality of valid data groups, and in each valid data group the valid data are sorted by data date from earliest to latest to obtain a valid data sequence. For each valid data sequence, attributes are added to each valid data item in the sequence, and a zipper table is constructed based on each valid data item and its effective date and expiration date. With the disclosed scheme, effective and expiration dates are added to the data recorded in the master table, and the data are stored in a zipper table built from each data item and its effective and expiration dates. Because a zipper table keeps one row per version of the data rather than a full copy for every snapshot, the hardware resources consumed by data storage can be effectively reduced.
Drawings
To describe the technical solutions in the embodiments of the application or in the prior art more clearly, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1a is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the protection scope of the application.
The scheme provided by the embodiments of the application stores data in a zipper table, which, owing to its characteristics, effectively reduces the hardware resources consumed by data storage.
The following terms, with their explanations, may be used in the application:
Analytic function: a function designed for complex reporting and statistical requirements. It groups the data, computes statistics on a per-group basis, and can return a value for every row in each group; when an analytic function is used, its return value is added to each row returned by the query.
row_number function: groups the data according to the partition by clause, sorts the data within each group according to the order by clause, and then assigns each row a sequence number, starting at 1 and increasing by one.
lead function: an offset function. It groups the data according to the partition by clause, sorts the data within each group according to the order by clause, and returns the value of a field from the row N rows after the current row.
Master table: unlike a detail table, a master table stores data keyed by a subject (for example, customer information or account information). When the subject's information changes, the master table keeps only the current, latest data.
Zipper table (data pull-chain table): a way of storing data history in which marker information, such as the effective time and expiration time of each record, is added on top of the original master table, so that the validity period of the data is stored.
Effective date: the date on which the data was generated.
Expiration date: the date marking when a piece of data changed. A common convention is that the expiration date of one record equals the effective date of the next record of the same data; in the embodiment below, the expiration date is set one day before the effective date of the next record (see Table 4). Either way, the validity periods of all records of a piece of data are continuous and non-overlapping, and the data remain unchanged within each period.
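The row_number and lead semantics described above can be emulated on in-memory rows with a short Python sketch. This is not how a database executes them; the helper names (`_partitions`, `row_number`, `lead`), the dict-based row format and the output column names `rn` and `lead` are assumptions for illustration. Ties in the order-by keys keep their input order here because Python's sort is stable, whereas SQL leaves the order of ties unspecified.

```python
from itertools import groupby
from typing import Any, Dict, List

Row = Dict[str, Any]


def _partitions(rows: List[Row], partition_by: List[str], order_by: List[str]) -> List[List[Row]]:
    """Sort rows by (partition key, order key) and split them into partitions."""
    pkey = lambda r: tuple(r[c] for c in partition_by)
    ordered = sorted(rows, key=lambda r: (pkey(r), tuple(r[c] for c in order_by)))
    return [list(g) for _, g in groupby(ordered, key=pkey)]


def row_number(rows: List[Row], partition_by: List[str], order_by: List[str]) -> List[Row]:
    """ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...): 1, 2, 3, ... per partition."""
    return [{**r, "rn": i}
            for part in _partitions(rows, partition_by, order_by)
            for i, r in enumerate(part, start=1)]


def lead(rows: List[Row], column: str, partition_by: List[str], order_by: List[str],
         offset: int = 1, default: Any = None) -> List[Row]:
    """LEAD(column, offset, default) OVER (...): value `offset` rows further down."""
    out: List[Row] = []
    for part in _partitions(rows, partition_by, order_by):
        for i, r in enumerate(part):
            j = i + offset
            out.append({**r, "lead": part[j][column] if j < len(part) else default})
    return out


rows = [{"k": "a", "d": "2020-02-01"}, {"k": "a", "d": "2020-01-01"}, {"k": "b", "d": "2020-03-01"}]
print([(r["k"], r["d"], r["rn"]) for r in row_number(rows, ["k"], ["d"])])
# [('a', '2020-01-01', 1), ('a', '2020-02-01', 2), ('b', '2020-03-01', 1)]
print([(r["k"], r["d"], r["lead"]) for r in lead(rows, "d", ["k"], ["d"], default="2999-12-31")])
# [('a', '2020-01-01', '2020-02-01'), ('a', '2020-02-01', '2999-12-31'), ('b', '2020-03-01', '2999-12-31')]
```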
As shown in FIG. 1a and FIG. 1b, which are schematic diagrams of a data processing method according to an embodiment of the application, the method includes the following steps:
S101: Acquire all full data pre-stored in the master table.
The full data include a primary key, a field and a data date.
Note that the full data pre-stored in the master table can be obtained by running an SQL statement that extracts data from the master table. The full data extracted from the master table are shown in Table 1.
TABLE 1
Primary key 1 | Primary key 2 | Field 1 | Field 2 | Field 3 | Data date |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_32 | 2020-02-20 |
Key_11 | Key_21 | Col_11 | Col_22 | Col_32 | 2020-03-15 |
Key_22 | Key_22 | Col_12 | Col_22 | Col_33 | 2020-05-02 |
It should be noted that the contents shown in table 1 above are only for illustration.
S102: Group the full data to obtain a plurality of full data groups.
Full data with the same primary key and the same field are placed in the same full data group.
S103: For each full data group, sort the full data in the group by data date from earliest to latest to obtain a full data sequence corresponding to the group.
Each full data item in a full data sequence is assigned a sequence number.
Taking the full data in Table 1 as an example, sorting the full data in each full data group by data date from earliest to latest yields the full data sequences shown in Table 2.
TABLE 2
Primary key 1 | Primary key 2 | Field 1 | Field 2 | Field 3 | Data date | Sequence number |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 | 1 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 | 2 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_32 | 2020-02-20 | 1 |
Key_11 | Key_21 | Col_11 | Col_22 | Col_32 | 2020-03-15 | 1 |
Key_22 | Key_22 | Col_12 | Col_22 | Col_33 | 2020-05-02 | 1 |
It should be noted that the contents shown in table 2 above are only for illustration.
S104: For each full data sequence, select the first full data item in the sequence as the valid data of that sequence.
The flow of S102, S103 and S104 can be understood as deduplicating the full data in the master table: all full data other than the valid data are filtered out, and only the valid data are retained.
Note that anomalies while loading full data into the master table can produce records with duplicate primary keys. The full data therefore need to be deduplicated, so that an abnormal data zipper is not formed later and duplicate primary keys do not appear when the data are read later.
In the embodiment of the application, the flow of S102, S103 and S104 can be implemented by calling the row_number function (an existing analytic function). The calling logic is: row_number() over (partition by [primary key 1, primary key 2, ..., field 1, field 2, ...] order by [data date]), and the rows numbered 1 are kept as the valid data.
Taking the full data sequences in Table 2 as an example, the valid data obtained through S102, S103 and S104 are shown in Table 3.
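TABLE 3
Primary key 1 | Primary key 2 | Field 1 | Field 2 | Field 3 | Data date |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_32 | 2020-02-20 |
Key_11 | Key_21 | Col_11 | Col_22 | Col_32 | 2020-03-15 |
Key_22 | Key_22 | Col_12 | Col_22 | Col_33 | 2020-05-02 |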
It should be noted that the contents shown in table 3 above are only for illustration.
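The deduplication in S102 to S104 can be sketched in Python as follows, reproducing the sequence numbers of Table 2 and the rows kept as valid data. The column names `pk1`, `pk2`, `f1` to `f3` and `data_date`, and the helpers `number_groups`, `valid_data` and `make_row`, are hypothetical stand-ins for the table columns and the row_number call; data dates are ISO-8601 strings, which sort correctly as plain text.

```python
from itertools import groupby
from typing import Dict, List

Row = Dict[str, str]
# Hypothetical column names standing in for "Primary key 1/2" and "Field 1/2/3".
KEYS = ["pk1", "pk2"]
FIELDS = ["f1", "f2", "f3"]


def number_groups(full: List[Row]) -> List[Row]:
    """S102-S103: group by primary key + fields, sort by data date, number from 1."""
    gkey = lambda r: tuple(r[c] for c in KEYS + FIELDS)
    ordered = sorted(full, key=lambda r: (gkey(r), r["data_date"]))
    return [{**r, "seq": i}
            for _, grp in groupby(ordered, key=gkey)
            for i, r in enumerate(grp, start=1)]


def valid_data(full: List[Row]) -> List[Row]:
    """S104: keep only the first row of each sequence (seq == 1)."""
    return [r for r in number_groups(full) if r["seq"] == 1]


def make_row(pk1, pk2, f1, f2, f3, d) -> Row:
    return dict(pk1=pk1, pk2=pk2, f1=f1, f2=f2, f3=f3, data_date=d)


table1 = [
    make_row("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    make_row("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    make_row("Key_11", "Key_21", "Col_11", "Col_21", "Col_32", "2020-02-20"),
    make_row("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15"),
    make_row("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02"),
]
print([r["seq"] for r in number_groups(table1)])     # [1, 2, 1, 1, 1]
print([r["data_date"] for r in valid_data(table1)])  # ['2020-01-23', '2020-02-20', '2020-03-15', '2020-05-02']
```

Keeping only `seq == 1` is the in-memory analogue of filtering on the row number; for the two identical 2020-01-23 rows it does not matter which copy survives.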
S105: Group the valid data to obtain a plurality of valid data groups.
Valid data with the same primary key are placed in the same valid data group.
S106: For each valid data group, sort the valid data in the group by data date from earliest to latest to obtain a valid data sequence corresponding to the group.
S107: For each valid data sequence, add attributes to each valid data item in the sequence.
The attributes include an effective date and an expiration date, and the effective date is the same as the data date of the valid data. The expiration date of the last valid data item in the valid data sequence is set to a preset date; for n = 2, 3, ..., m, the expiration date of the (n-1)-th valid data item in the valid data sequence is set to a first date, where m is the number of valid data items in the sequence and the first date is one day earlier than the effective date of the n-th valid data item.
The flow of S105, S106 and S107 can be implemented by calling the lead function (an existing analytic function). The calling logic is: lead(TO_CHAR(TO_DATE(data date, 'YYYY-MM-DD') - 1, 'YYYY-MM-DD'), 1, '2999-12-31') over (partition by [primary key 1, primary key 2, ...] order by [data date]). That is, each row takes the data date of the next row in its group minus one day as its expiration date, and the last row in each group takes the preset date 2999-12-31.
Taking the valid data in Table 3 as an example, the valid data obtained through S105, S106 and S107 are shown in Table 4.
TABLE 4
Primary key 1 | Primary key 2 | Field 1 | Field 2 | Field 3 | Data date | Effective date | Expiration date |
Key_11 | Key_21 | Col_11 | Col_21 | Col_31 | 2020-01-23 | 2020-01-23 | 2020-02-19 |
Key_11 | Key_21 | Col_11 | Col_21 | Col_32 | 2020-02-20 | 2020-02-20 | 2020-03-14 |
Key_11 | Key_21 | Col_11 | Col_22 | Col_32 | 2020-03-15 | 2020-03-15 | 2999-12-31 |
Key_22 | Key_22 | Col_12 | Col_22 | Col_33 | 2020-05-02 | 2020-05-02 | 2999-12-31 |
In Table 4, 2999-12-31 represents a preset date.
Further, the contents shown in table 4 above are for illustration only.
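A minimal Python sketch of S105 to S107, under the same kind of assumptions (hypothetical column names, ISO-8601 date strings, field columns left out for brevity), derives the effective and expiration dates shown in Table 4. It mirrors the lead expression above: each row takes the next row's data date minus one day, and the last row in each primary-key group takes the preset date.

```python
from datetime import date, timedelta
from itertools import groupby
from typing import Dict, List

Row = Dict[str, str]
KEYS = ["pk1", "pk2"]  # hypothetical primary-key column names
PRESET = "2999-12-31"  # preset expiration date, as in Table 4


def add_validity(valid: List[Row]) -> List[Row]:
    """S105-S107: group by primary key, sort by data date, and add
    effective_date (= data_date) and expiration_date (= next row's
    data date minus one day, or PRESET for the last row)."""
    pk = lambda r: tuple(r[c] for c in KEYS)
    ordered = sorted(valid, key=lambda r: (pk(r), r["data_date"]))
    out: List[Row] = []
    for _, grp in groupby(ordered, key=pk):
        seq = list(grp)
        for i, r in enumerate(seq):
            exp = PRESET
            if i + 1 < len(seq):
                nxt = date.fromisoformat(seq[i + 1]["data_date"])
                exp = (nxt - timedelta(days=1)).isoformat()
            out.append({**r, "effective_date": r["data_date"], "expiration_date": exp})
    return out


# Primary keys and data dates as in Table 4; field columns omitted.
valid_rows = [
    {"pk1": "Key_11", "pk2": "Key_21", "data_date": "2020-01-23"},
    {"pk1": "Key_11", "pk2": "Key_21", "data_date": "2020-02-20"},
    {"pk1": "Key_11", "pk2": "Key_21", "data_date": "2020-03-15"},
    {"pk1": "Key_22", "pk2": "Key_22", "data_date": "2020-05-02"},
]
for r in add_validity(valid_rows):
    print(r["pk1"], r["effective_date"], r["expiration_date"])
# Key_11 2020-01-23 2020-02-19
# Key_11 2020-02-20 2020-03-14
# Key_11 2020-03-15 2999-12-31
# Key_22 2020-05-02 2999-12-31
```

Setting the expiration one day before the next effective date keeps consecutive periods contiguous while letting both ends of each period be treated as inclusive.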
S108: Construct a zipper table based on each valid data item and its effective date and expiration date.
S109: When it is detected that the master table has generated incremental data, extract from the zipper table the valid data whose primary key is the same as that of the incremental data, mark them as historical data, and delete the historical data from the zipper table.
The incremental data include a primary key, a field and a data date. Here, the master table generating incremental data means that the business has produced new data and the new data have been written into the master table.
S110: Write the incremental data and the historical data into a preset temporary table.
S111: Group the data in the temporary table to obtain a plurality of data groups.
Data with the same primary key and the same field are placed in the same data group.
S112: From the data in each data group, select the data with the earliest data date as the target data of that group.
The flow shown in S111 and S112 may be implemented by calling a row_number function.
S113: Group the target data to obtain a plurality of target data groups.
Target data with the same primary key are placed in the same target data group.
S114: For each target data group, sort the target data in the group by data date from earliest to latest to obtain a target data sequence corresponding to the group.
S115: For each target data sequence, add attributes to each target data item in the sequence.
The effective date of the target data is the same as its data date. The expiration date of the last target data item in the target data sequence is set to the preset date; for s = 2, 3, ..., k, the expiration date of the (s-1)-th target data item is set to a second date, where k is the number of target data items in the sequence and the second date is one day earlier than the effective date of the s-th target data item.
The flows shown in S113, S114, and S115 may be implemented by calling the lead function.
S116: Write each target data item into the zipper table.
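The incremental load in S109 to S116 can be sketched the same way. The Python below is an illustration under stated assumptions, not the patented implementation: the zipper table is a list of dicts, `pk` and `val` are hypothetical primary-key and field columns, and `key_of`, `dedup`, `chain` and `merge_increment` are hypothetical helper names. Every chain whose primary key appears in the increment is removed, merged with the increment in a temporary list, deduplicated and re-chained.

```python
from datetime import date, timedelta
from itertools import groupby
from typing import Dict, List

Row = Dict[str, str]
KEYS = ["pk"]          # hypothetical primary-key column(s)
FIELDS = ["val"]       # hypothetical business field column(s)
PRESET = "2999-12-31"  # preset expiration date


def key_of(r: Row, cols: List[str]) -> tuple:
    return tuple(r[c] for c in cols)


def dedup(rows: List[Row]) -> List[Row]:
    """S111-S112: keep the earliest row for each (primary key, fields) combination."""
    gk = lambda r: key_of(r, KEYS + FIELDS)
    ordered = sorted(rows, key=lambda r: (gk(r), r["data_date"]))
    return [next(g) for _, g in groupby(ordered, key=gk)]


def chain(rows: List[Row]) -> List[Row]:
    """S113-S115: per primary key, sort by data date and derive validity dates."""
    ordered = sorted(rows, key=lambda r: (key_of(r, KEYS), r["data_date"]))
    out: List[Row] = []
    for _, g in groupby(ordered, key=lambda r: key_of(r, KEYS)):
        seq = list(g)
        for i, r in enumerate(seq):
            exp = PRESET
            if i + 1 < len(seq):
                nxt = date.fromisoformat(seq[i + 1]["data_date"])
                exp = (nxt - timedelta(days=1)).isoformat()
            out.append({**r, "effective_date": r["data_date"], "expiration_date": exp})
    return out


def merge_increment(zipper: List[Row], increment: List[Row]) -> List[Row]:
    """S109-S116: rebuild the chains of every primary key present in the increment."""
    touched = {key_of(r, KEYS) for r in increment}
    history = [r for r in zipper if key_of(r, KEYS) in touched]   # S109: extract...
    kept = [r for r in zipper if key_of(r, KEYS) not in touched]  # ...and delete
    base_cols = KEYS + FIELDS + ["data_date"]
    temp = [{c: r[c] for c in base_cols} for r in increment + history]  # S110
    return kept + chain(dedup(temp))  # S111-S116


zipper = [
    {"pk": "A", "val": "x", "data_date": "2020-01-01",
     "effective_date": "2020-01-01", "expiration_date": PRESET},
    {"pk": "B", "val": "y", "data_date": "2020-01-05",
     "effective_date": "2020-01-05", "expiration_date": PRESET},
]
increment = [{"pk": "A", "val": "z", "data_date": "2020-03-01"}]
for r in merge_increment(zipper, increment):
    print(r["pk"], r["val"], r["effective_date"], r["expiration_date"])
# B y 2020-01-05 2999-12-31
# A x 2020-01-01 2020-02-29
# A z 2020-03-01 2999-12-31
```

Rebuilding the whole chain of each touched key, rather than patching only its last row, also copes with increments whose data date falls before existing versions, at the cost of reprocessing that key's history.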
S117: When a data read instruction sent by a user is received, extract from the data in the zipper table the data whose validity period covers the time point indicated by the instruction as reply data, and send the reply data to the user.
The validity period is the period from the effective date of the data through the end of its expiration date.
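A point-in-time read as in S117 reduces to a range test on each row. The sketch below assumes ISO-8601 date strings and treats both ends of the validity period as inclusive, which matches the one-day-earlier expiration convention of Table 4; `read_as_of` and the row format are hypothetical. In SQL this would typically be a predicate such as `effective_date <= :d AND expiration_date >= :d`.

```python
from typing import Dict, List

Row = Dict[str, str]


def read_as_of(zipper: List[Row], as_of: str) -> List[Row]:
    """S117: rows whose validity period [effective_date, expiration_date],
    inclusive at both ends, covers `as_of` (an ISO-8601 date string)."""
    return [r for r in zipper
            if r["effective_date"] <= as_of <= r["expiration_date"]]


# Effective/expiration dates copied from Table 4; other columns omitted.
table4 = [
    {"pk1": "Key_11", "effective_date": "2020-01-23", "expiration_date": "2020-02-19"},
    {"pk1": "Key_11", "effective_date": "2020-02-20", "expiration_date": "2020-03-14"},
    {"pk1": "Key_11", "effective_date": "2020-03-15", "expiration_date": "2999-12-31"},
    {"pk1": "Key_22", "effective_date": "2020-05-02", "expiration_date": "2999-12-31"},
]
print([(r["pk1"], r["effective_date"]) for r in read_as_of(table4, "2020-03-01")])
# [('Key_11', '2020-02-20')]
print([(r["pk1"], r["effective_date"]) for r in read_as_of(table4, "2020-06-01")])
# [('Key_11', '2020-03-15'), ('Key_22', '2020-05-02')]
```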
In summary, with the scheme of this embodiment, effective and expiration dates are added to the data (both full data and incremental data) recorded in the master table, a zipper table is constructed based on each data item and its effective and expiration dates, and the data are stored in the zipper table, which effectively reduces the hardware resources consumed by data storage.
Note that S109 in the foregoing embodiment is an optional implementation of the data processing method of the application, as is S117. The flow of the above embodiment can therefore be summarized as the method shown in FIG. 2.
As shown in FIG. 2, which is a schematic diagram of another data processing method according to an embodiment of the application, the method includes the following steps:
s201: and acquiring all data pre-stored in a main gear table.
Wherein the full data includes a primary key, a field, and a date of the data.
S202: and classifying each full-volume data to obtain a plurality of full-volume data packets.
Wherein, the plurality of full data with the same main key and the same field are all divided into the same full data packet.
S203: and selecting the full data with the earliest data date from the full data shown in each full data packet as the effective data of each full data packet.
S204: Classify the valid data to obtain a plurality of valid data groups.
Valid data items having the same primary key are placed in the same valid data group.
S205: For each valid data group, sort its valid data by data date from earliest to latest to obtain the valid data sequence corresponding to that group.
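Steps S204 and S205 regroup the valid data by primary key alone and order each group chronologically. The following is a minimal Python sketch. It assumes a hypothetical `ValidRecord` type and sample data, and it keeps each sequence in a dictionary keyed by primary key.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValidRecord:
    """A valid data item produced by S203 (hypothetical attribute names)."""
    key: str
    field: str
    data_date: date


def build_valid_sequences(valid_data: list[ValidRecord]) -> dict[str, list[ValidRecord]]:
    """S204: group valid data by primary key;
    S205: sort each group by data date, earliest first."""
    groups: dict[str, list[ValidRecord]] = defaultdict(list)
    for rec in valid_data:
        groups[rec.key].append(rec)
    return {key: sorted(group, key=lambda r: r.data_date)
            for key, group in groups.items()}


valid = [
    ValidRecord("C001", "Shanghai", date(2021, 1, 3)),
    ValidRecord("C002", "Hangzhou", date(2021, 1, 1)),
    ValidRecord("C001", "Beijing", date(2021, 1, 1)),
]
sequences = build_valid_sequences(valid)
```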
S206: For each valid data sequence, add attributes to each valid data item in the sequence.
The attributes include an effective date and an expiration date. The effective date equals the data date of the valid data. The expiration date of the last valid data in the sequence is set to a preset date. For n = 1, 2, ..., m-1, where m is the number of valid data items in the sequence, the expiration date of the valid data at position n-1 (positions counted from 0) is set to a first date, which is one day later than the effective date of the valid data at position n.
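Step S206 is the core of the construction. Every element except the last takes its expiration date from its successor, and the last one receives the preset date. The Python sketch below follows the translated rule literally, under these assumptions:

- `PRESET_DATE` (9999-12-31) and `EXPIRY_OFFSET` are hypothetical parameters.
- Positions are counted from 0, so n-1 ranges over 0..m-2.
- The sequence is given as plain (key, field, data date) tuples.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical preset date marking the currently valid row; the text does not
# say which date is used, and 9999-12-31 is only a common choice.
PRESET_DATE = date(9999, 12, 31)

# The text says the first date is "one day later" than the successor's
# effective date. It is kept as a parameter because implementations differ on
# this boundary (the day *before* the successor's effective date is also common).
EXPIRY_OFFSET = timedelta(days=1)


@dataclass(frozen=True)
class ChainRow:
    key: str
    field: str
    effective_date: date
    expiration_date: date


def add_attributes(sequence: list[tuple[str, str, date]],
                   preset: date = PRESET_DATE,
                   offset: timedelta = EXPIRY_OFFSET) -> list[ChainRow]:
    """S206 for one valid data sequence of (key, field, data_date) tuples,
    already sorted by data_date (earliest first)."""
    m = len(sequence)
    rows = []
    for pos, (key, field, data_date) in enumerate(sequence):  # pos counts from 0
        if pos == m - 1:
            expiration = preset  # last valid data in the sequence
        else:
            # Position pos = n-1 takes the first date derived from position n.
            expiration = sequence[pos + 1][2] + offset
        # The effective date is simply the data date.
        rows.append(ChainRow(key, field, data_date, expiration))
    return rows


seq = [("C001", "Beijing", date(2021, 1, 1)),
       ("C001", "Shanghai", date(2021, 1, 3))]
rows = add_attributes(seq)
```

Passing `offset=timedelta(days=-1)` switches to the day-before convention without changing the rest of the logic.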
S207: Construct a data pull chain table from the valid data and the effective date and expiration date of each valid data item.
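Taken together, S201 to S207 build the data pull chain table, which data-warehouse practitioners usually call a zipper table. The following end-to-end Python sketch makes several assumptions:

- The master file table and the chain table are in-memory lists.
- `Row`, `ChainRow` and `build_chain_table` are hypothetical names.
- The preset date and the "one day later" offset are illustrative parameters.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import NamedTuple

PRESET_DATE = date(9999, 12, 31)   # hypothetical preset date
EXPIRY_OFFSET = timedelta(days=1)  # "one day later", as the text reads


class Row(NamedTuple):
    """A master file table row (hypothetical attribute names)."""
    key: str
    field: str
    data_date: date


class ChainRow(NamedTuple):
    """A row of the data pull chain table."""
    key: str
    field: str
    effective_date: date
    expiration_date: date


def build_chain_table(full_data: list[Row],
                      preset: date = PRESET_DATE,
                      offset: timedelta = EXPIRY_OFFSET) -> list[ChainRow]:
    # S202-S203: earliest row for each (primary key, field) pair.
    first_seen: dict[tuple[str, str], Row] = {}
    for r in full_data:
        k = (r.key, r.field)
        if k not in first_seen or r.data_date < first_seen[k].data_date:
            first_seen[k] = r
    # S204: group the valid data by primary key.
    by_key: dict[str, list[Row]] = defaultdict(list)
    for r in first_seen.values():
        by_key[r.key].append(r)
    chain: list[ChainRow] = []
    for key in sorted(by_key):
        # S205: order each group by data date.
        seq = sorted(by_key[key], key=lambda r: r.data_date)
        # S206: effective date = data date; expiration from successor or preset.
        for i, r in enumerate(seq):
            exp = seq[i + 1].data_date + offset if i + 1 < len(seq) else preset
            chain.append(ChainRow(r.key, r.field, r.data_date, exp))
    # S207: in this sketch the "table" is just the list of chain rows.
    return chain


full = [
    Row("C001", "Beijing", date(2021, 1, 1)),
    Row("C001", "Beijing", date(2021, 1, 2)),
    Row("C001", "Shanghai", date(2021, 1, 3)),
    Row("C002", "Hangzhou", date(2021, 1, 1)),
    Row("C002", "Hangzhou", date(2021, 1, 2)),
]
table = build_chain_table(full)
```

In this sample, five daily rows collapse into three chain rows because unchanged repeats are not stored again. This is the storage saving that the summary paragraphs attribute to the data pull chain table.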
In summary, the scheme of this embodiment adds an effective date and an expiration date to the data recorded in the master file table. It then constructs a data pull chain table from each data item and its effective and expiration dates, and stores the data in that table. Owing to the characteristics of the data pull chain table, this can effectively reduce the hardware resources consumed by data storage.
Corresponding to the data processing method provided by the embodiments of the present application, an embodiment of the present application further provides a data processing apparatus.
Fig. 3 is a schematic diagram of the architecture of a data processing apparatus according to an embodiment of the present application. The apparatus includes:
An acquisition unit 100, configured to acquire all full data pre-stored in the master file table, each full data item including a primary key, a field, and a data date.
A first classification unit 200, configured to classify the full data to obtain a plurality of full data groups, where full data items having the same primary key and the same field are placed in the same full data group.
A selection unit 300, configured to select, from the full data in each full data group, the item with the earliest data date as the valid data of that group.
The selection unit 300 is specifically configured to: for each full data group, sort its full data by data date from earliest to latest to obtain the corresponding full data sequence; and, for each full data sequence, select the first full data item in the sequence as the valid data of that sequence.
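The selection unit 300 realizes "earliest data date" by sorting a group and taking the head of the resulting sequence. The following is a short Python sketch. It models rows as hypothetical (primary key, field, data date) tuples and handles one non-empty full data group at a time.

```python
from datetime import date


def select_by_sorting(group: list[tuple[str, str, date]]) -> tuple[str, str, date]:
    """Selection unit 300: sort one full data group by data date (earliest
    first) to get the full data sequence, then take its first element."""
    full_data_sequence = sorted(group, key=lambda r: r[2])  # stable sort
    return full_data_sequence[0]


group = [("C001", "Beijing", date(2021, 1, 2)),
         ("C001", "Beijing", date(2021, 1, 1)),
         ("C001", "Beijing", date(2021, 1, 5))]
chosen = select_by_sorting(group)
```

Python's `sorted` is stable, so ties on the data date keep their input order. The result therefore matches `min(group, key=...)`, which also returns the first minimal element. Sorting costs O(g log g) per group against O(g) for a single minimum scan. The sorted form mainly pays off when the full data sequence itself is needed elsewhere.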
A second classification unit 400, configured to classify the valid data to obtain a plurality of valid data groups, where valid data items having the same primary key are placed in the same valid data group.
A sorting unit 500, configured to sort, for each valid data group, its valid data by data date from earliest to latest to obtain the valid data sequence corresponding to that group.
An adding unit 600, configured to add, for each valid data sequence, attributes to each valid data item in the sequence. The attributes include an effective date and an expiration date. The effective date equals the data date of the valid data. The expiration date of the last valid data in the sequence is set to a preset date. For n = 1, 2, ..., m-1, where m is the number of valid data items in the sequence, the expiration date of the valid data at position n-1 (positions counted from 0) is set to a first date, which is one day later than the effective date of the valid data at position n.
A construction unit 700, configured to construct a data pull chain table from the valid data and the effective date and expiration date of each valid data item.
An incremental data loading unit 800, configured to perform the following operations:
1. When it detects that the master file table has generated incremental data (each incremental data item including a primary key, a field, and a data date), extract from the data pull chain table the valid data whose primary key matches that of the incremental data, mark it as historical data, and delete the historical data from the data pull chain table.
2. Load the incremental data and the historical data into a preset temporary table.
3. Classify the data in the temporary table to obtain a plurality of data groups, where data having the same primary key and the same field are placed in the same data group.
4. Select, from each data group, the data with the earliest data date as the target data of that group.
5. Classify the target data to obtain a plurality of target data groups, where target data having the same primary key are placed in the same target data group.
6. For each target data group, sort its target data by data date from earliest to latest to obtain the corresponding target data sequence.
7. For each target data sequence, add attributes to each target data item, namely an effective date and an expiration date. The effective date equals the data date of the target data. The expiration date of the last target data in the sequence is set to the preset date. For s = 1, 2, ..., k-1, where k is the number of target data items in the sequence, the expiration date of the target data at position s-1 (positions counted from 0) is set to a second date, which is one day later than the effective date of the target data at position s.
8. Load each target data item into the data pull chain table.
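The incremental path re-runs the same grouping, selection, sorting and dating rules, but only over the primary keys touched by the increment. Their existing chain rows are pulled out as history, merged with the new rows in a temporary table, and rebuilt. The following Python sketch makes these assumptions:

- The chain table and the temporary table are in-memory lists.
- The helper names `rebuild` and `load_increment` are hypothetical.
- `PRESET_DATE` and `EXPIRY_OFFSET` are illustrative parameters.
- A history row re-enters the temporary table with its effective date as its data date.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import NamedTuple

PRESET_DATE = date(9999, 12, 31)   # hypothetical preset date
EXPIRY_OFFSET = timedelta(days=1)  # "one day later", as the text reads


class Row(NamedTuple):
    key: str
    field: str
    data_date: date


class ChainRow(NamedTuple):
    key: str
    field: str
    effective_date: date
    expiration_date: date


def rebuild(rows: list[Row]) -> list[ChainRow]:
    """Grouping, selection, sorting and date attributes for the target data."""
    first_seen: dict[tuple[str, str], Row] = {}
    for r in rows:
        k = (r.key, r.field)
        if k not in first_seen or r.data_date < first_seen[k].data_date:
            first_seen[k] = r  # target data: earliest row per (key, field)
    by_key: dict[str, list[Row]] = defaultdict(list)
    for r in first_seen.values():
        by_key[r.key].append(r)
    out: list[ChainRow] = []
    for key in sorted(by_key):
        seq = sorted(by_key[key], key=lambda r: r.data_date)  # target data sequence
        for i, r in enumerate(seq):
            exp = seq[i + 1].data_date + EXPIRY_OFFSET if i + 1 < len(seq) else PRESET_DATE
            out.append(ChainRow(r.key, r.field, r.data_date, exp))
    return out


def load_increment(chain: list[ChainRow], increment: list[Row]) -> list[ChainRow]:
    keys = {r.key for r in increment}
    history = [c for c in chain if c.key in keys]        # marked as historical data
    remaining = [c for c in chain if c.key not in keys]  # history deleted from the chain
    # Temporary table. Assumption: a history row re-enters with its effective
    # date as its data date (the two are equal by construction).
    temp = [Row(c.key, c.field, c.effective_date) for c in history] + list(increment)
    return remaining + rebuild(temp)  # load target data back into the chain


chain = [ChainRow("C001", "Beijing", date(2021, 1, 1), PRESET_DATE),
         ChainRow("C002", "Hangzhou", date(2021, 1, 1), PRESET_DATE)]
updated = load_increment(chain, [Row("C001", "Shenzhen", date(2021, 1, 5))])
```

Rebuilding per primary key keeps the update local. In the example, the chain row for C002 is absent from the increment and is left untouched, while C001 gains a closed Beijing row and a new open Shenzhen row.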
A data reading unit 900, configured to handle a data reading instruction from a user. Upon receiving such an instruction, it extracts from the data pull chain table the data whose validity period covers the time node indicated by the instruction, and sends that data to the user. The validity period runs from the effective date of the data through the end of its expiration date.
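Reading reduces to a range filter on the two date attributes. The following is a minimal Python sketch. It assumes the chain table is an in-memory list of hypothetical `ChainRow` tuples, that the time node is a calendar date, and that the sample rows have non-overlapping validity periods.

```python
from datetime import date
from typing import NamedTuple


class ChainRow(NamedTuple):
    key: str
    field: str
    effective_date: date
    expiration_date: date


def read_at(chain: list[ChainRow], time_node: date) -> list[ChainRow]:
    """Return the rows whose validity period covers time_node. Both ends are
    inclusive, since the period runs through the end of the expiration date."""
    return [r for r in chain
            if r.effective_date <= time_node <= r.expiration_date]


chain = [
    ChainRow("C001", "Beijing", date(2021, 1, 1), date(2021, 1, 2)),
    ChainRow("C001", "Shanghai", date(2021, 1, 3), date(9999, 12, 31)),
]
```

In SQL terms this is a `WHERE effective_date <= :t AND expiration_date >= :t` predicate. If expiration dates are produced with the literal "one day later" rule, a time node at a change boundary can match two rows for the same key. A day-before convention yields non-overlapping periods.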
In summary, the scheme of this embodiment adds an effective date and an expiration date to the data recorded in the master file table. It then constructs a data pull chain table from each data item and its effective and expiration dates, and stores the data in that table. Owing to the characteristics of the data pull chain table, this can effectively reduce the hardware resources consumed by data storage.
The present application further provides a computer-readable storage medium comprising a stored program, where the program, when run, executes the data processing method provided by the present application.
The present application further provides a data processing device comprising a processor, a memory, and a bus. The processor is connected to the memory through the bus. The memory is configured to store a program, and the processor is configured to run it. When run, the program executes the data processing method provided by the present application, comprising the following steps:
acquiring all full data pre-stored in a master file table, each full data item comprising a primary key, a field, and a data date;
classifying the full data to obtain a plurality of full data groups, where full data items having the same primary key and the same field are placed in the same full data group;
selecting, from the full data in each full data group, the item with the earliest data date as the valid data of that group;
classifying the valid data to obtain a plurality of valid data groups, where valid data items having the same primary key are placed in the same valid data group;
for each valid data group, sorting its valid data by data date from earliest to latest to obtain the corresponding valid data sequence;
for each valid data sequence, adding attributes to each valid data item in the sequence, where the attributes include an effective date and an expiration date; the effective date equals the data date of the valid data; the expiration date of the last valid data in the sequence is set to a preset date; and, for n = 1, 2, ..., m-1, where m is the number of valid data items in the sequence, the expiration date of the valid data at position n-1 is set to a first date, which is one day later than the effective date of the valid data at position n;
and constructing a data pull chain table from the valid data and the effective date and expiration date of each valid data item.
Optionally, selecting, from the full data in each full data group, the item with the earliest data date as the valid data of that group comprises:
for each full data group, sorting its full data by data date from earliest to latest to obtain the corresponding full data sequence;
and, for each full data sequence, selecting the full data at the first position of the sequence as the valid data of that sequence.
Optionally, the method further comprises:
when it is detected that the master file table has generated incremental data, extracting from the data pull chain table the valid data whose primary key matches that of the incremental data, marking it as historical data, and deleting the historical data from the data pull chain table, the incremental data comprising a primary key, a field, and a data date;
loading the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups, where data having the same primary key and the same field are placed in the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that group;
classifying the target data to obtain a plurality of target data groups, where target data having the same primary key are placed in the same target data group;
for each target data group, sorting its target data by data date from earliest to latest to obtain the corresponding target data sequence;
for each target data sequence, adding attributes to each target data item in the sequence, where the attributes include an effective date and an expiration date; the effective date equals the data date of the target data; the expiration date of the last target data in the sequence is set to the preset date; and, for s = 1, 2, ..., k-1, where k is the number of target data items in the sequence, the expiration date of the target data at position s-1 is set to a second date, which is one day later than the effective date of the target data at position s;
and loading each target data item into the data pull chain table.
Optionally, the method further comprises:
when a data reading instruction sent by a user is received, extracting from the data pull chain table the data whose validity period covers the time node indicated by the instruction, and sending that data to the user, where the validity period runs from the effective date of the data through the end of its expiration date.
If the functions of the methods in the embodiments of the present application are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes over the prior art, or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and comprises several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data processing method, comprising:
acquiring all full data pre-stored in a master file table, the full data comprising a primary key, a field, and a data date;
classifying the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are placed in the same full data group;
selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
classifying the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are placed in the same valid data group;
for each valid data group, sorting the valid data in the group by data date from earliest to latest to obtain a valid data sequence corresponding to the group;
for each valid data sequence, adding attributes to each valid data in the sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the sequence is set to a preset date; the expiration date of the valid data at position n-1 in the sequence is set to a first date, n = 1, 2, ..., m-1, m being the number of valid data contained in the sequence; and the first date is one day later than the effective date of the valid data at position n in the sequence;
and constructing a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
2. The method according to claim 1, wherein selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group comprises:
for each full data group, sorting the full data in the group by data date from earliest to latest to obtain a full data sequence corresponding to the group;
and, for each full data sequence, selecting the full data at the first position of the sequence as the valid data of that sequence.
3. The method according to claim 1, further comprising:
when it is detected that the master file table has generated incremental data, extracting from the data pull chain table the valid data whose primary key is the same as that of the incremental data, marking it as historical data, and deleting the historical data from the data pull chain table, the incremental data comprising a primary key, a field, and a data date;
writing the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are placed in the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
classifying the target data to obtain a plurality of target data groups, wherein target data having the same primary key are placed in the same target data group;
for each target data group, sorting the target data in the group by data date from earliest to latest to obtain a target data sequence corresponding to the group;
for each target data sequence, adding attributes to each target data in the sequence, wherein the effective date of the target data is the same as its data date; the expiration date of the last target data in the sequence is set to the preset date; the expiration date of the target data at position s-1 in the sequence is set to a second date, s = 1, 2, ..., k-1, k being the number of target data contained in the sequence; and the second date is one day later than the effective date of the target data at position s in the sequence;
and writing the target data into the data pull chain table.
4. The method according to claim 1, further comprising:
when a data reading instruction sent by a user is received, extracting, from the data in the data pull chain table, the data whose validity period covers the time node indicated by the data reading instruction as reply data, and sending the reply data to the user, wherein the validity period indicates the period from the effective date of the data to the end of its expiration date.
5. A data processing apparatus, comprising:
an acquisition unit, configured to acquire all full data pre-stored in a master file table, the full data comprising a primary key, a field, and a data date;
a first classification unit, configured to classify the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are placed in the same full data group;
a selection unit, configured to select, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
a second classification unit, configured to classify the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are placed in the same valid data group;
a sorting unit, configured to sort, for each valid data group, the valid data in the group by data date from earliest to latest to obtain a valid data sequence corresponding to the group;
an adding unit, configured to add, for each valid data sequence, attributes to each valid data in the sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the sequence is set to a preset date; the expiration date of the valid data at position n-1 in the sequence is set to a first date, n = 1, 2, ..., m-1, m being the number of valid data contained in the sequence; and the first date is one day later than the effective date of the valid data at position n in the sequence;
and a construction unit, configured to construct a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
6. The apparatus according to claim 5, wherein the selection unit is specifically configured to:
for each full data group, sorting the full data in the group by data date from earliest to latest to obtain a full data sequence corresponding to the group;
and, for each full data sequence, selecting the full data at the first position of the sequence as the valid data of that sequence.
7. The apparatus according to claim 5, further comprising:
an incremental data loading unit, configured to:
when it is detected that the master file table has generated incremental data, extract from the data pull chain table the valid data whose primary key is the same as that of the incremental data, mark it as historical data, and delete the historical data from the data pull chain table, the incremental data comprising a primary key, a field, and a data date;
writing the incremental data and the historical data into a preset temporary table;
classify the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are placed in the same data group;
select, from the data in each data group, the data with the earliest data date as the target data of that data group;
classify the target data to obtain a plurality of target data groups, wherein target data having the same primary key are placed in the same target data group;
for each target data group, sort the target data in the group by data date from earliest to latest to obtain a target data sequence corresponding to the group;
for each target data sequence, add attributes to each target data in the sequence, wherein the effective date of the target data is the same as its data date; the expiration date of the last target data in the sequence is set to the preset date; the expiration date of the target data at position s-1 in the sequence is set to a second date, s = 1, 2, ..., k-1, k being the number of target data contained in the sequence; and the second date is one day later than the effective date of the target data at position s in the sequence;
and write the target data into the data pull chain table.
8. The apparatus according to claim 5, further comprising:
a data reading unit, configured to: when a data reading instruction sent by a user is received, extract, from the data in the data pull chain table, the data whose validity period covers the time node indicated by the data reading instruction as reply data, and send the reply data to the user, wherein the validity period indicates the period from the effective date of the data to the end of its expiration date.
9. A computer-readable storage medium, wherein the computer-readable storage medium comprises a stored program, and the program, when run, performs the data processing method according to any one of claims 1-4.
10. A data processing apparatus, comprising a processor, a memory, and a bus, the processor being connected to the memory through the bus;
wherein the memory is configured to store a program, and the processor is configured to run the program, the program, when run, performing the data processing method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111026654.8A (CN113704268B) | 2021-09-02 | 2021-09-02 | Data processing method, device, storage medium and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111026654.8A (CN113704268B) | 2021-09-02 | 2021-09-02 | Data processing method, device, storage medium and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113704268A | 2021-11-26 |
CN113704268B | 2023-12-08 |
Family
ID=78657408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111026654.8A (CN113704268B, Active) | Data processing method, device, storage medium and equipment | 2021-09-02 | 2021-09-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113704268B |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111400304A * | 2020-02-19 | 2020-07-10 | China Construction Bank Corporation (中国建设银行股份有限公司) | Method and device for acquiring total data of section dates, electronic equipment and storage medium |
CN112765135A * | 2021-01-29 | 2021-05-07 | Beijing Dajia Internet Information Technology Co., Ltd. (北京达佳互联信息技术有限公司) | Data processing method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8386541B2 * | 2008-09-16 | 2013-02-26 | Bank Of America Corporation | Dynamic change data capture process |
Application Events
2021-09-02: CN202111026654.8A (CN113704268B), CN, Active
Also Published As
Publication number | Publication date |
---|---|
CN113704268A | 2021-11-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106202569A | | A kind of cleaning method based on big data quantity |
CN106909642B | | Database indexing method and system |
CN102637178A | | Music recommending method, music recommending device and music recommending system |
WO2009010950A1 | | System and method for predicting a measure of anomalousness and similarity of records in relation to a set of reference records |
CN107832333B | | Method and system for constructing user network data fingerprint based on distributed processing and DPI data |
KR20130036094A | | Managing storage of individually accessible data units |
CN111914294B | | Database sensitive data identification method and system |
CN114238360A | | User behavior analysis system |
CN111367956A | | Data statistical method and device |
CN113704268B | | Data processing method, device, storage medium and equipment |
CN116821053B | | Data reporting method, device, computer equipment and storage medium |
CN113609389A | | Community platform information pushing method and system |
CN106919566A | | A kind of query statistic method and system based on mass data |
CN112632154B | | Method and device for determining parallel service quantity and time interval based on time data |
CN101799803B | | Method, module and system for processing information |
CN114357082A | | Cloud computing-based big data analysis method and system |
Kalfus et al. | | A selective data retention approach in massive databases |
CN114416731A | | Data storage method, data reading method, data storage device, electronic device and medium |
CN108256839B | | Numerical resource rollback method, device, server and storage medium |
CN110991823A | | Method and device for processing service data, computer equipment and storage medium |
CN112181994A | | Method, device and medium for refreshing distributed memory database of operation and maintenance big data |
CN112131215A | | Bottom-up database information acquisition method and device |
CN111221824B | | Storage optimization method, device, equipment and medium for storage space |
CN109872181B | | Commercial information processing method, device and storage medium |
CN112732194B | | Irregular data storage method, device and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |