CN113704268B - Data processing method, device, storage medium and equipment - Google Patents

Info

Publication number
CN113704268B
Authority
CN
China
Prior art keywords
data
date
full
effective
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111026654.8A
Other languages
Chinese (zh)
Other versions
CN113704268A (en)
Inventor
仪明锋
赵玮
李亮
李聪依
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202111026654.8A
Publication of CN113704268A
Application granted
Publication of CN113704268B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The application discloses a data processing method, device, storage medium and equipment. All full data pre-stored in a master file table is acquired and classified into a plurality of full data groups. From the full data in each full data group, the full data with the earliest data date is selected as valid data. The valid data is then classified into a plurality of valid data groups. For each valid data group, the valid data is sorted by data date from earliest to latest to obtain a valid data sequence. For each valid data sequence, attributes (an effective date and an expiration date) are added to each valid data in the sequence. A data pull chain table is constructed from the valid data together with the effective date and expiration date of each valid data. Because the data pull chain table records the validity period of each piece of data instead of a full copy per time slice, storing data in it can effectively reduce the hardware resources consumed by data storage.

Description

Data processing method, device, storage medium and equipment
Technical Field
The present application relates to the field of big data, and in particular, to a data processing method, apparatus, storage medium, and device.
Background
With the widespread use of big data technology, the volume of business data in large applications is growing rapidly. In some scenarios, such as financial transaction data and customer data, historical data must be analyzed to extract trends in how the data changes and to forecast business development or give early warning of risk. How to store and query historical data with limited storage space and computing power has therefore become a research hotspot in the field.
At present, the common data storage approach is to save a full snapshot of the data at a fixed time granularity (typically daily). This approach generates a large amount of redundant data, which occupies a large amount of storage space, consumes substantial hardware resources, and increases hardware cost.
Disclosure of Invention
The application provides a data processing method, a data processing device, a storage medium and data processing equipment, and aims to reduce consumption of hardware resources.
In order to achieve the above object, the present application provides the following technical solutions:
A data processing method, comprising:
acquiring all full data pre-stored in a master file table; the full data comprises a primary key, a field and a data date;
classifying the full data to obtain a plurality of full data groups; full data with the same primary key and the same field are divided into the same full data group;
selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
classifying the valid data to obtain a plurality of valid data groups; valid data with the same primary key are divided into the same valid data group;
for each valid data group, sorting the valid data in the valid data group by data date from earliest to latest to obtain a valid data sequence corresponding to the valid data group;
for each valid data sequence, adding attributes to each valid data in the valid data sequence; wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set as a preset date; the expiration date of the valid data at the n-th position in the valid data sequence is set as a first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; the first date is one day earlier than the effective date of the valid data at the (n+1)-th position in the valid data sequence;
and constructing a data pull chain table based on the valid data and the effective date and expiration date of each valid data.
Optionally, selecting, from the respective full-size data shown in each of the full-size data packets, the full-size data with the earliest data date as the valid data of each of the full-size data packets, including:
for each full data packet, sequencing all full data shown by the full data packet according to the sequence from early to late of the data date to obtain a full data sequence corresponding to each full data packet;
and selecting the full data arranged at the first position in the full data sequence as the effective data of each full data sequence for each full data sequence.
Optionally, the method further comprises:
when it is detected that the master file table has generated incremental data, extracting from the data pull chain table the valid data whose primary key is the same as the primary key of the incremental data, marking the valid data as historical data, and deleting the historical data from the data pull chain table; the incremental data comprises a primary key, a field and a data date;
writing the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups; data with the same primary key and the same field are divided into the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
classifying the target data to obtain a plurality of target data groups; target data with the same primary key are divided into the same target data group;
for each target data group, sorting the target data in the target data group by data date from earliest to latest to obtain a target data sequence corresponding to the target data group;
for each target data sequence, adding attributes to each target data in the target data sequence; wherein the effective date of the target data is the same as the data date of the target data; the expiration date of the last target data in the target data sequence is set as the preset date; the expiration date of the target data at the s-th position in the target data sequence is set as a second date; s = 1, 2, 3, ..., k-1; k represents the number of target data contained in the target data sequence; the second date is one day earlier than the effective date of the target data at the (s+1)-th position in the target data sequence;
and writing the target data into the data pull chain table.
Optionally, the method further comprises:
when a data reading instruction sent by a user is received, extracting, from the data in the data pull chain table, the data whose validity period covers the time node indicated by the data reading instruction as reply data, and sending the reply data to the user; wherein the validity period indicates the period from the effective date of the data to the end of the expiration date of the data.
A data processing apparatus comprising:
an acquisition unit, configured to acquire all full data pre-stored in a master file table; the full data comprises a primary key, a field and a data date;
a first classification unit, configured to classify the full data to obtain a plurality of full data groups; full data with the same primary key and the same field are divided into the same full data group;
a selecting unit, configured to select, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
a second classification unit, configured to classify the valid data to obtain a plurality of valid data groups; valid data with the same primary key are divided into the same valid data group;
a sorting unit, configured to sort, for each valid data group, the valid data in the valid data group by data date from earliest to latest to obtain a valid data sequence corresponding to the valid data group;
an adding unit, configured to add, for each valid data sequence, attributes to each valid data in the valid data sequence; wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set as a preset date; the expiration date of the valid data at the n-th position in the valid data sequence is set as a first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; the first date is one day earlier than the effective date of the valid data at the (n+1)-th position in the valid data sequence;
and a construction unit, configured to construct a data pull chain table based on the valid data and the effective date and expiration date of each valid data.
Optionally, the selecting unit is specifically configured to:
for each full data packet, sequencing all full data shown by the full data packet according to the sequence from early to late of the data date to obtain a full data sequence corresponding to each full data packet;
and selecting the full data arranged at the first position in the full data sequence as the effective data of each full data sequence for each full data sequence.
Optionally, the apparatus further comprises:
an incremental data loading unit for:
under the condition that the main gear table is detected to generate incremental data, extracting effective data with the same main key as the main key of the incremental data from the data pull chain table, marking the effective data as historical data, and deleting the historical data from the data pull chain table; the incremental data comprises a primary key, a field and a data date;
writing the incremental data and the historical data into a preset temporary table;
classifying each data shown in the temporary table to obtain a plurality of data packets; a plurality of data with the same main key and the same field are divided into the same data packet;
Selecting the data with the earliest data date from the data shown in each data packet as target data of each data packet;
classifying each target data to obtain a plurality of target data packets; multiple target data with the same main key are divided into the same target data group;
for each target data packet, sequencing each target data shown by the target data packet according to the sequence from the early date to the late date of the data to obtain a target data sequence corresponding to the target data packet;
for each target data sequence, adding attributes to each target data in the target data sequence; wherein the effective date of the target data is the same as the data date of the target data; the expiration date of the last target data in the target data sequence is set as the preset date; the expiration date of the target data at the s-th position in the target data sequence is set as a second date; s = 1, 2, 3, ..., k-1; k represents the number of target data contained in the target data sequence; the second date is one day earlier than the effective date of the target data at the (s+1)-th position in the target data sequence;
and writing the target data into the data pull chain table.
Optionally, the apparatus further comprises:
a data reading unit, configured to, when a data reading instruction sent by a user is received, extract from the data in the data pull chain table the data whose validity period covers the time node indicated by the data reading instruction as reply data, and send the reply data to the user; wherein the validity period indicates the period from the effective date of the data to the end of the expiration date of the data.
A computer-readable storage medium including a stored program, wherein the program performs the data processing method.
A data processing apparatus comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the data processing method is executed when the program runs.
According to the technical scheme provided by the application, all full data pre-stored in the master file table is acquired and classified into a plurality of full data groups. From the full data in each full data group, the full data with the earliest data date is selected as the valid data of that group. The valid data is classified into a plurality of valid data groups. For each valid data group, the valid data in the group is sorted by data date from earliest to latest to obtain the corresponding valid data sequence. For each valid data sequence, attributes are added to each valid data in the sequence. A data pull chain table is constructed based on the valid data and the effective date and expiration date of each valid data. With this scheme, an effective date and an expiration date are added to the data recorded in the master file table, the data pull chain table is built from each piece of data together with its effective date and expiration date, and the data is stored in the data pull chain table, which, given the characteristics of such a table, can effectively reduce the hardware resources consumed by data storage.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 1b is a schematic diagram of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of another data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The scheme provided by the embodiment of the application realizes the data storage by using the data pull chain table, and can effectively reduce the consumption of hardware resources corresponding to the data storage based on the characteristics of the data pull chain table.
The following are terms that may be involved in the present application, with corresponding explanations:
Analytic function: a function designed for the statistical needs of complex reports. It groups the data and computes a statistic over each group, and it can return one value for every row of every group. When an analytic function is used, its return value is added to each row returned by the query.
row_number function: the partition by clause groups the rows, the order by clause sorts the rows within each group, and each row is then assigned a number, forming a sequence that starts at 1 and increases by 1.
lead function: an offset function. The partition by clause groups the rows, the order by clause sorts the rows within each group, and the function returns the field value of the row N rows below the current row.
Master file table (also referred to in this text as the main gear table or main file table): unlike a detail table, it stores data keyed by a subject, such as customer information or account information. When the subject's information changes, the master file table keeps only the current latest data.
Data pull chain table (commonly called a zipper table): a way of storing data history. On top of the original master file table, flag information such as the effective time and expiration time of the data is added, so that the validity period of the data is stored.
Effective date: the date on which a piece of data was generated.
Expiration date: the date up to which a piece of data remains valid before it changes. In this scheme, the expiration date of one record is the day before the effective date of the next record of the same data. The effective and expiration dates of all records of a piece of data form continuous, non-overlapping time periods, and the data remains unchanged within each period.
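The grouping, ordering and offset behaviour of the two analytic functions described above can be imitated in a few lines of Python. This is only a sketch of their semantics under stated assumptions, not the database implementation: the helper names `row_number`, `lead` and `_partitions`, the dictionary row format, and the sample rows are all hypothetical.

```python
from itertools import groupby


def _partitions(rows, partition_by, order_by):
    """Yield each partition as a list, sorted by the order_by columns."""
    def part_key(row):
        return tuple(row[c] for c in partition_by)

    def sort_key(row):
        return part_key(row) + tuple(row[c] for c in order_by)

    for _, group in groupby(sorted(rows, key=sort_key), key=part_key):
        yield list(group)


def row_number(rows, partition_by, order_by):
    """Pair every row with its 1-based position inside its partition."""
    return [(row, n)
            for part in _partitions(rows, partition_by, order_by)
            for n, row in enumerate(part, start=1)]


def lead(rows, column, partition_by, order_by, offset=1, default=None):
    """Pair every row with `column` read `offset` rows further down its partition."""
    result = []
    for part in _partitions(rows, partition_by, order_by):
        for i, row in enumerate(part):
            j = i + offset
            result.append((row, part[j][column] if j < len(part) else default))
    return result


# Hypothetical sample rows: two records for key A, one for key B.
rows = [
    {"key": "A", "data_date": "2020-02-20"},
    {"key": "A", "data_date": "2020-01-23"},
    {"key": "B", "data_date": "2020-05-02"},
]
numbered = row_number(rows, ["key"], ["data_date"])
next_dates = lead(rows, "data_date", ["key"], ["data_date"], default="2999-12-31")
```

Here `numbered` restarts its count for each key, and `next_dates` falls back to the default value for the last row of each key, which is the behaviour the later steps rely on.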
As shown in FIG. 1a and FIG. 1b, a data processing method according to an embodiment of the present application includes the following steps:
S101: Acquire all full data pre-stored in the master file table.
The full data includes a primary key, fields, and a data date.
It should be noted that the full data pre-stored in the master file table can be extracted by running an SQL statement against the master file table. The full data extracted from the master file table is shown in Table 1.
TABLE 1
Primary key 1  Primary key 2  Field 1  Field 2  Field 3  Data date
Key_11         Key_21         Col_11   Col_21   Col_31   2020-01-23
Key_11         Key_21         Col_11   Col_21   Col_31   2020-01-23
Key_11         Key_21         Col_11   Col_21   Col_32   2020-02-20
Key_11         Key_21         Col_11   Col_22   Col_32   2020-03-15
Key_22         Key_22         Col_12   Col_22   Col_33   2020-05-02
It should be noted that the contents shown in table 1 above are only for illustration.
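As a rough illustration of extracting the full data with an SQL statement, the sketch below loads the rows of Table 1 into an in-memory SQLite database and reads them back. The table name `master_file`, the column names, and the use of SQLite are assumptions made for the example; the application does not name a database or schema.

```python
import sqlite3

# Rows of Table 1: (primary key 1, primary key 2, field 1, field 2, field 3, data date)
TABLE_1 = [
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_32", "2020-02-20"),
    ("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15"),
    ("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02"),
]

conn = sqlite3.connect(":memory:")
# Hypothetical schema standing in for the master file table.
conn.execute(
    "CREATE TABLE master_file ("
    "pk1 TEXT, pk2 TEXT, col1 TEXT, col2 TEXT, col3 TEXT, data_date TEXT)"
)
conn.executemany("INSERT INTO master_file VALUES (?, ?, ?, ?, ?, ?)", TABLE_1)

# S101: extract all full data with a single SQL statement.
full_data = conn.execute(
    "SELECT pk1, pk2, col1, col2, col3, data_date "
    "FROM master_file ORDER BY data_date"
).fetchall()
conn.close()
```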
S102: and classifying each full-volume data to obtain a plurality of full-volume data packets.
Wherein, the plurality of full data with the same main key and the same field are all divided into the same full data packet.
S103: for each full data packet, sorting the full data shown by the full data packets according to the order of the data date from early to late to obtain a full data sequence corresponding to each full data packet.
Wherein, each full data shown in each full data sequence is provided with a corresponding serial number.
Specifically, taking the full data shown in table 1 as an example, the full data shown in the full data packets are sorted according to the order of the data dates from early to late, so as to obtain full data sequences corresponding to each full data packet, and the full data sequences can be shown in table 2.
TABLE 2
Primary key 1  Primary key 2  Field 1  Field 2  Field 3  Data date   Sequence number
Key_11         Key_21         Col_11   Col_21   Col_31   2020-01-23  1
Key_11         Key_21         Col_11   Col_21   Col_31   2020-01-23  2
Key_11         Key_21         Col_11   Col_21   Col_32   2020-02-20  1
Key_11         Key_21         Col_11   Col_22   Col_32   2020-03-15  1
Key_22         Key_22         Col_12   Col_22   Col_33   2020-05-02  1
It should be noted that the contents shown in table 2 above are only for illustration.
S104: for each full-volume data sequence, selecting the first full-volume data in the full-volume data sequence as the effective data of each full-volume data sequence.
The flow shown in S102, S103, and S104 may be understood as performing a deduplication operation on all the full data shown in the main gear table, that is, filtering out all the other full data except the valid data, and only retaining all the valid data.
It should be noted that, because the data is abnormal in the process of loading the full data by the main file table, the same situation of the main key occurs, so that the duplicate removal operation needs to be performed on each full data, so as to avoid the situation that an abnormal data zipper is formed in the later stage and prevent the situation that the main key is repeated when the data is read in the later stage.
In the embodiment of the present application, the flows shown in S102, S103, and S104 may be implemented by calling a row_number function (an existing analysis function), where specific calling logic of the row_number function is: row_number (over (partitionby [ primary key 1, primary key 2. ] orderby [ field 1, field 2 … ])).
Specifically, taking the respective full-size data sequences shown in table 2 as an example, the respective effective data obtained via S102, S103, and S104 can be referred to as shown in table 3.
TABLE 3
It should be noted that the contents shown in table 3 above are only for illustration.
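The deduplication in S102 to S104 can be sketched in plain Python as follows. The sketch groups rows by primary key plus fields and keeps the earliest data date in each group, as the steps describe; it assumes rows are tuples whose last element is an ISO-format data date (so string comparison matches date order), and the function name `select_valid_data` is hypothetical.

```python
# Sample rows of Table 1: (primary key 1, primary key 2, field 1, field 2, field 3, data date)
FULL_DATA = [
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_32", "2020-02-20"),
    ("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15"),
    ("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02"),
]


def select_valid_data(full_data, n_keys=2):
    """Keep, for each (primary key, fields) group, the row with the earliest data date."""
    earliest = {}
    for row in full_data:
        group_key, data_date = row[:-1], row[-1]  # S102: same key and same fields
        # S103/S104: ISO dates compare correctly as strings, so "<" finds the earliest.
        if group_key not in earliest or data_date < earliest[group_key]:
            earliest[group_key] = data_date
    rows = [group_key + (data_date,) for group_key, data_date in earliest.items()]
    # Order by primary key, then data date, to make the result easy to read.
    return sorted(rows, key=lambda r: (r[:n_keys], r[-1]))


valid_data = select_valid_data(FULL_DATA)
```

For the sample rows, the duplicated 2020-01-23 record collapses into one, leaving four valid rows.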
S105: Classify the valid data to obtain a plurality of valid data groups.
Valid data with the same primary key is placed in the same valid data group.
S106: For each valid data group, sort the valid data in the group by data date from earliest to latest to obtain the valid data sequence corresponding to the group.
S107: For each valid data sequence, add attributes to each valid data in the sequence.
The attributes include an effective date and an expiration date, and the effective date is the same as the data date of the valid data. The expiration date of the last valid data in the valid data sequence is set to a preset date. The expiration date of the valid data at the n-th position in the sequence is set to a first date, where n = 1, 2, 3, ..., m-1, m is the number of valid data in the sequence, and the first date is one day earlier than the effective date of the valid data at the (n+1)-th position.
The flow shown in S105, S106, and S107 may be implemented by calling the lead function (an existing analytic function). The calling logic of the lead function is: lead(TO_CHAR(TO_DATE([data date], 'YYYY-MM-DD') - 1, 'YYYY-MM-DD'), 1, '2999-12-31') over (partition by [primary key 1, primary key 2, …] order by [field 1, field 2, …]).
Specifically, taking the valid data shown in Table 3 as an example, the valid data obtained via S105, S106, and S107 is shown in Table 4.
TABLE 4
Primary key 1  Primary key 2  Field 1  Field 2  Field 3  Data date   Effective date  Expiration date
Key_11         Key_21         Col_11   Col_21   Col_31   2020-01-23  2020-01-23      2020-02-19
Key_11         Key_21         Col_11   Col_21   Col_32   2020-02-20  2020-02-20      2020-03-14
Key_11         Key_21         Col_11   Col_22   Col_32   2020-03-15  2020-03-15      2999-12-31
Key_22         Key_22         Col_12   Col_22   Col_33   2020-05-02  2020-05-02      2999-12-31
In Table 4, 2999-12-31 represents a preset date.
Further, the contents shown in table 4 above are for illustration only.
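The date assignment in S105 to S107 can be sketched in Python as below, using the four deduplicated sample rows as input. This is an illustrative sketch, not the SQL the embodiment runs: the function name `build_chain` and the tuple layout (last element is the data date, first two elements are the primary key) are assumptions.

```python
from datetime import date, timedelta
from itertools import groupby

PRESET_DATE = "2999-12-31"  # preset expiration date for the latest record of each key

# Deduplicated sample rows: (primary key 1, primary key 2, field 1, field 2, field 3, data date)
VALID_DATA = [
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_32", "2020-02-20"),
    ("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15"),
    ("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02"),
]


def build_chain(valid_data, n_keys=2):
    """Append (effective date, expiration date) to every valid row."""
    def primary_key(row):
        return row[:n_keys]

    # S105/S106: group by primary key and sort each group by data date.
    ordered = sorted(valid_data, key=lambda r: (primary_key(r), r[-1]))
    chain = []
    for _, group in groupby(ordered, key=primary_key):
        sequence = list(group)
        for i, row in enumerate(sequence):
            effective = row[-1]  # S107: effective date equals the data date
            if i + 1 < len(sequence):
                # Expire one day before the next record of the same key takes effect.
                next_effective = date.fromisoformat(sequence[i + 1][-1])
                expiration = (next_effective - timedelta(days=1)).isoformat()
            else:
                expiration = PRESET_DATE
            chain.append(row + (effective, expiration))
    return chain


chain_table = build_chain(VALID_DATA)
```

For the sample rows, the effective and expiration dates produced by this sketch match the values listed in Table 4.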
S108: and constructing a data pull chain table based on each valid data and the effective date and the expiration date of each valid data.
S109: When it is detected that the master file table has generated incremental data, extract from the data pull chain table the valid data whose primary key is the same as that of the incremental data, mark it as historical data, and delete the historical data from the data pull chain table.
The incremental data includes a primary key, fields, and a data date. The master file table generating incremental data means that the business has produced new data and that the new data has been written into the master file table.
S110: and writing the incremental data and the historical data into a preset temporary table.
S111: and classifying each data shown in the temporary table to obtain a plurality of data packets.
Wherein, the plurality of data with the same main key and the same field are all divided into the same data packet.
S112: from the respective data shown in each data packet, the data with the earliest data date is selected as the target data of each data packet.
The flow shown in S111 and S112 may be implemented by calling a row_number function.
S113: and classifying each target data to obtain a plurality of target data packets.
Wherein, a plurality of target data with the same main key are all divided into the same target data packet.
S114: for each target data packet, sequencing the target data shown by the target data packet according to the sequence from the early date to the late date of the data to obtain a target data sequence corresponding to the target data packet.
S115: for each target data sequence, an attribute is added to each target data shown for the target data sequence.
The effective date of the target data is the same as the data date of the target data. The expiration date of the last target data in the target data sequence is set to the preset date. The expiration date of the target data at the s-th position in the sequence is set to a second date, where s = 1, 2, 3, ..., k-1, k is the number of target data in the sequence, and the second date is one day earlier than the effective date of the target data at the (s+1)-th position.
The flows shown in S113, S114, and S115 may be implemented by calling the lead function.
S116: and writing each target data into the data pull chain table.
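The incremental flow in S109 to S116 can be sketched in Python as follows. Several details are assumptions made for illustration: the function names `rebuild` and `apply_increment`, the in-memory list standing in for the temporary table, the choice to drop the old effective and expiration dates when historical rows are moved to the temporary table, and the incremental row itself (Col_23, 2020-06-10), which is invented sample data.

```python
from datetime import date, timedelta
from itertools import groupby

PRESET_DATE = "2999-12-31"
N_KEYS = 2  # number of leading primary-key columns in each row (assumed layout)


def primary_key(row):
    return row[:N_KEYS]


def rebuild(rows):
    """S111-S115: deduplicate, then assign effective and expiration dates."""
    earliest = {}
    for row in rows:
        group_key, data_date = row[:-1], row[-1]
        if group_key not in earliest or data_date < earliest[group_key]:
            earliest[group_key] = data_date
    targets = sorted((g + (d,) for g, d in earliest.items()),
                     key=lambda r: (primary_key(r), r[-1]))
    result = []
    for _, group in groupby(targets, key=primary_key):
        sequence = list(group)
        for i, row in enumerate(sequence):
            if i + 1 < len(sequence):
                nxt = date.fromisoformat(sequence[i + 1][-1])
                expiration = (nxt - timedelta(days=1)).isoformat()
            else:
                expiration = PRESET_DATE
            result.append(row + (row[-1], expiration))
    return result


def apply_increment(chain_table, incremental):
    changed = {primary_key(r) for r in incremental}
    # S109: pull out records sharing a primary key with the incremental data,
    # dropping their old effective/expiration columns (an assumption of this sketch).
    history = [r[:-2] for r in chain_table if primary_key(r) in changed]
    remaining = [r for r in chain_table if primary_key(r) not in changed]
    temp_table = list(incremental) + history  # S110
    return remaining + rebuild(temp_table)    # S111-S116


chain_table = [
    ("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15", "2020-03-15", "2999-12-31"),
    ("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02", "2020-05-02", "2999-12-31"),
]
# Hypothetical incremental row: field 2 of Key_22 changes on 2020-06-10.
incremental = [("Key_22", "Key_22", "Col_12", "Col_23", "Col_33", "2020-06-10")]
updated = apply_increment(chain_table, incremental)
```

In this run the existing Key_22 record is closed on 2020-06-09 and the new record stays open until the preset date, while the Key_11 record is left untouched.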
S117: and under the condition that the data reading instruction sent by the user is received, extracting data with the validity period covering the time node shown by the data reading instruction from each data shown by the data pull chain table as reply data, and sending the reply data to the user.
The validity period is used for indicating a period from the date of validity of the data to the end of the date of expiration of the data.
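The read in S117 amounts to a range filter on the validity period. A minimal Python sketch, assuming each row ends with its effective and expiration dates as ISO strings and using the hypothetical function name `read_as_of`, applied to the rows of Table 4:

```python
# Rows of Table 4; the last two columns are the effective date and the expiration date.
CHAIN_TABLE = [
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_31", "2020-01-23", "2020-01-23", "2020-02-19"),
    ("Key_11", "Key_21", "Col_11", "Col_21", "Col_32", "2020-02-20", "2020-02-20", "2020-03-14"),
    ("Key_11", "Key_21", "Col_11", "Col_22", "Col_32", "2020-03-15", "2020-03-15", "2999-12-31"),
    ("Key_22", "Key_22", "Col_12", "Col_22", "Col_33", "2020-05-02", "2020-05-02", "2999-12-31"),
]


def read_as_of(chain_table, time_node):
    """Return the rows whose validity period (both ends inclusive) covers time_node."""
    # ISO-format dates compare correctly as strings.
    return [row for row in chain_table if row[-2] <= time_node <= row[-1]]


reply = read_as_of(CHAIN_TABLE, "2020-03-01")
latest = read_as_of(CHAIN_TABLE, "2020-06-01")
```

Because the validity periods of one primary key do not overlap, a lookup returns at most one row per key: `reply` holds only the Key_11 record effective from 2020-02-20, and `latest` holds the current record of each key.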
In summary, by using the scheme shown in the embodiment, the effective date and the expiration date can be added to the data (including the full data and the incremental data) recorded in the main file table, and the data pull chain table is constructed based on each data and the effective date and the expiration date of each data, and the data is stored by using the data pull chain table, so that the consumption of the hardware resources corresponding to the data storage can be effectively reduced based on the characteristics of the data pull chain table.
It should be noted that S109 mentioned in the foregoing embodiment is an alternative implementation of the data processing method shown in the present application. In addition, S117 mentioned in the foregoing embodiment is also an alternative implementation of the data processing method shown in the present application. For this reason, the flow mentioned in the above embodiment can be summarized as the method shown in fig. 2.
As shown in fig. 2, a schematic diagram of another data processing method according to an embodiment of the present application includes the following steps:
S201: Acquire all full data pre-stored in the master file table.
Wherein the full data includes a primary key, a field, and a date of the data.
S202: and classifying each full-volume data to obtain a plurality of full-volume data packets.
Wherein, the plurality of full data with the same main key and the same field are all divided into the same full data packet.
S203: and selecting the full data with the earliest data date from the full data shown in each full data packet as the effective data of each full data packet.
S204: and classifying each effective data to obtain a plurality of effective data packets.
Wherein, the same plurality of valid data of the main key are all divided into the same valid data packet.
S205: for each valid data packet, the valid data shown by the valid data packet are ordered according to the order of the data date from early to late, and a valid data sequence corresponding to the valid data packet is obtained.
S206: for each valid data sequence, an attribute is added to each valid data shown for the valid data sequence.
The attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set as a preset date; the expiration date of the valid data at the n-th position in the valid data sequence is set as the first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; the first date is one day earlier than the effective date of the valid data at the (n+1)-th position in the valid data sequence.
S207: and constructing a data pull chain table based on each valid data and the effective date and the expiration date of each valid data.
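Steps S201 to S207 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the row layout `(primary_key, field, data_date)`, the function name `build_chain_table`, the sentinel `PRESET_EXPIRATION`, and the parameter `expiry_offset_days` are all assumptions introduced here. The offset defaults to one day after the successor's effective date, following the text as written; it is exposed as a parameter because an implementer may prefer a different boundary convention.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed sentinel standing in for the "preset date" of the latest record.
PRESET_EXPIRATION = date(9999, 12, 31)


def build_chain_table(rows, expiry_offset_days=1):
    """Build chain-table records from (primary_key, field, data_date) rows."""
    # S202/S203: group by (primary key, field), keep the earliest data date.
    earliest = {}
    for key, field, data_date in rows:
        group = (key, field)
        if group not in earliest or data_date < earliest[group]:
            earliest[group] = data_date

    # S204: regroup the valid data by primary key alone.
    by_key = defaultdict(list)
    for (key, field), data_date in earliest.items():
        by_key[key].append((data_date, field))

    chain = []
    for key, versions in by_key.items():
        # S205: order each group from earliest to latest data date.
        versions.sort(key=lambda v: v[0])
        # S206: effective date = data date; the expiration date is derived
        # from the successor, or is the preset date for the last record.
        for i, (effective, field) in enumerate(versions):
            if i + 1 < len(versions):
                successor_effective = versions[i + 1][0]
                expiration = successor_effective + timedelta(days=expiry_offset_days)
            else:
                expiration = PRESET_EXPIRATION
            # S207: one chain-table record per valid data.
            chain.append({
                "primary_key": key,
                "field": field,
                "effective_date": effective,
                "expiration_date": expiration,
            })
    return chain
```

Under these assumptions, repeated daily rows that carry the same primary key and the same field collapse into a single record, which is where the storage saving described in the summary comes from.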
In summary, with the scheme of this embodiment, an effective date and an expiration date are added to the data recorded in the master file table, a data pull chain table is constructed from each piece of data together with its effective date and expiration date, and the data is stored in the data pull chain table. Owing to the characteristics of the data pull chain table, the hardware resources consumed by data storage can be effectively reduced.
Corresponding to the data processing method provided by the embodiments of the present application, the embodiments of the present application also provide a data processing apparatus.
Fig. 3 is a schematic diagram of the architecture of a data processing apparatus according to an embodiment of the present application. The apparatus includes:
An obtaining unit 100, configured to obtain all full data pre-stored in the master file table; the full data includes a primary key, a field, and a data date.
A first classification unit 200, configured to classify the full data to obtain a plurality of full data groups; full data having the same primary key and the same field are assigned to the same full data group.
A selecting unit 300, configured to select, from the full data in each full data group, the full data with the earliest data date as the valid data of that group.
The selecting unit 300 is specifically configured to: for each full data group, sort the full data in the group from earliest to latest data date to obtain the full data sequence corresponding to that group; and, for each full data sequence, select the full data at the first position in the sequence as the valid data of that sequence.
A second classification unit 400, configured to classify the valid data to obtain a plurality of valid data groups; valid data having the same primary key are assigned to the same valid data group.
A sorting unit 500, configured to sort, for each valid data group, the valid data in the group from earliest to latest data date to obtain the valid data sequence corresponding to that group.
An adding unit 600, configured to add, for each valid data sequence, attributes to each piece of valid data in the sequence. The attributes include an effective date and an expiration date. The effective date is the same as the data date of the valid data. The expiration date of the last valid data in the valid data sequence is set to a preset date. The expiration date of the valid data at position n-1 in the valid data sequence is set to a first date, where n = 1, 2, 3, ..., m-1 and m is the number of valid data contained in the sequence. The first date is one day later than the effective date of the valid data at position n in the sequence.
A construction unit 700, configured to construct a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
An incremental data loading unit 800, configured to: when it is detected that incremental data has been generated in the master file table, extract from the data pull chain table the valid data whose primary key is the same as that of the incremental data, mark it as historical data, and delete the historical data from the data pull chain table, where the incremental data includes a primary key, a field, and a data date; load the incremental data and the historical data into a preset temporary table; classify the data in the temporary table to obtain a plurality of data groups, where data having the same primary key and the same field are assigned to the same data group; select, from the data in each data group, the data with the earliest data date as the target data of that group; classify the target data to obtain a plurality of target data groups, where target data having the same primary key are assigned to the same target data group; for each target data group, sort the target data in the group from earliest to latest data date to obtain the target data sequence corresponding to that group; for each target data sequence, add attributes to each piece of target data in the sequence, where the attributes include an effective date and an expiration date, the effective date is the same as the data date of the target data, the expiration date of the last target data in the sequence is set to the preset date, the expiration date of the target data at position s-1 in the sequence is set to a second date, s = 1, 2, 3, ..., k-1, k is the number of target data contained in the sequence, and the second date is one day later than the effective date of the target data at position s in the sequence; and load each target data into the data pull chain table.
A data reading unit 900, configured to, upon receiving a data reading instruction sent by a user, extract from the data pull chain table the data whose validity period covers the time node indicated by the data reading instruction, and send the data to the user; the validity period is the period from the effective date of the data to the end of its expiration date.
In summary, with the scheme of this embodiment, an effective date and an expiration date are added to the data recorded in the master file table, a data pull chain table is constructed from each piece of data together with its effective date and expiration date, and the data is stored in the data pull chain table. Owing to the characteristics of the data pull chain table, the hardware resources consumed by data storage can be effectively reduced.
The present application also provides a computer-readable storage medium including a stored program, wherein the program, when run, executes the data processing method provided by the present application.
The present application also provides a data processing device comprising a processor, a memory, and a bus. The processor is connected to the memory through the bus; the memory is configured to store a program, and the processor is configured to run the program. When run, the program executes the data processing method provided by the present application, which comprises the following steps:
acquiring all full data pre-stored in a master file table, wherein the full data includes a primary key, a field, and a data date;
classifying the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are assigned to the same full data group;
selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
classifying the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are assigned to the same valid data group;
for each valid data group, sorting the valid data in the valid data group from earliest to latest data date to obtain a valid data sequence corresponding to the valid data group;
for each valid data sequence, adding attributes to each valid data in the valid data sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set to a preset date; the expiration date of the valid data at position n-1 in the valid data sequence is set to a first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; and the first date is one day later than the effective date of the valid data at position n in the valid data sequence;
and constructing a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
Optionally, selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group includes:
for each full data group, sorting the full data in the full data group from earliest to latest data date to obtain a full data sequence corresponding to the full data group;
and, for each full data sequence, selecting the full data at the first position in the full data sequence as the valid data of that full data sequence.
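The optional sort-then-take-first selection can be illustrated with a short Python sketch. The dictionary layout of a row and the function name `select_valid` are assumptions made for this example only.

```python
from datetime import date


def select_valid(group):
    """Return the row with the earliest data date from one full data group."""
    # Sort the group from earliest to latest data date (the full data sequence).
    sequence = sorted(group, key=lambda row: row["data_date"])
    # The first element of the sequence is taken as the valid data.
    return sequence[0]
```

For a non-empty group this yields the same row as `min(group, key=lambda row: row["data_date"])`; the sorted form simply mirrors the wording of the two sub-steps.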
Optionally, the method further comprises:
when it is detected that incremental data has been generated in the master file table, extracting from the data pull chain table the valid data whose primary key is the same as that of the incremental data, marking the extracted valid data as historical data, and deleting the historical data from the data pull chain table, wherein the incremental data includes a primary key, a field, and a data date;
loading the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are assigned to the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
classifying the target data to obtain a plurality of target data groups, wherein target data having the same primary key are assigned to the same target data group;
for each target data group, sorting the target data in the target data group from earliest to latest data date to obtain a target data sequence corresponding to the target data group;
for each target data sequence, adding attributes to each target data in the target data sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the target data; the expiration date of the last target data in the target data sequence is set to the preset date; the expiration date of the target data at position s-1 in the target data sequence is set to a second date; s = 1, 2, 3, ..., k-1; k represents the number of target data contained in the target data sequence; and the second date is one day later than the effective date of the target data at position s in the target data sequence;
and loading each target data into the data pull chain table.
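The incremental loading steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: chain records are modelled as tuples `(primary_key, field, effective_date, expiration_date)`, the temporary table is a plain list, and the names `rebuild`, `apply_increment`, `PRESET`, and `offset_days` are hypothetical. Historical records re-enter the temporary table with their effective date as data date, relying on the statement that the two are the same.

```python
from datetime import date, timedelta
from itertools import groupby

# Assumed sentinel standing in for the "preset date".
PRESET = date(9999, 12, 31)


def rebuild(rows, offset_days=1):
    """Derive chain records from (primary_key, field, data_date) rows."""
    # Keep the earliest data date per (primary key, field) group.
    earliest = {}
    for key, field, d in rows:
        earliest[(key, field)] = min(d, earliest.get((key, field), d))
    # Order by primary key, then by data date, so groupby sees sorted runs.
    ordered = sorted((k, d, f) for (k, f), d in earliest.items())
    out = []
    for key, grp in groupby(ordered, key=lambda t: t[0]):
        versions = list(grp)
        for i, (_, eff, field) in enumerate(versions):
            is_last = i == len(versions) - 1
            exp = PRESET if is_last else versions[i + 1][1] + timedelta(days=offset_days)
            out.append((key, field, eff, exp))
    return out


def apply_increment(chain, increment, offset_days=1):
    """Merge incremental rows into an existing chain table."""
    touched = {key for key, _, _ in increment}
    # Extract records sharing a primary key with the increment (historical
    # data) and drop them from the chain table.
    history = [r for r in chain if r[0] in touched]
    remaining = [r for r in chain if r[0] not in touched]
    # Temporary table: incremental rows plus historical rows.
    temp = list(increment) + [(k, f, eff) for k, f, eff, _ in history]
    # Recompute target data for the touched keys and load it back.
    return remaining + rebuild(temp, offset_days)
```

Only the primary keys that appear in the increment are recomputed; records for all other keys are carried over unchanged.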
Optionally, the method further comprises:
upon receiving a data reading instruction sent by a user, extracting from the data pull chain table the data whose validity period covers the time node indicated by the data reading instruction, and sending the data to the user; wherein the validity period is the period from the effective date of the data to the end of the expiration date of the data.
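A point-in-time read against the chain table can be sketched as below. The record layout and the function name `read_at` are assumptions for illustration; the bounds are treated as inclusive because the validity period is described as running to the end of the expiration date.

```python
from datetime import date


def read_at(chain, time_node):
    """Return the records whose validity period covers time_node."""
    return [
        record
        for record in chain
        # Inclusive on both ends: effective date through end of expiration date.
        if record["effective_date"] <= time_node <= record["expiration_date"]
    ]
```

In a real deployment the same predicate would typically be a range condition on the two date columns of the table.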
The functions of the methods of the embodiments of the present application, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data processing method, comprising:
acquiring all full data pre-stored in a master file table, wherein the full data includes a primary key, a field, and a data date;
classifying the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are assigned to the same full data group;
selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
classifying the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are assigned to the same valid data group;
for each valid data group, sorting the valid data in the valid data group from earliest to latest data date to obtain a valid data sequence corresponding to the valid data group;
for each valid data sequence, adding attributes to each valid data in the valid data sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set to a preset date; the expiration date of the valid data at position n-1 in the valid data sequence is set to a first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; and the first date is one day later than the effective date of the valid data at position n in the valid data sequence;
and constructing a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
2. The method according to claim 1, wherein selecting, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group comprises:
for each full data group, sorting the full data in the full data group from earliest to latest data date to obtain a full data sequence corresponding to the full data group;
and, for each full data sequence, selecting the full data at the first position in the full data sequence as the valid data of that full data sequence.
3. The method as recited in claim 1, further comprising:
when it is detected that incremental data has been generated in the master file table, extracting from the data pull chain table the valid data whose primary key is the same as that of the incremental data, marking the extracted valid data as historical data, and deleting the historical data from the data pull chain table, wherein the incremental data includes a primary key, a field, and a data date;
writing the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are assigned to the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
classifying the target data to obtain a plurality of target data groups, wherein target data having the same primary key are assigned to the same target data group;
for each target data group, sorting the target data in the target data group from earliest to latest data date to obtain a target data sequence corresponding to the target data group;
for each target data sequence, adding attributes to each target data in the target data sequence, wherein the effective date of the target data is the same as the data date of the target data; the expiration date of the last target data in the target data sequence is set to the preset date; the expiration date of the target data at position s-1 in the target data sequence is set to a second date; s = 1, 2, 3, ..., k-1; k represents the number of target data contained in the target data sequence; and the second date is one day later than the effective date of the target data at position s in the target data sequence;
and writing each target data into the data pull chain table.
4. The method as recited in claim 1, further comprising:
upon receiving a data reading instruction sent by a user, extracting, from the data in the data pull chain table, the data whose validity period covers the time node indicated by the data reading instruction as reply data, and sending the reply data to the user; wherein the validity period is the period from the effective date of the data to the end of the expiration date of the data.
5. A data processing apparatus, comprising:
an acquisition unit, configured to acquire all full data pre-stored in a master file table, wherein the full data includes a primary key, a field, and a data date;
a first classification unit, configured to classify the full data to obtain a plurality of full data groups, wherein full data having the same primary key and the same field are assigned to the same full data group;
a selecting unit, configured to select, from the full data in each full data group, the full data with the earliest data date as the valid data of that full data group;
a second classification unit, configured to classify the valid data to obtain a plurality of valid data groups, wherein valid data having the same primary key are assigned to the same valid data group;
a sorting unit, configured to sort, for each valid data group, the valid data in the valid data group from earliest to latest data date to obtain a valid data sequence corresponding to the valid data group;
an adding unit, configured to add, for each valid data sequence, attributes to each valid data in the valid data sequence, wherein the attributes include an effective date and an expiration date; the effective date is the same as the data date of the valid data; the expiration date of the last valid data in the valid data sequence is set to a preset date; the expiration date of the valid data at position n-1 in the valid data sequence is set to a first date; n = 1, 2, 3, ..., m-1; m represents the number of valid data contained in the valid data sequence; and the first date is one day later than the effective date of the valid data at position n in the valid data sequence;
and a construction unit, configured to construct a data pull chain table based on each valid data and the effective date and expiration date of each valid data.
6. The apparatus according to claim 5, wherein the selection unit is specifically configured to:
for each full data group, sort the full data in the full data group from earliest to latest data date to obtain a full data sequence corresponding to the full data group;
and, for each full data sequence, select the full data at the first position in the full data sequence as the valid data of that full data sequence.
7. The apparatus as recited in claim 5, further comprising:
an incremental data loading unit for:
when it is detected that incremental data has been generated in the master file table, extracting from the data pull chain table the valid data whose primary key is the same as that of the incremental data, marking the extracted valid data as historical data, and deleting the historical data from the data pull chain table, wherein the incremental data includes a primary key, a field, and a data date;
writing the incremental data and the historical data into a preset temporary table;
classifying the data in the temporary table to obtain a plurality of data groups, wherein data having the same primary key and the same field are assigned to the same data group;
selecting, from the data in each data group, the data with the earliest data date as the target data of that data group;
classifying the target data to obtain a plurality of target data groups, wherein target data having the same primary key are assigned to the same target data group;
for each target data group, sorting the target data in the target data group from earliest to latest data date to obtain a target data sequence corresponding to the target data group;
for each target data sequence, adding attributes to each target data in the target data sequence, wherein the effective date of the target data is the same as the data date of the target data; the expiration date of the last target data in the target data sequence is set to the preset date; the expiration date of the target data at position s-1 in the target data sequence is set to a second date; s = 1, 2, 3, ..., k-1; k represents the number of target data contained in the target data sequence; and the second date is one day later than the effective date of the target data at position s in the target data sequence;
and writing each target data into the data pull chain table.
8. The apparatus as recited in claim 5, further comprising:
a data reading unit, configured to, upon receiving a data reading instruction sent by a user, extract, from the data in the data pull chain table, the data whose validity period covers the time node indicated by the data reading instruction as reply data, and send the reply data to the user; wherein the validity period is the period from the effective date of the data to the end of the expiration date of the data.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program performs the data processing method of any one of claims 1-4.
10. A data processing apparatus, comprising: a processor, a memory, and a bus; the processor is connected with the memory through the bus;
the memory is configured to store a program, and the processor is configured to run the program, wherein the program, when run, executes the data processing method according to any one of claims 1 to 4.
CN202111026654.8A 2021-09-02 2021-09-02 Data processing method, device, storage medium and equipment Active CN113704268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111026654.8A CN113704268B (en) 2021-09-02 2021-09-02 Data processing method, device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113704268A CN113704268A (en) 2021-11-26
CN113704268B true CN113704268B (en) 2023-12-08

Family

ID=78657408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111026654.8A Active CN113704268B (en) 2021-09-02 2021-09-02 Data processing method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113704268B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400304A (en) * 2020-02-19 2020-07-10 中国建设银行股份有限公司 Method and device for acquiring total data of section dates, electronic equipment and storage medium
CN112765135A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8386541B2 (en) * 2008-09-16 2013-02-26 Bank Of America Corporation Dynamic change data capture process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant