CN113760888B - Method, device and storage medium for generating data table - Google Patents

Method, device and storage medium for generating data table Download PDF

Info

Publication number
CN113760888B
CN113760888B CN202011237403.XA CN202011237403A CN113760888B CN 113760888 B CN113760888 B CN 113760888B CN 202011237403 A CN202011237403 A CN 202011237403A CN 113760888 B CN113760888 B CN 113760888B
Authority
CN
China
Prior art keywords
dimension
data table
data
change information
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011237403.XA
Other languages
Chinese (zh)
Other versions
CN113760888A (en
Inventor
王子璠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd filed Critical Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN202011237403.XA priority Critical patent/CN113760888B/en
Publication of CN113760888A publication Critical patent/CN113760888A/en
Application granted granted Critical
Publication of CN113760888B publication Critical patent/CN113760888B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device and a storage medium for generating a data table, which are characterized in that firstly a current partition data set and an incremental data set corresponding to a first data table are acquired, the current partition data set and the incremental data set are associated through a primary key, secondly, whether each dimension in the first data table changes is judged based on the primary key existing in the current partition data set and the incremental data set, first dimension change information is generated when the dimension changes, finally, the first data table carrying the first dimension change information is denormalized to a second data table associated with the first data table, second dimension change information is generated based on the first dimension change information, and the second dimension change information is stored in the second data table. According to the embodiment of the application, the dimension change information is recorded when the dimension is changed, and the dimension change information is denormalized to the second data table, so that a user can acquire the dimension change information through the second data table, and the reason of the data change is known.

Description

Method, device and storage medium for generating data table
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a storage medium for generating a data table.
Background
In the field of dimensional modeling of data warehouses, metrics (typically expressed in terms of quantitative values) from business process events are called facts, and operational events occurring in the real world are stored in fact tables with their resulting measurable values. From the lowest level of granularity, a fact table row corresponds to one measurement event, so the fact table design is entirely dependent on physical activity. In addition to the numerical metrics, fact tables always contain foreign keys for associating dimensions with which they relate. Typically, the partially related dimensions will be de-normalized into fact tables.
In particular, the dimension is a field that provides context about "who, what, where, when, why, how" a business process time relates to. Each dimension in the dimension table changes, which causes the old row to fail and the new row to take effect. Classical dimension table implementations, while conveniently reflecting when a row of data is updated with an effective date and an expiration date, cannot reflect why the update. The reasons for the change in the acquired data may involve table association, time ordering, etc., which is very user unfriendly.
Disclosure of Invention
The embodiment of the application provides a method for generating a data table, which solves the problem that a user cannot intuitively check dimension data which changes when acquiring a fact table.
The method comprises the following steps:
acquiring a current partition data set and an incremental data set corresponding to a first data table, and associating the current partition data set and the incremental data set through a main key;
judging whether each dimension in the first data table changes based on the primary key existing in the current partition data set and the incremental data set, and generating first dimension change information when the dimensions change;
and de-normalizing the first data table carrying the first dimension change information to a second data table associated with the first data table, generating second dimension change information based on the first dimension change information, and storing the second dimension change information to the second data table.
Optionally, when the primary key exists only in the incremental dataset, generating the first dimension change information based on the incremental dataset, wherein the first dimension change information includes a dimension field name, a change date and a dimension current value of a changed data row in the first data table, and the dimension current value is a first character string indicating that no old data value exists;
when the primary key exists in both the current partition data set and the incremental data set, sequentially comparing whether the value of each dimension in the data row in the current partition data set and the data row in the incremental data set is equal, and generating the first dimension transformation information when the values are not equal, wherein the first dimension transformation information comprises the dimension field name, the change date and the dimension current value.
Optionally, according to the value of the external key of the second data table to be updated, searching the main key of the first data table, and positioning the main key to the data row in the first data table corresponding to the second data table;
and storing the data of the dimension in the data row into a corresponding field of the second data table.
Optionally, the dimension field name, the change date and the dimension current value contained in the first dimension change information are added to the second dimension change information, and the second dimension change information further includes a dimension identifier, where the dimension identifier is used to represent the same value of the primary key of the first data table as the value of the foreign key of the second data table.
In another embodiment of the present application, there is provided an apparatus for generating a data table, the apparatus including:
the acquisition module is used for acquiring a current partition data set and an incremental data set corresponding to the first data table, and associating the current partition data set and the incremental data set through a main key;
the first generation module is used for judging whether each dimension in the first data table changes based on the main key existing in the current partition data set and the incremental data set, and generating first dimension change information when the dimensions change;
and the second generation module is used for denormalizing the first data table carrying the first dimension change information to a second data table associated with the first data table, generating second dimension change information based on the first dimension change information, and storing the second dimension change information into the second data table.
Optionally, the first generating module includes:
a first generation unit configured to generate, when the primary key exists only in the incremental dataset, the first dimension change information based on the incremental dataset, where the first dimension change information includes a dimension field name, a change date, and a dimension current value of a changed data row in the first data table, and the dimension current value is a first string indicating that no old data value exists;
and the second generation unit is used for sequentially comparing whether the value of each dimension in the data row in the current partition data set and the data row in the increment data set is equal or not when the primary key exists in the current partition data set and the increment data set, and generating the first dimension transformation information when the values of the dimensions in the data row in the current partition data set and the data row in the increment data set are unequal, wherein the first dimension transformation information comprises the dimension field name, the change date and the current dimension value.
Optionally, the second generating module includes:
the positioning unit is used for searching the main key of the first data table according to the value of the external key of the second data table to be updated and positioning the main key to the data row in the first data table corresponding to the second data table;
and the storage unit is used for storing the data of the dimension in the data row to the corresponding field of the second data table.
Optionally, the second generating module is further configured to:
and adding the dimension field name, the change date and the dimension current value contained in the first dimension change information to the second dimension change information, wherein the second dimension change information further comprises a dimension identifier, and the dimension identifier is used for representing the numerical value of the main key of the first data table, which is the same as the numerical value of the external key of the second data table.
In another embodiment of the present application, a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of one of the methods of generating a data table described above is provided.
In another embodiment of the present application, a terminal device is provided, including a processor configured to perform each step of the above-described method for generating a data table.
Based on the above embodiment, firstly, the current partition data set and the incremental data set corresponding to the first data table are acquired, the current partition data set and the incremental data set are associated through the primary key, secondly, whether each dimension in the first data table changes is judged based on the primary key existing in the current partition data set and the incremental data set, first dimension change information is generated when the dimension changes, finally, the first data table carrying the first dimension change information is denormalized to a second data table associated with the first data table, second dimension change information is generated based on the first dimension change information, and the second dimension change information is stored in the second data table. According to the embodiment of the application, the dimension change information is recorded when the dimension is changed, and the dimension change information is denormalized to the second data table, so that a user can acquire the dimension change information through the second data table, and the reason of the data change is known.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for generating a data table according to an embodiment 100 of the present application;
FIG. 2 is a schematic diagram showing a specific flow of a method for generating a data table according to an embodiment 200 of the present application;
FIG. 3 shows a schematic diagram of an apparatus for generating a data table according to an embodiment 300 of the present application;
fig. 4 shows a schematic diagram of a terminal device according to an embodiment 400 of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be capable of being practiced otherwise than as specifically illustrated and described. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Based on the problems in the prior art, the embodiment of the application provides a method for generating a data table, which is mainly applicable to the technical field of computers. By denormalizing fields in the dimension table that reflect dimension change information to the fact table, the generation process of the dimension change information is integrated into the association process of the fact table and the dimension table, which is inherent and conventional during the processing of the fact table, without additional association steps, so as to realize a method for generating the data table. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Fig. 1 is a schematic flow chart of a method for generating a data table according to an embodiment 100 of the present application. Wherein, the detailed steps are as follows:
step S11, a current partition data set and an incremental data set corresponding to the first data table are obtained, and the current partition data set and the incremental data set are associated through a main key.
In this step, for convenience of presentation, the default first data table in the embodiment of the present application is updated daily, and the update frequency is applicable when the update frequency is month or year. In particular, the first data table contains descriptive attributes for filtering and classifying facts. The current partition data set corresponding to the first data table is the latest partition data set generated by the current date, namely yesterday, and the increment data set is the data set which is generated by the current date and is not loaded. Further, a primary key of the first data table is used to associate the current partition dataset with the delta dataset.
Step S12, judging whether each dimension in the first data table changes based on the primary key existing in the current partition data set and the incremental data set, and generating first dimension change information when the dimensions change.
In this step, a primary key is searched for in the current partition dataset and the incremental dataset, and whether the corresponding dimension in each data line in the first data table changes or not is determined by the primary key. Specifically, when a primary key exists in both the current partition data set and the incremental data set, and the values of all dimensions in the data row corresponding to the current partition data set and the data row corresponding to the incremental data set are equal, or only when the primary key exists in the current partition data set, it may be determined that the first data table is unchanged. Further, in addition to the foregoing, it may be determined that the first data table has changed, at which time the first dimension change information is generated based on the dimension change condition.
Step S13, the first data table carrying the first dimension change information is denormalized to the second data table associated with the first data table, the second dimension change information is generated based on the first dimension change information, and the second dimension change information is stored in the second data table.
In this step, the primary key row included in the first data table serves as the foreign key of any second data table associated therewith. Dimensions in the first data table can be added to the second data table by using the association relationship between the second data table and the first data table, and the process is denormalization. With de-normalization, the common dimension can be directly obtained in the second data table without re-associating the second data table with the first data table. Specifically, a first data table carrying first dimension change information is de-normalized to a second data table associated with the first data table. Further, each item of data in the first dimension change information is extracted, second dimension change information which can be displayed in a second data table is generated, and the second dimension change information is stored in the second data table.
As described above, according to the above embodiment, firstly, the current partition dataset and the incremental dataset corresponding to the first data table are acquired, and the current partition dataset and the incremental dataset are associated by the primary key, secondly, whether each dimension in the first data table changes is determined based on the primary key existing in the current partition dataset and the incremental dataset, and first dimension change information is generated when the change occurs, and finally, the first data table carrying the first dimension change information is denormalized to the second data table associated with the first data table, and second dimension change information is generated based on the first dimension change information, and the second dimension change information is stored in the second data table. According to the embodiment of the application, the dimension change information is recorded when the dimension is changed, and the dimension change information is denormalized to the second data table, so that a user can acquire the dimension change information through the second data table, and the reason of the data change is known.
Fig. 2 is a schematic diagram of a specific flow of a method for generating a data table according to an embodiment 200 of the present application. The detailed process of the specific flow is as follows:
s201, acquiring a current partition data set and an incremental data set corresponding to the first data table.
S202, the current partition data set and the increment data set are associated through a main key. The first data table in the embodiment of the application is a dimension table, and the second data table is a fact table.
Here, let the first data table current partition dataset be D1, the delta update dataset that has not yet been loaded today be D2, and associate D1 with D2 using the primary key.
S203, judging whether a main key exists in the current partition data set.
Here, it is determined whether each dimension in the first data table has changed based on the primary key existing in the current partition data set and the incremental data set. Specifically, whether data change occurs between the incremental dataset and the current partition dataset is determined by a primary key between the associated datasets.
S204, judging whether a primary key exists in the incremental dataset.
S205, comparing whether the values of the dimensions in the new and old data rows are equal.
Here, when a primary key exists in both the current partition data set and the incremental data set, it is sequentially compared whether the values of each dimension in the data line in the current partition data set and the data line in the incremental data set are equal. Specifically, when a primary key exists in both D1 and D2, it indicates that an old data line exists in D1 and a new data line exists in D2. At this time, whether the values of each dimension of the new and old data rows are equal or not is compared in sequence. Wherein the new value and the old value are still determined to be equal when they are simultaneously NULL.
S206, the first dimension change information is not generated.
Here, when a primary key exists in the current partition data set and a primary key does not exist in the incremental data set, it is determined that the dimension is not changed, and the first dimension change information is not generated any more.
S207, generating first dimension change information.
In this step, when the primary key exists only in the incremental dataset, the first dimension change information is generated based on the incremental dataset. Specifically, when only the primary key exists in the incremental dataset, it may be confirmed that the data row in the incremental dataset is newly added and there is no corresponding old data row. The first dimension change information comprises a dimension field name, a change date and a dimension current value of a changed data row in the first data table, wherein the dimension current value is a first character string which indicates that no old data value exists. The first string may be identified as 'NULL'. Further, the first dimension change information may be represented as a change_reflection field, each enclosed by "{ }" is referred to as a "dimension change information", and is represented in json format as { "column": name, "start_date": date, "old_value": value }, {... Wherein name is the name of the dimension field which changes; date is the date of change of the new value of the column, and is not the date of change of the whole line of data, and the format is yyyy-MM-dd; value is the old value of the dimension (i.e., the value in D1); column, start_date, and old_value are fixed literal descriptions.
Based on the foregoing case when a certain primary key exists only in D2, it is considered that the dimensions all change, and the change date is the current_date (i.e., change date) of the current period, and the old value is the first string 'NULL'. At this time, the first dimension change information change_reflection of the data line is:
{"column":name1,"start_date":current_date,"old_value":"NULL"},{"column":name2,"start_date":current_date,"old_value":"NULL"},…
if a primary key exists in both the current partition data set and the incremental data set, sequentially comparing whether the values of each dimension in the data row in the current partition data set and the data row in the incremental data set are equal, and generating first dimension conversion information when the values of each dimension are unequal, wherein the first dimension conversion information comprises a dimension field name, a change date and a dimension current value. Specifically, at this time, for all unequal changed fields, the date is updated to the current_date of the day, and the value is the old value. The unchanged field is unchanged, and no change is made, such as date remains unchanged. The first dimension change information change_meason should be written (here, it is assumed that the name1 field is changed and the name2 is unchanged):
{ "column": name1, "start_date": current_date, "old_value": value1}, { "column": name2, "start_date": last_date, "old_value": value2}, …, etc. Here, last_date of name2 is the change time of the last change of the dimension, and is not equal to current_date.
And S208, storing the first dimension change information into a first data table.
Here, the change_request field is updated for each primary key and saved in the current partition dataset of the first data table for the current day.
S209, de-normalizing the first data table carrying the first dimension change information to a second data table associated with the first data table.
Here, the second data table is associated with the first data table. And searching a primary key of the first data table according to the numerical value of the external key of the second data table to be updated, positioning the primary key to a data row in the first data table corresponding to the second data table, and storing the data of the dimension in the data row into a corresponding field of the second data table. Specifically, according to the date partition range of the second data table to be updated, the total data of the first data table under each date partition is obtained. And acquiring the main key in the total data of the first data table according to the value of the corresponding external key in the second data table, so as to position to one row of data row. And acquiring the current value of the dimension needing to be denormalized from the positioned data, and storing the current value into the corresponding field of the second data table.
S210, generating second dimension change information based on the first dimension change information.
In this step, the dimension field name, the change date and the current value of the dimension contained in the first dimension change information are added to the second dimension change information, and the second dimension change information further includes a dimension identifier, where the dimension identifier is used to represent the numerical value of the primary key of the first data table that is the same as the numerical value of the foreign key of the second data table.
Specifically, information of each dimension is extracted from a first dimension change information change_change field in the first data table, and a field dim_change is newly added in the second data table as second dimension change information. Wherein each piece of content surrounded by "{ }" is called a piece of dimension change information, and is expressed as follows in json format:
{"table_name":dim_table_name,"id":dim_id,"column":name,"start_date":date,"old_value":value},{...},...
the dim_table_name is a table name of the first data table, and is used for identifying a unique first data table from the plurality of first data tables. The dimension identifier dim_id is a value of a primary key in the dim_table_name that is equal to a foreign key in the second data table associated with the first data table. When the second data table has a role playing dimension, a plurality of foreign keys associated with the first data table exist, and each dimension change information only comprises a dim_id, and a plurality of dim_ids can be stored by a plurality of dimension change information with equal quantity. The entry must be filled if and only if the role playing dimension is contained in the second data table. The name is a dimension field name of the dimension in the dim_table_name, and is used for identifying the dimension and acquiring corresponding dimension change information from the change_request field. date is the change time of the new value of the name field and is obtained from the corresponding change unit in the change_request field. value is the current value of the dimension of the name field, i.e., the old value, available from the corresponding change element in the change_request field.
In addition, all the dimension change information keeps the content persistence, namely, as long as the dimension corresponding to the dimension change information is unchanged, the content of the dimension change information is unchanged; time persistence, i.e., dimension change information should continue to appear daily, regardless of whether the dimension is changed, unless its corresponding foreign key is not present in the second data table on the same day.
S211, saving the second dimension change information into a second data table.
Here, when the second data table is associated with each dimension table, corresponding second dimension change information is generated, and is combined into the dim_change field, and finally, the dim_change field is stored in a partition which needs to be updated on the same day of the second data table.
The application realizes the method for generating the second data table based on the steps.
Taking table 2 as an example, a site id (site_id), a site name (site_name), and a site location (location) are stored for the first data table dim_site_d of one site. Wherein the start_date field and end_date field are the effective time and dead time of the row level. When 1/2020, two sites with ids 1 and 2 appear for the first time in the first data table, as shown by row 1 and row 2. When 1 st month 4 of 2020, the site with id 1 changes location, the site with id 2 changes name, as shown by row 3 and row 4. When the data line is changed in the dimensions except the site_name and the location on the 1 th month of 2020, the start_date and the end_date of the data line are changed, but the units corresponding to the site_name and the location remain unchanged.
TABLE 1
Table 2 describes the distribution of two orders from the pick-up station to the dispatch station for two pieces of data in a second data table. The two foreign keys of the collecting station and the dispatching station are both associated with the first data table, so that the situation that the role playing dimension exists in the flow is considered.
TABLE 2
For single number 528, the time of occurrence is 3 months 1 day, but according to the principle of "keep-alive", although the first data table is not changed at 3 months 1 day, the second data table should also be generated with the second dimension change information, and the content should be based on the current latest data (i.e., the result generated at 1 month 1 day). The second dimension change information dim_change should be written as:
{"table_name":dim_site_d,"id":1,"column":"site_name","start_date":"2020-01-01","old_value":"NULL"},
{"table_name":dim_site_d,"id":1,"column":"location","start_date":"2020-01-01","old_value":"NULL"},
{"table_name":dim_site_d,"id":2,"column":"site_name","start_date":"2020-01-01","old_value":"NULL"},
{"table_name":dim_site_d,"id":2,"column":"location","start_date":"2020-01-01","old_value":"NULL"}。
for the single number 530, which occurs on 1 day of 6 months, since the site with id 1 has changed position and the site with id 2 has changed name on 1 day of 4 months, the second dimension change information dim_change should be written as:
{ "table_name": dim_site_d, "id":1, "column": "site_name", "start_date": "2020-01-01", "old_value": "Beijing A site" },
{ "table_name": dim_site_d, "id":1, "column": "location", "start_date": "2020-04-01", "old_value": "east urban area" },
{ "table_name": dim_site_d, "id":2, "column": "site_name", "start_date": "2020-04-01", "old_value": "Beijing B site" },
{ "table_name": dim_site_d, "id":2, "column": "location", "start_date": "2020-01-01", "old_value": "western district" }.
In this way, even though the processing logic of the second data table is not known, the dimensional change information can be easily obtained. In some special cases, even if the first data table name, the primary key, and other data are omitted for security purposes, the dimension change condition can be obtained only by the change time and the dimension current value, that is, the old value.
According to the embodiment of the application, the fields reflecting the dimension change information in the dimension table are denormalized to the second data table, the generation process of the dimension change information is integrated into the association process of the second data table and the first data table, and the association process is inherent and conventional work during the processing of the second data table, so that no additional association step is needed, the efficiency is improved, and the workload is simplified. Where the update time is located to the field level, rather than the normal row level, with accuracy and independence.
Further, the common dimensions denormalized to the second data table are only a portion of the full dimension columns of the corresponding dimension table, and when a dimension other than the common dimension changes, the second data table does not care about such changes, although this would cause a change in the effective time of the row. It is not reasonable to obtain the validation time of the row in the second data table, whereas the validation time of the field level is significantly more accurate. When more dimensions are required to be acquired from the dimension table in the second data table, the effective time of different dimensions is independent, so that original data is not required to be changed, and new dimension information is directly added in the change reason field.
In addition, since some second data tables may be very sparse, the frequency of occurrence of many dimensions is not high, and there is a high likelihood that the dimension will not appear in the second data table on the same day when the dimension changes. The content persistence and the time persistence of the dimension change information are continuously stored, so that the change information can be acquired no matter which piece of data in the second data table is acquired by a user, and the dimension change information only appears.
Based on the same inventive concept, the embodiment 300 of the present application further provides an apparatus for generating a data table, where, as shown in fig. 3, the apparatus includes:
the obtaining module 31 is configured to obtain a current partition data set and an incremental data set corresponding to the first data table, and associate the current partition data set and the incremental data set through a primary key;
a first generating module 32, configured to determine whether each dimension in the first data table changes based on the primary key existing in the current partition data set and the incremental data set, and generate first dimension change information when the dimensions change;
the second generating module 33 is configured to denormalize the first data table carrying the first dimension change information to a second data table associated with the first data table, generate second dimension change information based on the first dimension change information, and store the second dimension change information in the second data table.
In this embodiment, the specific functions and interaction manners of the acquiring module 31, the first generating module 32 and the second generating module 33 may be referred to the description of the corresponding embodiment of fig. 1, and are not repeated herein.
Optionally, the first generating module 32 includes:
a first generation unit configured to generate, when a primary key exists only in the incremental dataset, first dimension change information based on the incremental dataset, wherein the first dimension change information includes a dimension field name, a change date, and a dimension current value of a data row changed in the first data table, and the dimension current value is a first character string indicating that no old data value exists;
and the second generation unit is used for sequentially comparing whether the values of each dimension in the data row in the current partition data set and the data row in the increment data set are equal or not when the primary key exists in the current partition data set and the increment data set, and generating first dimension transformation information when the values of each dimension in the data row in the current partition data set and the data row in the increment data set are unequal, wherein the first dimension transformation information comprises a dimension field name, a change date and a dimension current value.
Optionally, the second generating module 33 includes:
the positioning unit is used for searching the main key of the first data table according to the numerical value of the external key of the second data table to be updated and positioning the main key to the data row in the first data table corresponding to the second data table;
and the storage unit is used for storing the data of the dimension in the data row into the corresponding field of the second data table.
Optionally, the second generating module 33 is further configured to:
and adding the dimension field name, the change date and the dimension current value contained in the first dimension change information into the second dimension change information, wherein the second dimension change information also comprises a dimension identifier, and the dimension identifier is used for representing the numerical value of the primary key of the first data table which is the same as the numerical value of the external key of the second data table.
As shown in fig. 4, a further embodiment 400 of the present application further provides a terminal device, comprising a processor 401, wherein the processor 401 is configured to perform the steps of the above-mentioned method for generating a data table. As can also be seen from fig. 4, the terminal device provided by the above embodiment further comprises a non-transitory computer readable storage medium 402, on which non-transitory computer readable storage medium 402 a computer program is stored which, when being executed by the processor 401, performs the steps of one of the above-described methods of generating a data table. In practice, the terminal device may be one or more computers, as long as the computer readable medium and the processor are included.
In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, a FLASH, etc., and the computer program on the storage medium, when executed, can perform the steps of one of the methods for generating a data table described above. In practice, the computer readable medium may be contained in the apparatus/device/system described in the above embodiments or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs that, when executed, are capable of performing the steps of one of the methods of generating a data table described above.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: portable computer diskette, hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing, but are not intended to limit the scope of the application. In the disclosed embodiments, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various combinations and/or combinations, even if such combinations or combinations are not explicitly recited in the present application. In particular, the features recited in the various embodiments of the application and/or in the claims may be combined in various combinations and/or combinations without departing from the spirit and teachings of the application, all of which are within the scope of the disclosure.
Finally, it should be noted that: the above examples are only specific embodiments of the present application, and are not intended to limit the scope of the present application, but it should be understood by those skilled in the art that the present application is not limited thereto, and that the present application is described in detail with reference to the foregoing examples: any person skilled in the art may, within the scope of the disclosure of the present application, still make modifications to the technical solutions described in the foregoing embodiments or easily conceive of changes, or make equivalent substitutions of some of the technical features thereof; such changes, variations or substitutions, however, are not intended to depart from the spirit and scope of the embodiments of the application, and are intended to be included within the scope of the application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method of generating a data table, comprising:
acquiring a current partition data set and an incremental data set corresponding to a first data table, and associating the current partition data set and the incremental data set through a main key;
judging whether each dimension in the first data table changes based on the primary key existing in the current partition data set and the incremental data set, and generating first dimension change information when the dimensions change;
de-normalizing the first data table carrying the first dimension change information to a second data table associated with the first data table, generating second dimension change information based on the first dimension change information, and storing the second dimension change information into the second data table;
the step of judging whether each dimension in the first data table changes includes:
generating the first dimension change information based on the incremental dataset when the primary key exists only in the incremental dataset, wherein the first dimension change information comprises a dimension field name, a change date and a dimension current value of a changed data row in the first data table, and the dimension current value is a first character string which indicates that no old data value exists;
when the primary key exists in both the current partition data set and the incremental data set, sequentially comparing whether the value of each dimension in the data row in the current partition data set and the data row in the incremental data set is equal, and generating the first dimension change information when the values are not equal, wherein the first dimension change information comprises the dimension field name, the change date and the dimension current value.
2. The method of claim 1, wherein de-normalizing the first data table carrying the first dimension change information to a second data table associated with the first data table comprises:
searching the primary key of the first data table according to the value of the external key of the second data table to be updated, and positioning the primary key to the data row in the first data table corresponding to the second data table;
and storing the data of the dimension in the data row into a corresponding field of the second data table.
3. The method of claim 1, wherein the step of generating second dimension change information based on the first dimension change information comprises:
and adding the dimension field name, the change date and the dimension current value contained in the first dimension change information to the second dimension change information, wherein the second dimension change information further comprises a dimension identifier, and the dimension identifier is used for representing the numerical value of the main key of the first data table, which is the same as the numerical value of the external key of the second data table.
4. An apparatus for generating a data table, comprising:
the acquisition module is used for acquiring a current partition data set and an incremental data set corresponding to the first data table, and associating the current partition data set and the incremental data set through a main key;
the first generation module is used for judging whether each dimension in the first data table changes based on the main key existing in the current partition data set and the incremental data set, and generating first dimension change information when the dimensions change;
the second generation module is used for denormalizing the first data table carrying the first dimension change information to a second data table associated with the first data table, generating second dimension change information based on the first dimension change information, and storing the second dimension change information into the second data table;
wherein the first generation module comprises:
a first generation unit configured to generate, when the primary key exists only in the incremental dataset, the first dimension change information based on the incremental dataset, where the first dimension change information includes a dimension field name, a change date, and a dimension current value of a changed data row in the first data table, and the dimension current value is a first string indicating that no old data value exists;
and the second generation unit is used for sequentially comparing whether the value of each dimension in the data row in the current partition data set and the data row in the increment data set is equal or not when the primary key exists in the current partition data set and the increment data set, and generating the first dimension change information when the values of the dimensions in the data row in the current partition data set and the data row in the increment data set are unequal, wherein the first dimension change information comprises the dimension field name, the change date and the current dimension value.
5. The apparatus of claim 4, wherein the second generating module comprises:
the positioning unit is used for searching the main key of the first data table according to the value of the external key of the second data table to be updated and positioning the main key to the data row in the first data table corresponding to the second data table;
and the storage unit is used for storing the data of the dimension in the data row to the corresponding field of the second data table.
6. The apparatus of claim 4, wherein the second generation module is further configured to:
and adding the dimension field name, the change date and the dimension current value contained in the first dimension change information to the second dimension change information, wherein the second dimension change information further comprises a dimension identifier, and the dimension identifier is used for representing the numerical value of the main key of the first data table, which is the same as the numerical value of the external key of the second data table.
7. A non-transitory computer readable storage medium storing instructions which, when executed by a processor, cause the processor to perform the steps of a method of generating a data table according to any one of claims 1 to 3.
8. A terminal device comprising a processor for performing the steps of a method of generating a data table as claimed in any one of claims 1 to 3.
CN202011237403.XA 2020-11-09 2020-11-09 Method, device and storage medium for generating data table Active CN113760888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011237403.XA CN113760888B (en) 2020-11-09 2020-11-09 Method, device and storage medium for generating data table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011237403.XA CN113760888B (en) 2020-11-09 2020-11-09 Method, device and storage medium for generating data table

Publications (2)

Publication Number Publication Date
CN113760888A CN113760888A (en) 2021-12-07
CN113760888B true CN113760888B (en) 2023-09-01

Family

ID=78785998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011237403.XA Active CN113760888B (en) 2020-11-09 2020-11-09 Method, device and storage medium for generating data table

Country Status (1)

Country Link
CN (1) CN113760888B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484398A (en) * 2014-12-12 2015-04-01 北京国双科技有限公司 Method and device for aggregation of data in datasheet
CN108647339A (en) * 2018-05-14 2018-10-12 五八有限公司 A kind of dimensionality analysis method, apparatus, equipment and storage medium
CN111444256A (en) * 2019-01-16 2020-07-24 北京京东尚科信息技术有限公司 Method and device for realizing data visualization

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090055439A1 (en) * 2007-08-24 2009-02-26 Ketera Technologies, Inc. Flexible Dimension Approach In A Data Warehouse
US9208183B2 (en) * 2013-03-15 2015-12-08 Bmc Software Inc. Data access of slowly changing dimensions

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484398A (en) * 2014-12-12 2015-04-01 北京国双科技有限公司 Method and device for aggregation of data in datasheet
CN108647339A (en) * 2018-05-14 2018-10-12 五八有限公司 A kind of dimensionality analysis method, apparatus, equipment and storage medium
CN111444256A (en) * 2019-01-16 2020-07-24 北京京东尚科信息技术有限公司 Method and device for realizing data visualization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于多维模型的独立型数据集市数据转换策略研究";张雅茜等;《沈阳航空工业学院学报》;第23卷(第3期);第31-34页 *

Also Published As

Publication number Publication date
CN113760888A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
KR102143889B1 (en) System for metadata management
US11741059B2 (en) System and method for extracting a star schema from tabular data for use in a multidimensional database environment
CN110472068B (en) Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph
US9710536B2 (en) Method and system for validating data
US11119988B2 (en) Performing logical validation on loaded data in a database
CN112613789A (en) Risk control data processing method and risk early warning rule prepositive data monitoring method
CN110795524B (en) Main data mapping processing method and device, computer equipment and storage medium
CN112445875B (en) Data association and verification method and device, electronic equipment and storage medium
CN110955661A (en) Data fusion method and device, readable storage medium and electronic equipment
CN111708760B (en) Model migration deployment method and device, electronic equipment and storage medium
US9773003B2 (en) Computer implemented system and method for investigative data analytics
CN113760888B (en) Method, device and storage medium for generating data table
CN114356945A (en) Data processing method, data processing device, computer equipment and storage medium
CN113901046A (en) Virtual dimension table construction method and device
US8195604B2 (en) System and method for verifying IMS databases on a mainframe computer
CN116501375B (en) Data dictionary version management method, device, computer equipment and storage medium
JP2013171495A (en) Data management device, data management method and data management program
JP2018060477A (en) Estimation device and program
CN117649240A (en) Suspicious account identification method, suspicious account identification system, suspicious account identification device, suspicious account identification storage medium, suspicious account identification program product
CN115629958A (en) Universal field level automatic checking method and device for different service interfaces
CN114579179A (en) Version synchronization method and device, computer equipment and storage medium
CN117312283A (en) Database and table data verification method and device, computer equipment and storage medium
CN114493485A (en) Resource movement budgeting method and computer program product
CN116050371A (en) Report generation method, device, equipment and program product
CN117556473A (en) Data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant