CN116821253A - Label configuration method and system based on dimension modeling - Google Patents


Info

Publication number
CN116821253A
Authority
CN
China
Prior art keywords
entity
dimension
event
attribute
generating
Prior art date
Legal status
Pending
Application number
CN202310697911.3A
Other languages
Chinese (zh)
Inventor
徐梦宇
王乐珩
张金银
Current Assignee
Hangzhou Bizhi Technology Co ltd
Original Assignee
Hangzhou Bizhi Technology Co ltd
Priority date
2023-06-13
Filing date
2023-06-13
Publication date
2023-09-29
Application filed by Hangzhou Bizhi Technology Co ltd
Priority to CN202310697911.3A
Publication of CN116821253A
Pending (current legal status)


Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a label configuration method based on dimension modeling, which specifically comprises the following steps: extracting the fields of a dimension table and generating an entity attribute list from them; extracting the fields of a fact table and generating an event attribute list from them; generating and storing associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list; and configuring tag names for the associated tags of the entity attributes and event attributes. The method performs secondary management of the dimension tables, fact tables, and fields produced by the original dimension modeling method, generates new association relations between fields in the dimension table and fields in the fact table, configures mapping relation rules between those fields with reference to the association relations, and quickly and automatically generates the corresponding SQL job scripts.

Description

Label configuration method and system based on dimension modeling
Technical Field
The application belongs to the technical field of data storage, and particularly relates to a label configuration method and system based on dimension modeling.
Background
At present, when building their own data warehouses, enterprises design many warehouse tables based on several mainstream modeling methods, among which dimensional modeling theory is widely applied. This method divides the business tables in the source database into two major categories, fact tables for measurement and dimension tables that describe the context, which then undergo subsequent ETL processing.
Business tags are data assets reprocessed from fact tables or dimension tables already processed in the data warehouse. In the development process they sit in the downstream link of data development, and a data development engineer or business analyst must develop them by manually writing SQL job scripts.
However, data warehouse table developers and tag developers are often members of different teams. When an upstream table structure or field name changes, the content of the existing downstream tag-development SQL job scripts must be readjusted across the board; otherwise jobs fail or accurate data cannot be retrieved, which affects the downstream tag application process of the whole link.
Therefore, a label configuration method and system based on dimension modeling are needed to ease the management burden of dimension tables, fact tables, and fields during coordinated upstream and downstream development, and thereby reduce the effort of developing SQL job scripts.
Disclosure of Invention
In view of the foregoing drawbacks and deficiencies of the prior art, it is an object of the present application to at least address one or more of the problems of the prior art, in other words, to provide a method and system for dimension modeling based tag configuration that meets one or more of the aforementioned needs.
In order to achieve the aim of the application, the application adopts the following technical scheme:
in a first aspect, the present application provides a label configuration method based on dimension modeling, which specifically includes:
S1, extracting the fields of a dimension table and generating an entity attribute list from them, wherein the entity attribute list is composed of a plurality of entity attributes;
S2, extracting the fields of a fact table and generating an event attribute list from them, wherein the event attribute list is composed of a plurality of event attributes;
S3, generating and storing associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list;
S4, configuring tag names for the associated tags of the entity attributes and event attributes.
As a preferred embodiment, before step S1, the method further includes the step of:
S0, selecting an object as the main object, and using the dimension table and fact table of the main object as the dimension table and fact table extracted in steps S1 and S2.
As a preferred embodiment, after step S4, the method further comprises the step of:
S5, generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
As a preferable scheme, step S3 specifically includes:
generating associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list;
storing the associated tags of the entity attributes and event attributes as an associated tag wide table.
As a further preferred scheme, step S4 specifically includes:
configuring a tag name for each associated tag of the entity attributes and event attributes;
storing the tag name as a value at a designated position in the associated tag wide table.
As a preferred scheme, the generation rule for the associated tags is selected through interface options.
As a preferred scheme, step S1 is followed by step S11: extracting the comments of the fields in the dimension table and generating names for the entity attributes in the entity attribute list from those comments;
step S2 is followed by step S21: extracting the comments of the fields in the fact table and generating names for the event attributes in the event attribute list from those comments.
In a second aspect, the present application further provides a label configuration system based on dimension modeling, specifically including:
the extraction module is used for extracting the fields of a dimension table and generating an entity attribute list from them, the entity attribute list being composed of a plurality of entity attributes; and for extracting the fields of a fact table and generating an event attribute list from them, the event attribute list being composed of a plurality of event attributes;
the associated tag generation module is used for generating and storing associated tags of the entity attribute and the event attribute according to the entity attribute list and the event attribute list;
and the tag name configuration module is used for configuring tag names for the associated tags of the entity attribute and the event attribute.
As a preferred solution, the system further comprises:
an SQL script generation module, used for generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
Compared with the prior art, the application has the beneficial effects that:
According to the label configuration method and system based on dimension modeling, secondary management is performed on the dimension tables, fact tables, and fields produced by the original dimension modeling method to generate new association relations between fields in the dimension table and fields in the fact table; the mapping relation rules between those fields are configured with reference to the association relations, and the corresponding SQL job scripts are generated quickly and automatically.
Drawings
FIG. 1 is a flow chart of a dimension modeling based tag configuration method of an embodiment of the present application;
FIG. 2 is a detailed flow chart of a dimension modeling based tag configuration method of an embodiment of the present application;
FIG. 3 is a schematic diagram of an association mapping model according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In the following description, various embodiments of the application are provided, and they may be substituted for or combined with one another, so the application is intended to include all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, and C and another embodiment includes features B and D, the present application should also be considered to include embodiments containing any other possible combination of one or more of A, B, C, and D, even if such an embodiment is not explicitly recited below. The following description provides examples only and is not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of the elements described without departing from the scope of the application. Various examples may omit, replace, or add procedures or components as appropriate; for example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may also be combined into other examples.
In a first aspect, the present application provides a label configuration method based on dimension modeling, where a flowchart of the label configuration method is shown in fig. 1, and specifically includes:
S1, extracting the fields of a dimension table and generating an entity attribute list from them, wherein the entity attribute list is composed of a plurality of entity attributes;
S2, extracting the fields of a fact table and generating an event attribute list from them, wherein the event attribute list is composed of a plurality of event attributes;
S3, generating and storing associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list;
S4, configuring tag names for the associated tags of the entity attributes and event attributes.
An embodiment of the present application provides a specific implementation of the above method, and a detailed flowchart of the specific implementation is shown in fig. 2, and specifically includes the following method:
S1, extracting the fields of the dimension table and generating an entity attribute list from them. Specifically, in step S1 the data table information in the data warehouse data source is read, and a dimension table is selected from it as the source table of the entity; the names of all fields in the table are identified and stored as the names of the entity attributes; meanwhile, according to the field type, each entity attribute can be identified as one of several entity attribute types, such as primary key, associated dimension, fact attribute, and partition key;
S2, extracting the fields of the fact table and generating an event attribute list from them. Specifically, in step S2 the data table information in the data warehouse data source is read, and a fact table is selected from it as the source table of the event; the names of all fields in the table are identified and stored as the names of the event attributes; meanwhile, according to the field type, each event attribute can be identified as one of several attribute types, such as primary key, associated dimension, fact attribute, measure, and partition key;
S3, generating and storing the associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list. Specifically, in step S3 a data model is built on a canvas from the entity attributes and event attributes in the registered entity attribute list and event attribute list; the association relations between the source tables corresponding to different entity attributes or event attributes are configured, and a new table recording these associations is constructed from the field-level association relations. The attribute information of each object in the new table is inherited from the entity attributes and event attributes, and each object's field name in the new table takes the form "object name + source field name";
S4, configuring tag names for the associated tags of the entity attributes and event attributes. Specifically, in step S4 a new name is set for each object in the new table so that subsequent developers can use, search for, and call it.
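The attribute typing described in steps S1 and S2 can be illustrated with a minimal Python sketch. It assumes the column names and types have already been read from the warehouse catalog. The classification rules (explicit key sets, plus a numeric-type check for measures) are hypothetical stand-ins for whatever rules the platform actually applies, and only the fact table path produces the measure type, mirroring the difference between the two attribute lists above.

```python
from dataclasses import dataclass

# Numeric base types treated as measures in a fact table (assumed list).
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


@dataclass
class Column:
    """Hypothetical catalog record for one field of a warehouse table."""
    name: str
    data_type: str


@dataclass
class Attribute:
    """An entity attribute (dimension table) or event attribute (fact table)."""
    name: str
    attr_type: str


def classify(col, primary_keys, dimension_keys, partition_keys, is_fact_table):
    """Assign one attribute type using simple, hypothetical rules."""
    if col.name in partition_keys:
        return "partition key"
    if col.name in primary_keys:
        return "primary key"
    if col.name in dimension_keys:
        return "associated dimension"
    # Strip a precision suffix such as decimal(10,2) before checking the type.
    base_type = col.data_type.lower().split("(")[0]
    if is_fact_table and base_type in NUMERIC_TYPES:
        return "measure"
    return "fact attribute"


def build_attribute_list(columns, primary_keys=(), dimension_keys=(),
                         partition_keys=(), is_fact_table=False):
    """Steps S1/S2: one attribute per field, named after the field."""
    return [
        Attribute(c.name, classify(c, primary_keys, dimension_keys,
                                   partition_keys, is_fact_table))
        for c in columns
    ]


if __name__ == "__main__":
    dim_mbr = [Column("oneid", "string"), Column("mbr_name", "string"),
               Column("ds", "string")]
    print(build_attribute_list(dim_mbr, primary_keys={"oneid"},
                               partition_keys={"ds"}))
```

Passing the key sets explicitly keeps the sketch deterministic; a real implementation might instead read primary and partition keys from the table DDL.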
The schematic diagram of the association relation mapping model generated by the method is shown in FIG. 3.
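The association relations configured on the canvas in step S3 can be modeled as field-level join edges between source tables. The Python sketch below is an illustration only. The Relation type, the assumption that the relations form a tree rooted at the main table, and the choice of INNER JOIN are all assumptions; the dim_mbr/dws_ord join key mirrors the SQL example in this description. It renders a FROM clause that starts at the main table and adds each related table as soon as it is reachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One association configured on the canvas:
    left_table.left_field = right_table.right_field."""
    left_table: str
    left_field: str
    right_table: str
    right_field: str


def build_from_clause(main_table, relations, join_type="INNER JOIN"):
    """Render a FROM clause that starts at the main table and joins each
    related table once it is reachable from the tables already joined.
    Assumes the relations form a tree rooted at the main table."""
    joined = {main_table}
    lines = [main_table]
    pending = list(relations)
    while pending:
        for rel in pending:
            left_in = rel.left_table in joined
            right_in = rel.right_table in joined
            if left_in and right_in:
                raise ValueError(f"cyclic relation not supported: {rel}")
            if left_in or right_in:
                new_table = rel.right_table if left_in else rel.left_table
                lines.append(
                    f"{join_type} {new_table} ON "
                    f"{rel.left_table}.{rel.left_field}="
                    f"{rel.right_table}.{rel.right_field}"
                )
                joined.add(new_table)
                pending.remove(rel)
                break  # restart the scan: the joined set has changed
        else:
            # No pending relation touches a table that is already joined.
            raise ValueError("relations not connected to the main table")
    return "\n".join(lines)


if __name__ == "__main__":
    rels = [Relation("dim_mbr", "oneid", "dws_ord", "oneid")]
    print(build_from_clause("dim_mbr", rels))
```

Cyclic or disconnected relation sets are rejected rather than guessed at, because the correct join semantics in those cases depend on the model being built.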
In one embodiment of the present application, the implementation of step S3 may be further specifically divided into the following steps:
generating associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list;
storing the associated tags of the entity attributes and event attributes as an associated tag wide table.
Generating the associated tags from the entity attribute list and the event attribute list specifically means determining which associated tags to generate from the source tables corresponding to the entities and events in the model and the association relations between the fields in those tables, and then splicing object attribute names of the form "object name_obj_attribute name" to serve as the associated tags. This naming makes object attribute fields more readable and easier to parse.
A wide table is then generated from the associated tags determined in this way. The wide table contains all combinations of the associated tags and is stored in wide-table form to improve retrieval performance and simplify subsequent searching.
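The naming rule and the wide-table construction can be sketched in Python as a generator for an INSERT statement of the kind given as the step S3 code example. The prefix-stripping rule in attribute_name is an assumption inferred from that example (mbr_phone becomes mbr_obj_phone, ord_familyincome becomes mbr_obj_familyincome). The FROM clause is passed in ready-made, and all function names are hypothetical.

```python
def attribute_name(field):
    """Drop the table-specific prefix before the first underscore, if any
    (assumed rule): 'mbr_phone' -> 'phone', 'oneid' -> 'oneid'."""
    return field.split("_", 1)[1] if "_" in field else field


def associated_tag(obj, field):
    """Splice '<object name>_obj_<attribute name>' as the associated tag."""
    return f"{obj}_obj_{attribute_name(field)}"


def build_wide_table_sql(target, obj, fields, from_clause, where,
                         partition="${bdp.system.bizdate}"):
    """Render the INSERT that materializes the associated tag wide table.

    fields: (source table, source field) pairs selected for the object.
    """
    seen = set()
    select_items = []
    for table, field in fields:
        tag = associated_tag(obj, field)
        if tag in seen:
            # Two source fields would map onto the same wide-table column.
            raise ValueError(f"duplicate associated tag: {tag}")
        seen.add(tag)
        select_items.append(f"  {table}.{field} AS {tag}")
    return (f"INSERT OVERWRITE TABLE {target} PARTITION(ds='{partition}')\n"
            "SELECT\n"
            + ",\n".join(select_items)
            + f"\nFROM\n{from_clause}\nWHERE {where};")


if __name__ == "__main__":
    print(build_wide_table_sql(
        "object_model_mbr_mid", "mbr",
        [("dim_mbr", "oneid"), ("dim_mbr", "mbr_phone"),
         ("dws_ord", "csmr_edulevel")],
        "dim_mbr INNER JOIN dws_ord\nON dim_mbr.oneid=dws_ord.oneid",
        "dim_mbr.ds='${bdp.system.bizdate}'"))
```

Rejecting duplicate tags up front matters because the stripping rule can map fields from different tables (for example mbr_name and ord_name) onto the same wide-table column.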
One specific code example of step S3 is provided below:
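-- write one business-date partition of the associated tag wide table
INSERT OVERWRITE TABLE object_model_mbr_mid PARTITION(ds='${bdp.system.bizdate}')
SELECT
    -- entity attributes from the member dimension table dim_mbr
    dim_mbr.oneid AS mbr_obj_oneid,
    dim_mbr.mbr_name AS mbr_obj_name,
    dim_mbr.mbr_phone AS mbr_obj_phone,
    dim_mbr.mbr_gender AS mbr_obj_gender,
    dim_mbr.mbr_birthday AS mbr_obj_birthday,
    dim_mbr.mbr_idcard AS mbr_obj_idcard,
    -- event attributes from the order table dws_ord, named under the member object
    dws_ord.ord_familyincome AS mbr_obj_familyincome,
    dws_ord.csmr_residencelevel AS mbr_obj_residencelevel,
    dws_ord.csmr_edulevel AS mbr_obj_edulevel
FROM
    dim_mbr INNER JOIN dws_ord
    -- association relation configured in step S3: join on the shared oneid key
    ON dim_mbr.oneid = dws_ord.oneid
-- read only the current business-date partition of the dimension table
WHERE dim_mbr.ds = '${bdp.system.bizdate}';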
The method thus generates and stores the associated tags by splicing object names with attribute names and generating a wide table.
In a further embodiment of the present application, in order to improve the usability of the associated tag and facilitate the searching of the developer, the step S4 specifically includes the following steps:
configuring a tag name for each associated tag of the entity attributes and event attributes;
storing the tag name as a value at a designated position in the associated tag wide table.
The method provides a tag name for looking up each associated tag and stores it in the wide table for searching. When an associated tag is used later, it can be found through a more intuitive and memorable tag name, and the mapping relation rules between the fields in the dimension table and the fact table are then obtained and configured from that associated tag.
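The tag-name lookup can be sketched as a small registry that maps each human-readable tag name to its associated-tag column in the wide table. This is a Python illustration with hypothetical class and method names; in the method itself the names are stored as values in the wide table rather than held in memory.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    column: str    # associated tag column in the wide table, e.g. mbr_obj_phone
    tag_name: str  # human-readable tag name configured in step S4


class TagRegistry:
    """Hypothetical in-memory stand-in for the tag names stored in the wide table."""

    def __init__(self):
        self._by_column = {}

    def configure(self, column, tag_name):
        """Set, or overwrite, the tag name of one associated tag."""
        self._by_column[column] = TagEntry(column, tag_name)

    def search(self, keyword):
        """Case-insensitive substring search over tag names; returns columns."""
        kw = keyword.lower()
        return [e.column for e in self._by_column.values()
                if kw in e.tag_name.lower()]

    def resolve(self, tag_name):
        """Exact lookup from a tag name to its wide-table column."""
        for e in self._by_column.values():
            if e.tag_name == tag_name:
                return e.column
        raise KeyError(tag_name)


if __name__ == "__main__":
    reg = TagRegistry()
    reg.configure("mbr_obj_phone", "Member phone number")
    reg.configure("mbr_obj_edulevel", "Member education level")
    print(reg.search("member"))
```

Keying entries by column means that renaming a tag never changes which wide-table field it points to.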
In another embodiment of the present application, step S1 is followed by step S11: extracting the comments of the fields in the dimension table and generating names for the entity attributes in the entity attribute list from those comments;
step S2 is followed by step S21: extracting the comments of the fields in the fact table and generating names for the event attributes in the event attribute list from those comments.
By identifying the comment information of the original fields and using it as the initial value of the attribute name, the method simplifies the generation of the associated tags in the subsequent step S3. The field names and attributes can also be edited and annotated a second time, marking them as attribute names with business semantics and improving the readability of fields when the associated tags are generated. Furthermore, in step S3 the fields may be selected through interface options to adjust the construction of the associated tags manually, and names generated from the comment information make these interface options much more convenient to use.
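Steps S11 and S21 can be sketched in Python as follows, assuming each field arrives as a (field name, comment) pair. Falling back to the field name when a comment is empty, and disambiguating duplicate comments with the field name, are assumptions added so that every interface option stays unique; the function name is hypothetical.

```python
def initial_attribute_names(fields):
    """Steps S11/S21: use each field comment as the initial attribute name.

    fields: (field name, comment) pairs; returns {field name: attribute name}.
    """
    names = {}
    used = set()
    for field, comment in fields:
        # Fall back to the raw field name when the comment is missing or blank.
        name = (comment or "").strip() or field
        # Keep interface options unique when two fields share a comment.
        if name in used:
            name = f"{name} ({field})"
        used.add(name)
        names[field] = name
    return names


if __name__ == "__main__":
    print(initial_attribute_names([("mbr_phone", "Phone number"),
                                   ("mbr_idcard", "")]))
```

The generated names are only initial values; the secondary editing described above would overwrite them.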
An embodiment of the present application provides an improvement in which, before step S1, the method further includes the step of:
S0, selecting an object as the main object, and using the dimension table and fact table of the main object as the dimension table and fact table extracted in steps S1 and S2.
This improvement designates an object as the main object referenced when generating the associated tags. By extracting data daily from the entity and event source tables referenced by the main object, the association relations can be updated automatically on a regular schedule.
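The main-object selection and daily refresh can be sketched as a plan of source partitions to re-read each day. The object registry is hypothetical. The rule that the business date is the day before the run date, formatted yyyymmdd, is an assumption about how the ${bdp.system.bizdate} variable in the SQL example is resolved, not something the method specifies.

```python
from datetime import date, timedelta

# Hypothetical registry: the entity (dimension) and event (fact) source tables
# referenced by each object that can be chosen as the main object in step S0.
OBJECTS = {
    "mbr": {"entity_tables": ["dim_mbr"], "event_tables": ["dws_ord"]},
}


def daily_extraction_plan(main_object, run_date, objects=OBJECTS):
    """Return (table, partition filter) pairs to re-read for one daily run.

    Assumes the business date is the day before the run date, as yyyymmdd.
    """
    spec = objects[main_object]  # KeyError if the object is not registered
    bizdate = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    tables = spec["entity_tables"] + spec["event_tables"]
    return [(table, f"ds='{bizdate}'") for table in tables]


if __name__ == "__main__":
    for table, partition in daily_extraction_plan("mbr", date.today()):
        print(table, partition)
```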
After the association relations are stored, they need to be applied to the actual generation of SQL job script content. In another embodiment of the application, after step S4, the method further comprises the step of:
S5, generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
Specifically, in step S5 the mapping relations and computation logic of different tag values can be called according to the association relations stored in steps S3 and S4 to generate different SQL statements, for example CASE WHEN statements. The field names in these SQL statements are the field names in the association relation wide table, and their source table name is the table name of the association relation wide table.
For example, when distinguishing users younger than 20, users aged 20 to 39, and users aged 40 or older, the SQL statement may be generated using the following code:
The method performs secondary management of the dimension tables, fact tables, and fields produced by the original dimension modeling method, generates new association relations between fields in the dimension table and fields in the fact table, configures mapping relation rules between those fields with reference to the association relations, and quickly and automatically generates the corresponding SQL job scripts.
Meanwhile, the generated SQL job script code references the field names and table name of the association relation wide table produced during secondary management. These names do not change when the field names or table names of the entity and event source tables change, so the tag development code is decoupled from the metadata of the original tables. This greatly reduces the work of modifying and resubmitting every downstream job for review whenever a field in one table is renamed, added, or removed, and lowers development and maintenance costs.
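The age-group example of step S5 can be illustrated with a Python sketch that renders a CASE WHEN expression against the wide table. It is not the code listing referred to in the example. The age column mbr_obj_age, the bucket labels, and the function names are hypothetical; the wide table in the step S3 example exposes a birthday rather than an age. Because the generated statement references only wide-table names, renaming a source field would not require changing it.

```python
def case_when(column, buckets, alias):
    """Render CASE WHEN from (lower_inclusive, upper_exclusive, label) buckets;
    None means that bound is open."""
    lines = ["CASE"]
    for low, high, label in buckets:
        conditions = []
        if low is not None:
            conditions.append(f"{column} >= {low}")
        if high is not None:
            conditions.append(f"{column} < {high}")
        if not conditions:
            raise ValueError("each bucket needs at least one bound")
        lines.append(f"  WHEN {' AND '.join(conditions)} THEN '{label}'")
    lines.append(f"END AS {alias}")
    return "\n".join(lines)


def tag_sql(wide_table, key_column, value_column, buckets, alias, where=None):
    """Wrap the CASE WHEN expression in a SELECT over the wide table."""
    sql = (f"SELECT {key_column},\n"
           + case_when(value_column, buckets, alias)
           + f"\nFROM {wide_table}")
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"


AGE_BUCKETS = [
    (None, 20, "under_20"),  # age < 20
    (20, 40, "20_39"),       # 20 <= age < 40
    (40, None, "40_plus"),   # age >= 40
]

if __name__ == "__main__":
    print(tag_sql("object_model_mbr_mid", "mbr_obj_oneid", "mbr_obj_age",
                  AGE_BUCKETS, "age_group",
                  where="ds='${bdp.system.bizdate}'"))
```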
The application also provides a label configuration system based on dimension modeling, which specifically comprises:
the extraction module is used for extracting the fields of a dimension table and generating an entity attribute list from them, the entity attribute list being composed of a plurality of entity attributes; and for extracting the fields of a fact table and generating an event attribute list from them, the event attribute list being composed of a plurality of event attributes;
the associated tag generation module is used for generating and storing associated tags of the entity attribute and the event attribute according to the entity attribute list and the event attribute list;
and the tag name configuration module is used for configuring tag names for the associated tags of the entity attribute and the event attribute.
In another embodiment of the present application, the system further comprises:
an SQL script generation module, used for generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
In another embodiment of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the dimension modeling based tag configuration method in the above embodiments. The computer-readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, each embodiment emphasizes different aspects; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A label configuration method based on dimension modeling, characterized by comprising the following steps:
S1, extracting the fields of a dimension table and generating an entity attribute list from them, wherein the entity attribute list is composed of a plurality of entity attributes;
S2, extracting the fields of a fact table and generating an event attribute list from them, wherein the event attribute list is composed of a plurality of event attributes;
S3, generating and storing associated tags of the entity attributes and event attributes according to the entity attribute list and the event attribute list;
S4, configuring tag names for the associated tags of the entity attributes and event attributes.
2. The label configuration method based on dimension modeling according to claim 1, further comprising, before said step S1:
S0, selecting an object as the main object, wherein the dimension table and fact table of the main object are used as the dimension table and fact table extracted in steps S1 and S2.
3. The label configuration method based on dimension modeling according to claim 1, further comprising, after step S4, the step of:
S5, generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
4. The label configuration method based on dimension modeling as claimed in claim 1, wherein the step S3 specifically includes:
generating association labels of the entity attribute and the event attribute according to the entity attribute list and the event attribute list;
and storing the association tag of the entity attribute and the event attribute as an association tag wide table.
5. The method for configuring labels based on dimension modeling of claim 4, wherein the step S4 is specifically:
configuring a tag name for the associated tag of the entity attribute and the event attribute;
and storing the tag name as a value at a designated position in the associated tag wide table.
6. The label configuration method based on dimension modeling according to claim 1, wherein the generation rule for the associated tags is selected through interface options.
7. The label configuration method based on dimension modeling according to claim 1, wherein step S1 is followed by step S11 of extracting comments of fields in a dimension table, and generating names for entity attributes in the entity attribute list according to the comments of the fields in the dimension table;
the step S2 is followed by a step S21 of extracting comments of fields in the fact table, and generating names for the event attributes in the event attribute list according to the comments of the fields in the fact table.
8. A label configuration system based on dimension modeling, comprising:
the extraction module is used for extracting the fields of a dimension table and generating an entity attribute list from them, the entity attribute list being composed of a plurality of entity attributes; and for extracting the fields of a fact table and generating an event attribute list from them, the event attribute list being composed of a plurality of event attributes;
the associated tag generation module is used for generating and storing associated tags of the entity attribute and the event attribute according to the entity attribute list and the event attribute list;
and the tag name configuration module is used for configuring tag names for the associated tags of the entity attribute and the event attribute.
9. The label configuration system based on dimension modeling according to claim 8, wherein the system further comprises:
an SQL script generation module, used for generating the corresponding SQL job script content according to the associated tags of the entity attributes and event attributes.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method according to any of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310697911.3A CN116821253A (en) 2023-06-13 2023-06-13 Label configuration method and system based on dimension modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310697911.3A CN116821253A (en) 2023-06-13 2023-06-13 Label configuration method and system based on dimension modeling

Publications (1)

Publication Number Publication Date
CN116821253A (en) 2023-09-29

Family

ID=88123403

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310697911.3A Pending CN116821253A (en) 2023-06-13 2023-06-13 Label configuration method and system based on dimension modeling

Country Status (1)

Country Link
CN (1) CN116821253A (en)

Similar Documents

Publication Publication Date Title
US10169337B2 (en) Converting data into natural language form
US7562088B2 (en) Structure extraction from unstructured documents
US7617444B2 (en) File formats, methods, and computer program products for representing workbooks
US20160085742A1 (en) Automated collective term and phrase index
US20080162455A1 (en) Determination of document similarity
US7783971B2 (en) Graphic object themes
CN108762743B (en) Data table operation code generation method and device
US10169393B2 (en) Tracking changes among similar documents
CN111061742B (en) Method and device for marking data and service system thereof
US11567995B2 (en) Branch threading in graph databases
US7587416B2 (en) Advanced desktop reporting
US20180357328A1 (en) Functional equivalence of tuples and edges in graph databases
Pamungkas et al. B-BabelNet: business-specific lexical database for improving semantic analysis of business process models
US9697239B1 (en) Token-based database system and method of interfacing with the token-based database system
US10942892B2 (en) Transport handling of foreign key checks
CN116821253A (en) Label configuration method and system based on dimension modeling
US20230123555A1 (en) Systems and methods for translation comments flowback
US11789903B1 (en) Tagging tool for managing data
US11379432B2 (en) File management using a temporal database architecture
CN113434734A (en) Method, device, equipment and storage medium for generating file and reading file
US20240220876A1 (en) Artificial intelligence (ai) based data product provisioning
US20240193127A1 (en) Relevant content document comparison
US20230033364A1 (en) Core data services performance annotations
US20110191089A1 (en) Method and apparatus for monitoring demands in a number of models of a system
CN114328965A (en) Knowledge graph updating method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination