CN114064612A - Data processing method and device in data warehouse - Google Patents

Publication number
CN114064612A
Authority
CN
China
Prior art keywords
service
service code
partition
field
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111386391.1A
Other languages
Chinese (zh)
Inventor
崔丞翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202111386391.1A
Publication of CN114064612A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data processing method and device in a data warehouse. The method comprises the following steps: adding the name and type of a newly added field to the corresponding partition table according to the information about the newly added field of a first service carried in a field addition request; acquiring a first service code corresponding to the original fields of the first service from a first partition under the partition table, generating a second service code according to the first service code and the field information recorded in the partition table, and writing the second service code into the first partition, wherein the second service code comprises the original fields of the first service and empty character strings representing the newly added fields; receiving a third service code corresponding to the newly added fields uploaded by a user, generating a fourth service code according to the third service code and the field information recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added fields of the first service and empty character strings representing the original fields; and summarizing the second service code and the fourth service code to obtain the complete service code of the first service.

Description

Data processing method and device in data warehouse
Technical Field
The application belongs to the technical field of big data processing, and particularly relates to a data processing method and device in a data warehouse.
Background
A data warehouse is a strategic collection that provides data support of all types for decision-making at every level of an enterprise. It is a single data store created for analytical reporting and decision support, and it can guide business process improvement and help monitor and control time, cost, and quality. According to how data flows in and out, the data warehouse architecture can be roughly divided into three layers: an ODS (Operational Data Store) layer, a DW (Data Warehouse) layer, and a DA (Data Application) layer.
At present, in offline processing for big data warehouses, the DW layer is designed to hold detail and summary data. In the prior art, when a service is extended, the new service logic is added to a single body of code, that is, appended to the original service code of the DW layer.
However, as the services grow and the number of fields increases, the processing code gradually accumulates to thousands or even tens of thousands of lines and keeps growing. More and more fields must be kept compatible, the data insertion order must be considered whenever a field is added, and metric definitions (the statistical caliber) become increasingly difficult to trace, so the maintenance cost of the data warehouse is high.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for processing data in a data warehouse, which can solve the problem in the prior art that the maintenance cost of the data warehouse is high.
In a first aspect, an embodiment of the present application provides a method for processing data in a data warehouse, where the method includes:
when a service field addition request for a first service in a data warehouse is received, adding the name and type of the newly added field to a partition table of the data warehouse according to the information about the newly added field carried in the field addition request;
acquiring a first service code in a first partition under the partition table, generating a second service code according to the first service code and field information of the first service recorded in the partition table, and writing the second service code into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code comprises the original field of the first service and a null character string used for representing a newly added field;
receiving a third service code corresponding to a newly added field of the first service uploaded by a user, generating a fourth service code according to the third service code and field information of the first service recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added field of the first service and a null character string for representing an original field;
and summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
In a second aspect, an embodiment of the present application provides an apparatus for processing data in a data warehouse, where the apparatus includes:
the modification module is used for adding, when a service field addition request for a first service in a data warehouse is received, the name and type of a newly added field to a partition table of the data warehouse according to the information about the newly added field carried in the field addition request;
the first processing module is used for acquiring a first service code in a first partition under the partition table, generating a second service code according to the first service code and field information of the first service recorded in the partition table, and writing the second service code into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code comprises the original field of the first service and a null character string used for representing a newly added field;
the second processing module is used for receiving a third service code corresponding to the newly added field of the first service uploaded by a user, generating a fourth service code according to the third service code and field information of the first service recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added field of the first service and a null character string used for representing an original field;
and the summarizing module is used for summarizing the second service code in the first partition and the fourth service code in the second partition to obtain the complete service code of the first service after the field is added.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the steps of the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, when a service field addition request for a first service in a data warehouse is received, the table structure of the corresponding partition table is changed. On the basis of the changed partition table, for the first service, the first service code of the original fields is automatically rewritten into a second service code containing the original fields and empty character strings representing the newly added fields, and the third service code written by the user for the newly added fields is automatically rewritten into a fourth service code containing the newly added fields and empty character strings representing the original fields. The fourth service code and the second service code are stored in different partitions, and the second service code and the fourth service code in the different partitions are summarized to obtain the complete service code of the first service after the field is added.
Compared with the prior art, in the embodiment of the application, when a service field needs to be added, the user only needs to write and submit the service code for the newly added field, without considering field compatibility, insertion order, or the tracing of metric definitions. The data warehouse can automatically generate the complete service code of the first service from the service code written by the user, which reduces the maintenance cost of the data warehouse.
Drawings
FIG. 1 is a diagram illustrating an example of a process for adding a service field in the prior art;
FIG. 2 is a flowchart of a method for processing data in a data warehouse according to an embodiment of the present application;
FIG. 3 is a diagram illustrating a first example of a process for adding a service field according to an embodiment of the present application;
FIG. 4 is a diagram illustrating a second example of a process for adding a service field according to an embodiment of the present application;
FIG. 5 is a block diagram of a data processing apparatus in a data warehouse according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 is a hardware structure diagram of an electronic device implementing various embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
At present, in offline processing for big data warehouses, the DW layer is designed to hold detail and summary data, and the existing scheme handles a service extension by adding the new service logic to a single body of code, that is, to the existing maintained code.
Taking fig. 1 as an example, the data warehouse includes an ODS layer, a DW layer, and a DA layer, where a partition table of the DW layer includes N partitions, and each partition stores the service code of one service: for example, the service code of service 1 is stored in partition 1, the service code of service 2 is stored in partition 2, the service code of service 3 is stored in partition 3, ..., and the service code of service N is stored in partition N.
When a service 1 needs to newly add M service fields, in the prior art, a user needs to insert service codes of the M newly added fields into a service code of a partition 1, and in addition, null characters with the same length need to be inserted into partitions 2 to N to ensure that the number of the characters inserted into the N partitions is the same, where N and M are both positive integers.
However, as the services grow and the number of fields increases, the processing code gradually accumulates to thousands or even tens of thousands of lines while new fields keep being added. More and more fields must be kept compatible, the data insertion order must be considered whenever a field is added, and metric definitions (the statistical caliber) become increasingly difficult to trace, so the cost to maintenance personnel grows geometrically. The existing scheme therefore falls short in service compatibility and maintainability.
The embodiment of the application provides a method and a device for processing data in a data warehouse, so as to solve the technical problems that adding business logic is difficult and that the maintenance cost is high.
The following describes in detail a data processing method in a data warehouse provided in the embodiment of the present application through a specific embodiment and an application scenario thereof with reference to the accompanying drawings.
Fig. 2 is a flowchart of a method for processing data in a data warehouse according to an embodiment of the present application, and as shown in fig. 2, the method may include the following steps: step 201, step 202, step 203 and step 204, wherein,
in step 201, when a service field addition request for a first service in a data warehouse is received, a name and a type of a newly added field are added in a partition table of the data warehouse according to information of the newly added field carried in the field addition request.
In the embodiment of the application, the relationship between the partition table and its partitions is similar to that between a directory and its subdirectories: just as one directory may contain a plurality of subdirectories, one partition table may contain a plurality of partitions.
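The directory analogy can be made concrete with a short sketch. In Hive, each partition of a table is typically stored as a `column=value` subdirectory under the table's directory. The warehouse path, table name, and partition column `service` below are hypothetical and chosen only for illustration.

```python
from pathlib import PurePosixPath


def partition_path(table_dir: str, partition_col: str, value: str) -> PurePosixPath:
    """Return the subdirectory that would hold one partition of a table."""
    return PurePosixPath(table_dir) / f"{partition_col}={value}"


# Hypothetical table directory; one subdirectory per service partition.
paths = [
    partition_path("/warehouse/dw.db/service_detail", "service", str(i))
    for i in (1, 2, 3)
]
for p in paths:
    print(p)
```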
In the prior art, a data warehouse includes: the system comprises an ODS layer, a DW layer and a DA layer, wherein the DW layer comprises a partition table, the partition table comprises a plurality of partitions, and each partition stores a service code of a service. When a new service field needs to be added, the DW layer is mainly processed.
In the embodiment of the present application, two scenarios may be included: in a first scenario, the data warehouse comprises: the system comprises an ODS layer, a DW layer and a DA layer, wherein the DW layer comprises a partition table, the partition table comprises a plurality of partitions, when a service field needs to be newly added, the DW layer is processed, and steps 201 to 204 are executed.
In a second scenario, the data warehouse includes: the system comprises an ODS layer, an intermediate layer, a DW layer and a DA layer, wherein the intermediate layer is a newly-added structural layer, the content contained in the intermediate layer is the same as that in the DW layer, the intermediate layer also comprises a partition table, the partition table comprises a plurality of partitions, when a service field needs to be newly added, the intermediate layer is processed firstly, steps 201 to 203 are executed, then the DW layer is processed, and step 204 is executed.
In the prior art, when a new field is required for a service in a data warehouse, a table structure of a partition table of a DW layer needs to be changed, specifically, a name and a type of the new field are added to the partition table. Similarly, in the embodiment of the present application, when a new field is needed for a service in the data warehouse, the table structure of the partition table also needs to be changed.
For the first scenario, when a new field is needed for a service in the data warehouse, the table structure of the partition table of the DW layer needs to be changed. For the second scenario, when a new field is needed for a service in the data warehouse, the table structure of the partition table in the middle layer needs to be changed.
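As a minimal sketch of step 201, the table structure change could be rendered as a Hive-style `ALTER TABLE ... ADD COLUMNS` statement built from the field names and types carried in the request. The function name, the table name `dw.service_detail`, and the field names are assumptions made for illustration; the source does not specify how the request is encoded.

```python
from typing import List, Tuple


def build_add_columns_ddl(table: str, new_fields: List[Tuple[str, str]]) -> str:
    """Render a Hive-style statement that appends new columns to a table."""
    if not new_fields:
        raise ValueError("the field addition request carries no new fields")
    columns = ", ".join(f"`{name}` {col_type}" for name, col_type in new_fields)
    return f"ALTER TABLE {table} ADD COLUMNS ({columns})"


# Hypothetical request: two new fields for the first service.
ddl = build_add_columns_ddl(
    "dw.service_detail",
    [("coupon_id", "STRING"), ("coupon_amt", "DOUBLE")],
)
print(ddl)
```

In the second scenario the same statement would target the partition table of the intermediate layer rather than that of the DW layer.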
In step 202, a first service code in a first partition under a partition table is obtained, a second service code is generated according to the first service code and field information of the first service recorded in the partition table, and the second service code is written into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code includes the original field of the first service and a null character string used for representing a newly added field.
In this embodiment of the present application, the first partition is a partition where a service code of the first service is located, and the number of the first partitions may be 1 or multiple.
In the embodiment of the application, the first service code includes an original field of the first service, and the second service code includes an original field of the first service and an empty string for representing the newly added field.
In the embodiment of the present application, in order to reduce data redundancy, after the second service code is generated, the first service code in the first partition may be deleted, and then the second service code may be written into the first partition.
In the embodiment of the application, a field automatic filling tool can be adopted to convert the first service code into the second service code and write the second service code into the first partition. Specifically, the first service code is uploaded to an automatic field filling tool through a specific parameter transmission mode, and a second service code can be generated through processing of the automatic field filling tool, wherein the specific parameter transmission mode can be implemented by using a parameter transmission mode in the prior art, which is not limited herein.
In the embodiment of the present application, for an input code, the field automatic filling tool may implement the following functions: fields that do not appear in the insert query are filled automatically, the fields in the insert query may be written in any order, and dynamic partitioning is supported.
In this embodiment, a specific processing procedure of the field automatic filling tool may include the following steps: parsing the field names and types of the first service in the partition table, and generating a sixth service code based on the parsed field names and types, wherein all data in the sixth service code other than data of numeric and struct types are empty character strings; and parsing the information of the original fields of the first service from the first service code, and inserting the information of the original fields into the sixth service code to obtain the second service code.
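The filling procedure above can be sketched as follows. This is an illustration under stated assumptions, not the tool itself: the set of types treated as numeric or struct, the use of `CAST(NULL AS ...)` as their placeholder (the source only says that the other types receive empty strings), and all table and field names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumption: base types treated as "numeric" or "struct" by the tool.
NUMERIC_OR_STRUCT = {
    "tinyint", "smallint", "int", "bigint", "float", "double", "decimal",
    "struct", "array", "map",
}


def placeholder(col_type: str) -> str:
    """Return the filler expression for a column that the code does not set."""
    base = re.split(r"[<(]", col_type.strip().lower(), maxsplit=1)[0].strip()
    if base in NUMERIC_OR_STRUCT:
        # Assumption: a typed NULL stands in for numeric and struct columns.
        return f"CAST(NULL AS {col_type})"
    return "''"


def pad_original_fields(
    schema: List[Tuple[str, str]], provided: Dict[str, str], source: str
) -> str:
    """Build a SELECT covering every table column, padding the absent ones."""
    unknown = sorted(set(provided) - {name for name, _ in schema})
    if unknown:
        raise ValueError(f"fields missing from the partition table: {unknown}")
    items = []
    for name, col_type in schema:
        expr = provided.get(name, placeholder(col_type))
        items.append(f"{expr} AS {name}")
    return f"SELECT {', '.join(items)} FROM {source}"


# Partition table after the change: two original fields, two new fields.
schema = [
    ("order_id", "string"), ("amount", "double"),
    ("coupon_id", "string"), ("coupon_amt", "double"),
]
second_code = pad_original_fields(
    schema, {"order_id": "o.id", "amount": "o.amt"}, "ods.orders o"
)
print(second_code)
```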
In step 203, a third service code corresponding to the newly added field of the first service uploaded by the user is received, a fourth service code is generated according to the third service code and the field information of the first service recorded in the partition table, a second partition is created under the partition table, and the fourth service code is written into the second partition, wherein the fourth service code includes the newly added field of the first service and a null character string used for representing the original field.
In this embodiment, the third service code is an SQL code written by the user based on the newly added field of the service.
In the embodiment of the application, the service code of the newly added field and the service code of the original field are stored in different partitions, so that when the field needs to be newly added, a user only needs to pay attention to the newly added field, the SQL code which can be identified by the machine is compiled based on the newly added field and submitted, and the machine can automatically realize the newly adding of the service code according to the code submitted by the user without paying attention to the original field or considering field compatibility by the user.
In the embodiment of the application, the third service code includes a newly added field of the first service, and the fourth service code includes a newly added field of the first service and a null character string for representing an original field.
In the prior art, when a new field is needed for a service in a data warehouse, a service code of the new field is needed to be inserted into a service code of an original field in a DW layer, and when the number of fields is large, the insertion process is time-consuming because the insertion sequence needs to be considered.
In order to solve the above problem, in this embodiment of the application, the user may write the third service code according to a certain rule. Specifically, the third service code is an SQL code written according to a preset code writing rule, where the preset code writing rule is: each newly added field is assigned to the corresponding field in the partition table by an AS alias. That is, the insert principle of Hive is changed from inserting by field order to inserting by field name, so that the data processing logic can be written in any order and, at the same time, the data can be written in a dynamic partitioning mode.
Accordingly, in one embodiment provided herein, the step 203 may include the following steps:
parsing the names and types of all fields of the first service in the partition table, and generating a fifth service code based on the parsed field names and types, wherein all data in the fifth service code other than data of numeric and struct types are empty character strings; parsing the content between the first SELECT and the first FROM in the third service code to obtain the information of the newly added fields; and inserting the information of the newly added fields into the fifth service code according to the field names indicated by the AS aliases in the third service code to obtain the fourth service code.
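The parsing step can be sketched as below, assuming the user's SQL follows the AS rule. The helper names, the regular expressions, and the sample SQL are illustrative assumptions; the sketch ignores string literals or comments that contain the word "from", and the template (standing in for the fifth service code) is supplied ready-made with lowercase column names.

```python
import re
from typing import Dict, List


def split_top_level(select_list: str) -> List[str]:
    """Split a select list on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in select_list:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_aliased_fields(sql: str) -> Dict[str, str]:
    """Map each AS alias to its expression, reading first SELECT to first FROM."""
    match = re.search(r"\bselect\b(.*?)\bfrom\b", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a SELECT ... FROM statement")
    fields = {}
    for item in split_top_level(match.group(1)):
        m = re.fullmatch(r"(.+?)\s+as\s+(\w+)", item.strip(), flags=re.IGNORECASE | re.DOTALL)
        if m is None:
            raise ValueError(f"field without an AS alias: {item.strip()!r}")
        fields[m.group(2).lower()] = m.group(1).strip()
    return fields


def build_padded_sql(sql: str, template: Dict[str, str]) -> str:
    """Place the user's fields into the template by name, keeping table order."""
    fields = parse_aliased_fields(sql)
    unknown = sorted(set(fields) - set(template))
    if unknown:
        raise ValueError(f"fields missing from the partition table: {unknown}")
    merged = {**template, **fields}  # key order follows the template
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in merged.items())
    tail = re.search(r"\bfrom\b.*", sql, flags=re.IGNORECASE | re.DOTALL).group(0)
    return f"SELECT {select_list} {tail.strip()}"


# Hypothetical template: placeholders for every column, in table order.
template = {
    "order_id": "''", "amount": "CAST(NULL AS double)",
    "coupon_amt": "CAST(NULL AS double)", "coupon_id": "''",
}
user_sql = "select c.id as coupon_id, cast(c.amt as double) as coupon_amt from ods.coupons c where c.dt = '20240101'"
fourth_code = build_padded_sql(user_sql, template)
print(fourth_code)
```

Because placement is by alias, the user's fields land in the table's column order even though the sample SQL lists them in a different order.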
In this embodiment of the present application, a field automatic filling tool may also be used to convert the third service code into the fourth service code, and write the fourth service code into the second partition. Specifically, the third service code is uploaded to the field automatic filling tool, and the fourth service code can be generated after being processed by the field automatic filling tool.
In the embodiment of the present application, for the first scenario, both the first partition and the second partition are in the DW layer. For the second scenario, both the first partition and the second partition are in the middle layer.
In step 204, the second service code in the first partition and the fourth service code in the second partition are summarized to obtain a complete service code of the first service after the field is added.
In the embodiment of the application, during summarization, a single GROUP BY operation may be performed over the second service code and the fourth service code to obtain the complete service code of the first service after the field is added; this replaces a join across multiple result layers and thereby reduces shuffle. Specifically, the step 204 may include the following steps: removing the empty character strings in the second service code and the fourth service code through a GROUP BY operation, and combining the remaining contents of the second service code and the fourth service code to obtain the complete service code of the first service after the field is added.
For the first scenario mentioned in step 201, in an embodiment provided by the present application, the step 204 may specifically include the following steps:
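A minimal sketch of the GROUP BY summarization, using Python's built-in sqlite3 module in place of Hive, is given below. It rests on assumptions the source does not spell out: that rows in both partitions share a key column (here `order_id`), that `MAX` is the aggregate used to drop the empty-string placeholders, and that numeric placeholders are NULL. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_detail (part TEXT, order_id TEXT, amount REAL, coupon_id TEXT)"
)
conn.executemany(
    "INSERT INTO service_detail VALUES (?, ?, ?, ?)",
    [
        # First partition: original fields set, new field padded with ''.
        ("1", "o1", 9.5, ""),
        ("1", "o2", 3.0, ""),
        # Second partition: new field set, original numeric field padded with NULL.
        ("N+1", "o1", None, "c7"),
    ],
)

# One GROUP BY pass merges the padded rows; MAX ignores NULL and prefers
# any non-empty string over '' (an assumed choice of aggregate).
rows = conn.execute(
    """
    SELECT order_id, MAX(amount) AS amount, MAX(coupon_id) AS coupon_id
    FROM service_detail
    WHERE part IN ('1', 'N+1')
    GROUP BY order_id
    ORDER BY order_id
    """
).fetchall()
print(rows)
```

Every row is read once and grouped by the key, so no join between the two partitions is needed.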
creating a third partition under the partition table;
and in the third partition, summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
In one example, as shown in fig. 3, the data warehouse includes an ODS layer, a DW layer (which may also be referred to as an intermediate layer, i.e., the intermediate layer is the DW layer), and a DA layer, where a partition table of the DW layer includes N partitions, and each partition stores the service code of one service: for example, the service code of service 1 is stored in partition 1, the service code of service 2 is stored in partition 2, the service code of service 3 is stored in partition 3, ..., and the service code of service N is stored in partition N.
When service 1 needs to add M service fields, the user writes the corresponding SQL code, that is, the third service code, based on the M service fields, and then uploads the written SQL code to the field automatic filling tool mentioned in step 203; the field automatic filling tool processes the SQL code into the fourth service code, creates partition N+1, that is, the second partition, and writes the fourth service code into partition N+1;
similarly, since the service code of service 1 is stored in partition 1, that is, the first partition, the service code in partition 1, that is, the first service code, is read and uploaded to the field automatic filling tool mentioned in step 202; the field automatic filling tool processes the first service code into the second service code, deletes the first service code from partition 1, and then writes the second service code into partition 1;
and finally, a TYPE partition, that is, the third partition, is created, and the second service code and the fourth service code are summarized in the TYPE partition to obtain the complete service code of the first service after the field is added, where N and M are positive integers.
Therefore, in the embodiment of the present application, a new partition is created under the partition table to store the service code of the newly added field, and for the partition (for example, partition 2 to partition N in fig. 3) where the service of the non-newly added field is located, character completion is not required, so that the storage space can be saved.
For the second case mentioned in step 201, the data warehouse includes, in order of data inflow and outflow: an operational data store (ODS) layer, an intermediate layer, a data warehouse (DW) layer, and a data application (DA) layer, where the partition table is stored in the intermediate layer. In this case, in another embodiment provided by the present application, step 204 may specifically include the following steps:
sending the second service code in the first partition and the fourth service code in the second partition of the intermediate layer to the data warehouse layer;
and in the data warehouse layer, summarizing the second service code in the first partition and the fourth service code in the second partition to obtain the complete service code of the first service after the field is added.
In one example, as shown in fig. 4, the data warehouse includes an ODS layer, an intermediate layer, a DW layer (the intermediate layer and the DW layer are two separate layers), and a DA layer. The partition table of the intermediate layer includes N partitions, and each partition stores the service code of one service: for example, the service code of service 1 is stored in partition 1, that of service 2 in partition 2, that of service 3 in partition 3, ..., and that of service N in partition N.
When service 1 needs to add M service fields, the user writes the corresponding SQL code, that is, the third service code, for the M service fields and uploads it to the field automatic filling tool mentioned in step 203. The tool processes the SQL code into the fourth service code, creates partition N+1, that is, the second partition, and writes the fourth service code into partition N+1;
similarly, because the service code of service 1 is stored in partition 1, that is, the first partition, the service code in partition 1, that is, the first service code, is read and uploaded to the field automatic filling tool mentioned in step 202. The tool processes the first service code into the second service code, deletes the first service code from partition 1, and then writes the second service code into partition 1;
finally, the second service code and the fourth service code are summarized in the DW layer to obtain the complete service code of the first service after the fields are added, where N and M are positive integers.
Therefore, in the embodiment of the present application, a new partition is created under the partition table of the intermediate layer to store the service code of the newly added fields, and the partitions of services that add no fields (for example, partition 2 to partition N in fig. 4) require no character padding, which saves storage space.
For example, if 20 fields are newly added to the browser service and 150 fields already exist for the browser service in the partition table, the user only needs to write code for the 20 new fields when writing the SQL code, and the amount of code can be reduced to about 300 lines.
In the embodiment of the application, the first partition and the second partition further include a Type field, where the Type field indicates the type of the data accessed in the partition where the Type field is located. That is, in addition to the usual partition fields day and hour, a Type field may be added to the partitions of the partition table to meet the timeliness requirement, as shown in Table 1.
Name | Type | Description
day | string | Date partition
hour | string | Hour partition
Type | string | Data type; 0 indicates summarized data
Table 1
For scenarios that require timely output, the processing duration and output time of each feature differ, so in the embodiment of the application the result of the currently processed task can be written under the corresponding Type field to meet the timeliness requirement.
For example, when the output timeliness of the location feature is a concern, the partition can be written as follows: day=2021-10-16/hour=__HIVE_DEFAULT_PARTITION__/Type=location
For ordinary scenarios such as feature mining and quality analysis, the fully spliced data features are generally needed; in the embodiment of the application, a specific Type field value can be designated for summarization, and the summarized data is provided under it.
For example, when summarization is required, the partition can be written as follows:
day=2021-10-16/hour=__HIVE_DEFAULT_PARTITION__/Type=0
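The two partition specifications above follow the same day/hour/Type pattern, which can be sketched as a small helper. The function name `partition_path` and its parameters are hypothetical; the sketch assumes that a missing hour value is represented by Hive's default partition name.

```python
from typing import Optional

# Name Hive assigns to a dynamic partition whose value is NULL or empty.
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_path(day: str, data_type: str, hour: Optional[str] = None) -> str:
    """Build a day/hour/Type partition path; Type "0" denotes summarized data."""
    hour_value = HIVE_DEFAULT_PARTITION if hour is None else hour
    return f"day={day}/hour={hour_value}/Type={data_type}"


# Timeliness-sensitive feature: written under its own Type value.
location_path = partition_path("2021-10-16", "location")
# Summarized data: written under Type=0.
summary_path = partition_path("2021-10-16", "0")
```

Keeping the Type value as the last path component means a feature task only touches its own partition, while the summary task reads the feature partitions and writes Type=0.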
Therefore, in the embodiment of the application, tasks are split by metric definition (caliber), which makes them easier to analyze. When a feature is added, the user does not need to pay attention to other fields and can concentrate on the new feature, and the tedious field padding process is omitted. When fields are added, no task other than the summary task needs to be changed, so the problems of difficult operation and maintenance of metric definitions, poor compatibility, and poor extensibility are solved as a whole.
As can be seen from the above embodiment, in this embodiment, when a service field addition request for a first service in a data warehouse is received, a table structure of a corresponding partition table is changed; on the basis of the changed partition table, for a first service, automatically rewriting a first service code of an original field into a second service code containing the original field and an empty character string for representing a newly added field, automatically rewriting a third service code of the newly added field written by a user into a fourth service code containing the newly added field and the empty character string for representing the original field, storing the fourth service code and the second service code in different partitions, and summarizing the second service code and the fourth service code in different partitions to obtain a complete service code of the first service after the newly added field.
Compared with the prior art, in the embodiment of the application, when a service field needs to be added, the user only needs to write and submit the service code for the newly added field, without considering field compatibility, insertion order, or tracing of the metric definition (caliber). The data warehouse can automatically generate the complete service code of the first service based on the service code written by the user, which reduces the maintenance cost of the data warehouse.
The execution subject of the data processing method in the data warehouse provided by the embodiment of the application may be a device for processing data in the data warehouse. In the embodiment of the present application, the device for processing data in the data warehouse is described by taking the case in which this device performs the data processing method as an example.
Fig. 5 is a block diagram of a device for processing data in a data warehouse according to an embodiment of the present application, and as shown in fig. 5, the device 500 for processing data in a data warehouse may include: a modification module 501, a first processing module 502, a second processing module 503, and an aggregation module 504, wherein,
a modification module 501, configured to, when a service field addition request for a first service in a data warehouse is received, add the name and type of the newly added field to a partition table of the data warehouse according to the information about the newly added field carried in the field addition request;
a first processing module 502, configured to obtain a first service code in a first partition in the partition table, generate a second service code according to the first service code and field information of the first service recorded in the partition table, and write the second service code into the first partition, where the first service code is a service code corresponding to an original field of the first service, and the second service code includes the original field of the first service and a null character string used for representing a newly added field;
a second processing module 503, configured to receive a third service code corresponding to an added field of the first service uploaded by a user, generate a fourth service code according to the third service code and field information of the first service recorded in the partition table, create a second partition under the partition table, and write the fourth service code into the second partition, where the fourth service code includes the added field of the first service and a null character string used for representing an original field;
a summarizing module 504, configured to summarize the second service code in the first partition and the fourth service code in the second partition, so as to obtain a complete service code of the first service after adding a field.
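As an illustration of what the modification module 501 might issue when it changes the table structure, the following sketch builds a Hive-style `ALTER TABLE ... ADD COLUMNS` statement from the name and type of each newly added field. The use of Hive DDL, the table name, and the helper `add_columns_statement` are assumptions made for illustration; the application does not prescribe a particular statement.

```python
from typing import List, Tuple


def add_columns_statement(table: str, new_fields: List[Tuple[str, str]]) -> str:
    """Return a Hive-style DDL statement that appends the new fields to `table`."""
    if not new_fields:
        raise ValueError("a field addition request must carry at least one field")
    columns = ", ".join(f"`{name}` {col_type}" for name, col_type in new_fields)
    return f"ALTER TABLE {table} ADD COLUMNS ({columns})"


# Hypothetical request: add two fields to the partition table of the first service.
statement = add_columns_statement(
    "dw.service_features", [("city", "string"), ("age", "int")]
)
```

The statement only changes the schema; filling existing and new partitions is left to the first and second processing modules.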
Optionally, as an embodiment, the summarizing module 504 may include:
the creating submodule is used for creating a third partition under the partition table;
and the first summarizing submodule is used for summarizing the second service code in the first partition and the fourth service code in the second partition in the third partition to obtain a complete service code of the first service after a field is added.
Optionally, as an embodiment, in order of data inflow and outflow, the data warehouse includes: an operational data store layer, an intermediate layer, a data warehouse layer, and a data application layer, wherein the partition table is stored in the intermediate layer;
the summarizing module 504 may include:
the sending submodule is used for sending the second service code in the first partition and the fourth service code in the second partition of the intermediate layer to the data warehouse layer;
and the second summarizing submodule is used for summarizing the second service code in the first partition and the fourth service code in the second partition in the data warehouse layer to obtain the complete service code of the first service after the field is added.
Optionally, as an embodiment, the summarizing module 504 may include:
and the third summarizing submodule is used for deleting the empty character strings in the second service code and the fourth service code through a GROUP BY function, and combining the remaining contents in the second service code and the fourth service code to obtain a complete service code of the first service after a field is added.
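One way to picture the GROUP BY based summarization is shown below with Python's built-in sqlite3 module. This is a sketch under assumptions rather than the method of the application: the table `partition_table`, its columns, and the key `id` are hypothetical, and `MAX()` is used as the aggregate because an empty string sorts below any non-empty string, so the empty placeholders drop out of each group.

```python
import sqlite3
from typing import List, Tuple


def summarize_with_group_by(rows: List[Tuple[str, str, str, str]]) -> List[Tuple[str, str, str]]:
    """Merge padded rows from two partitions into one row per key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE partition_table (id TEXT, name TEXT, city TEXT, type TEXT)"
        )
        conn.executemany("INSERT INTO partition_table VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT id, MAX(name) AS name, MAX(city) AS city "
            "FROM partition_table GROUP BY id ORDER BY id"
        )
        return cursor.fetchall()
    finally:
        conn.close()


merged = summarize_with_group_by([
    ("u1", "alice", "", "1"),      # first partition: new field is ''
    ("u1", "", "Shenzhen", "2"),   # second partition: original field is ''
    ("u2", "bob", "", "1"),        # no row yet in the second partition
])
```

A key that appears in only one partition, such as `u2` here, keeps its empty placeholder in the summarized row.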
Optionally, as an embodiment, the third service code is an SQL code written according to a preset code writing rule, where the preset code writing rule is: each newly added field is assigned to the corresponding field in the partition table through the AS keyword;
the second processing module 503 may include:
the first analysis submodule is used for analyzing the names and types of all the fields of the first service in the partition table and generating a fifth service code based on the analyzed names and types of the fields, wherein data other than the numeric type and the structure type in the fifth service code are empty character strings;
the second analysis submodule is used for analyzing the content from the first SELECT to the first FROM in the third service code to obtain the information of the newly added field;
and the inserting submodule is used for inserting the information of the newly added field into the fifth service code according to the information indicated by AS in the third service code to obtain the fourth service code.
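The three submodules above can be sketched as follows. This is a simplified illustration, not the tool described in the application: the helper names are hypothetical, the select list is split on commas (so expressions that themselves contain commas are not handled), and numeric and structure columns are filled with `CAST(NULL AS <type>)`, which is an assumption because the text only specifies empty strings for the other types.

```python
import re
from typing import Dict, List, Tuple

# Assumed prefixes of numeric and structure types that are not filled with ''.
NUMERIC_OR_STRUCT = ("tinyint", "smallint", "int", "bigint", "float",
                     "double", "decimal", "struct")


def fifth_code_columns(schema: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map every field of the partition table to a placeholder expression."""
    columns = {}
    for name, col_type in schema:
        if col_type.lower().startswith(NUMERIC_OR_STRUCT):
            columns[name] = f"CAST(NULL AS {col_type})"  # assumption, see lead-in
        else:
            columns[name] = "''"
    return columns


def parse_new_fields(third_code: str) -> Tuple[Dict[str, str], str]:
    """Read `expr AS field` items between the first SELECT and the first FROM."""
    match = re.search(r"\bselect\b(.*?)\bfrom\b", third_code,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("expected a SELECT ... FROM statement")
    fields = {}
    for item in match.group(1).split(","):
        parts = re.split(r"\s+as\s+", item.strip(), maxsplit=1, flags=re.IGNORECASE)
        if len(parts) != 2:
            raise ValueError(f"field without AS alias: {item.strip()!r}")
        expression, alias = parts
        fields[alias.strip()] = expression.strip()
    return fields, third_code[match.end(1):].strip()  # tail starts at FROM


def build_fourth_code(schema: List[Tuple[str, str]], third_code: str) -> str:
    """Insert the user's expressions into the placeholder select list."""
    columns = fifth_code_columns(schema)
    new_fields, tail = parse_new_fields(third_code)
    for alias, expression in new_fields.items():
        if alias not in columns:
            raise KeyError(f"{alias} is not a field of the partition table")
        columns[alias] = expression
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in columns.items())
    return f"SELECT {select_list} {tail}"


schema = [("id", "string"), ("name", "string"), ("age", "int"), ("city", "string")]
fourth_code = build_fourth_code(schema, "select src.c as city from src_table src")
```

Because the alias after AS names the target field, the user's code never has to state where the new field sits among the existing ones.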
Optionally, as an embodiment, the first partition and the second partition further include a Type field, where the Type field indicates the type of the data accessed in the partition where the Type field is located.
The processing device of the data in the data warehouse in the embodiment of the present application may be an electronic device, and may also be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and may also be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like, which is not specifically limited in the embodiments of the present application.
The processing device of the data in the data warehouse in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system (Android), an iOS operating system, or other possible operating systems, which is not specifically limited in the embodiments of the present application.
The processing device for data in a data warehouse provided in the embodiment of the present application can implement each process implemented in the method embodiment of fig. 1, and is not described here again to avoid repetition.
Optionally, as shown in fig. 6, an electronic device 600 is further provided in this embodiment of the present application, and includes a processor 601 and a memory 602, where a program or an instruction that can be executed on the processor 601 is stored in the memory 602, and when the program or the instruction is executed by the processor 601, the steps of the embodiment of the data processing method in the data warehouse are implemented, and the same technical effects can be achieved, and are not described again here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 7 is a hardware structure diagram of an electronic device implementing various embodiments of the present application.
The electronic device 700 includes, but is not limited to: a radio frequency unit 701, a network module 702, an audio output unit 703, an input unit 704, a sensor 705, a display unit 706, a user input unit 707, an interface unit 708, a memory 709, and a processor 710.
Those skilled in the art will appreciate that the electronic device 700 may also include a power supply (e.g., a battery) for powering the various components, and the power supply may be logically coupled to the processor 710 via a power management system, such that the functions of managing charging, discharging, and power consumption may be performed via the power management system. The electronic device structure shown in fig. 7 does not constitute a limitation of the electronic device, and the electronic device may include more or less components than those shown, or combine some components, or arrange different components, and thus, the description is omitted here.
The processor 710 is configured to, when a service field addition request for a first service in a data warehouse is received, add a name and a type of a newly added field in a partition table of the data warehouse according to information of the newly added field carried in the field addition request; acquiring a first service code in a first partition under the partition table, generating a second service code according to the first service code and field information of the first service recorded in the partition table, and writing the second service code into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code comprises the original field of the first service and a null character string used for representing a newly added field; receiving a third service code corresponding to a newly added field of the first service uploaded by a user, generating a fourth service code according to the third service code and field information of the first service recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added field of the first service and a null character string for representing an original field; and summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
Therefore, in the embodiment of the application, when a service field needs to be added, the user only needs to write and submit the service code for the newly added field, without considering field compatibility, insertion order, or tracing of the metric definition (caliber). The data warehouse can automatically generate the complete service code of the first service based on the service code written by the user, which reduces the maintenance cost of the data warehouse.
Optionally, as an embodiment, the processor 710 is further configured to create a third partition under the partition table; and in the third partition, summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
Optionally, as an embodiment, in order of data inflow and outflow, the data warehouse includes: an operational data store layer, an intermediate layer, a data warehouse layer, and a data application layer, wherein the partition table is stored in the intermediate layer;
the processor 710 is further configured to send the second service code in the first partition and the fourth service code in the second partition of the intermediate layer to the data warehouse layer; and summarize the second service code in the first partition and the fourth service code in the second partition in the data warehouse layer to obtain the complete service code of the first service after the field is added.
Optionally, as an embodiment, the processor 710 is further configured to delete a null character string in the second service code and the fourth service code through a GROUP BY function, and combine the remaining contents in the second service code and the fourth service code to obtain a complete service code of the first service after a field is added.
Optionally, as an embodiment, the third service code is an SQL code written according to a preset code writing rule, where the preset code writing rule is: each newly added field is assigned to the corresponding field in the partition table through the AS keyword;
the processor 710 is further configured to parse the names and types of all fields of the first service in the partition table, and generate a fifth service code based on the parsed names and types of the fields, where data other than the numeric type and the structure type in the fifth service code are empty character strings; parse the content from the first SELECT to the first FROM in the third service code to obtain the information of the newly added field; and insert the information of the newly added field into the fifth service code according to the information indicated by AS in the third service code to obtain the fourth service code.
Optionally, as an embodiment, the first partition and the second partition further include a Type field, where the Type field indicates the type of the data accessed in the partition where the Type field is located.
It should be understood that in the embodiment of the present application, the input Unit 704 may include a Graphics Processing Unit (GPU) 7041 and a microphone 7042, and the Graphics Processing Unit 7041 processes image data of still pictures or videos obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 706 may include a display panel 7061, and the display panel 7061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 707 includes at least one of a touch panel 7071 and other input devices 7072. The touch panel 7071 is also referred to as a touch screen. The touch panel 7071 may include two parts of a touch detection device and a touch controller. Other input devices 7072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 709 may be used to store software programs as well as various data. The memory 709 may mainly include a first storage area for storing a program or an instruction and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or an instruction (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like. Further, the memory 709 may include volatile memory or nonvolatile memory, or the memory 709 may include both volatile and nonvolatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 709 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 710 may include one or more processing units; optionally, the processor 710 integrates an application processor, which primarily handles operations related to the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 710.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the embodiment of the data processing method in the data warehouse, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a computer read only memory ROM, a random access memory RAM, a magnetic or optical disk, and the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the embodiment of the data processing method in the data warehouse, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, etc.
Embodiments of the present application provide a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing data processing method embodiments in a data warehouse, and can achieve the same technical effects, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of processing data in a data warehouse, the method comprising:
when a service field addition request for a first service in a data warehouse is received, adding the name and type of the newly added field to a partition table of the data warehouse according to the information about the newly added field carried in the field addition request;
acquiring a first service code in a first partition under the partition table, generating a second service code according to the first service code and field information of the first service recorded in the partition table, and writing the second service code into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code comprises the original field of the first service and a null character string used for representing a newly added field;
receiving a third service code corresponding to a newly added field of the first service uploaded by a user, generating a fourth service code according to the third service code and field information of the first service recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added field of the first service and a null character string for representing an original field;
and summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
2. The method of claim 1, wherein the summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added comprises:
creating a third partition under the partition table;
and in the third partition, summarizing the second service code in the first partition and the fourth service code in the second partition to obtain a complete service code of the first service after the field is added.
3. The method of claim 1, wherein, in order of data inflow and outflow, the data warehouse comprises: an operational data store layer, an intermediate layer, a data warehouse layer, and a data application layer, wherein the partition table is stored in the intermediate layer;
the summarizing the second service code in the first partition and the fourth service code in the second partition to obtain the complete service code of the first service after the field is added comprises:
sending the second service code in the first partition and the fourth service code in the second partition of the intermediate layer to the data warehouse layer;
and summarizing the second service code in the first partition and the fourth service code in the second partition in the data warehouse layer to obtain a complete service code of the first service after the field is added.
4. The method as claimed in any one of claims 1 to 3, wherein the summarizing the second service code in the first partition and the fourth service code in the second partition to obtain the complete service code of the first service after the field is added comprises:
deleting the empty character strings in the second service code and the fourth service code through a GROUP BY function, and combining the remaining contents in the second service code and the fourth service code to obtain a complete service code of the first service after the field is added.
5. The method of claim 1, wherein the third service code is SQL code written according to a preset code writing rule, the preset code writing rule being: each newly added field is assigned to the corresponding field in the partition table through an AS alias;
the generating a fourth service code according to the third service code and the field information of the first service recorded in the partition table comprises:
parsing the names and types of all fields of the first service in the partition table, and generating a fifth service code based on the parsed names and types of the fields, wherein data other than the numeric type and the struct type in the fifth service code are empty character strings;
parsing the content from the first SELECT to the first FROM in the third service code to obtain the information of the newly added field;
and inserting the information of the newly added field into the fifth service code according to the information indicated by the AS in the third service code to obtain the fourth service code.
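The three steps of claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the select list is split on commas (so expressions containing commas are not handled), numeric and struct fields are given a `NULL` placeholder because the claim does not say what they receive, and all function, field and table names are hypothetical.

```python
import re

# Assumption: types that do not receive an empty-string placeholder.
NON_STRING_TYPES = {"int", "bigint", "double", "struct"}

def build_fifth_code(fields):
    """Map each field name to a placeholder expression, in table order."""
    return {
        name: "NULL" if typ.lower().split("<")[0] in NON_STRING_TYPES else "''"
        for name, typ in fields
    }

def parse_new_fields(third_code):
    """Read 'expr AS alias' items between the first SELECT and the first FROM."""
    m = re.search(r"\bselect\b(.*?)\bfrom\b", third_code, re.IGNORECASE | re.DOTALL)
    if m is None:
        raise ValueError("no SELECT ... FROM found")
    new_fields = {}
    for item in m.group(1).split(","):
        expr, alias = re.split(r"\s+as\s+", item.strip(), maxsplit=1,
                               flags=re.IGNORECASE)
        new_fields[alias.strip()] = expr.strip()
    # The tail starts at the first FROM and is reused unchanged.
    return new_fields, third_code[m.end(1):]

def build_fourth_code(third_code, fields):
    """Insert the new-field expressions into the placeholder template."""
    template = build_fifth_code(fields)
    new_fields, tail = parse_new_fields(third_code)
    for alias, expr in new_fields.items():
        if alias not in template:
            raise KeyError(f"{alias} is not a field of the partition table")
        template[alias] = expr
    select_list = ", ".join(f"{expr} AS {name}" for name, expr in template.items())
    return f"SELECT {select_list} {tail.strip()}"

fields = [("id", "string"), ("name", "string"), ("age", "int"), ("city", "string")]
fourth = build_fourth_code("select u.city_name as city from ods_user u", fields)
print(fourth)
```

Relying on the AS alias is what lets the generator place each uploaded expression in the right column without understanding the expression itself.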
6. An apparatus for processing data in a data warehouse, the apparatus comprising:
the modification module is used for, upon receiving a request to add a new service field for a first service in the data warehouse, adding the name and the type of the newly added field to a partition table of the data warehouse according to the information of the newly added field carried in the request;
the first processing module is used for acquiring a first service code in a first partition under the partition table, generating a second service code according to the first service code and field information of the first service recorded in the partition table, and writing the second service code into the first partition, wherein the first service code is a service code corresponding to an original field of the first service, and the second service code comprises the original field of the first service and an empty character string used for representing the newly added field;
the second processing module is used for receiving a third service code corresponding to the newly added field of the first service uploaded by a user, generating a fourth service code according to the third service code and field information of the first service recorded in the partition table, creating a second partition under the partition table, and writing the fourth service code into the second partition, wherein the fourth service code comprises the newly added field of the first service and an empty character string used for representing the original field;
and the summarizing module is used for summarizing the second service code in the first partition and the fourth service code in the second partition to obtain the complete service code of the first service after the field is added.
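The division of work among the four modules of claim 6 can be pictured with a small in-memory model. This is a simplified sketch that operates on rows rather than on SQL service code, and it assumes that a key field (here `id`) is present in both partitions so that rows can be matched; the class name, method names and sample data are all hypothetical.

```python
class PartitionTable:
    """Toy model of a partition table: ordered field names plus named partitions."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.partitions = {}  # partition name -> list of row dicts

    def add_field(self, name):
        """Modification module: record the newly added field."""
        if name not in self.fields:
            self.fields.append(name)

    def pad_partition(self, partition):
        """First processing module: represent missing fields by ''."""
        for row in self.partitions[partition]:
            for field in self.fields:
                row.setdefault(field, "")

    def write_new_partition(self, partition, rows):
        """Second processing module: new-field rows, '' for everything else."""
        self.partitions[partition] = [
            {field: row.get(field, "") for field in self.fields} for row in rows
        ]

    def summarize(self, key, sources):
        """Summarizing module: merge rows by key, ignoring '' placeholders."""
        merged = {}
        for partition in sources:
            for row in self.partitions[partition]:
                target = merged.setdefault(row[key], dict.fromkeys(self.fields, ""))
                for field, value in row.items():
                    if value != "":
                        target[field] = value
        return list(merged.values())

table = PartitionTable(["id", "name"])
table.partitions["p1"] = [{"id": "1", "name": "alice"}]
table.add_field("city")
table.pad_partition("p1")
table.write_new_partition("p2", [{"id": "1", "city": "Shenzhen"}])
result = table.summarize("id", ["p1", "p2"])
print(result)
```

Because the original partition is only padded and never recomputed, the existing data stays in place and only the new field has to be produced.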
7. The apparatus of claim 6, wherein the summarizing module comprises:
the creating submodule is used for creating a third partition under the partition table;
and the first summarizing submodule is used for summarizing the second service code in the first partition and the fourth service code in the second partition in the third partition to obtain a complete service code of the first service after a field is added.
8. The apparatus of claim 6, wherein the data warehouse comprises, in order of data inflow and outflow: an operational data store layer, a middle layer, a data warehouse layer and a data application layer, wherein the partition table is stored in the middle layer;
the summarizing module comprises:
the sending submodule is used for sending the second service code in the first partition and the fourth service code in the second partition of the middle layer to the data warehouse layer;
and the second summarizing submodule is used for summarizing the second service code in the first partition and the fourth service code in the second partition in the data warehouse layer to obtain the complete service code of the first service after the field is added.
9. The apparatus of any one of claims 6 to 8, wherein the summarizing module comprises:
and the third summarizing submodule is used for deleting the empty character strings in the second service code and the fourth service code through a GROUP BY function, and combining the remaining contents in the second service code and the fourth service code to obtain a complete service code of the first service after a field is added.
10. The apparatus of claim 6, wherein the third service code is SQL code written according to a preset code writing rule, the preset code writing rule being: each newly added field is assigned to the corresponding field in the partition table through an AS alias;
the second processing module comprises:
the first parsing submodule is used for parsing the names and types of all fields of the first service in the partition table and generating a fifth service code based on the parsed names and types of the fields, wherein data other than the numeric type and the struct type in the fifth service code are empty character strings;
the second parsing submodule is used for parsing the content from the first SELECT to the first FROM in the third service code to obtain the information of the newly added field;
and the inserting submodule is used for inserting the information of the newly added field into the fifth service code according to the information indicated by the AS in the third service code to obtain the fourth service code.
CN202111386391.1A 2021-11-22 2021-11-22 Data processing method and device in data warehouse Pending CN114064612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111386391.1A CN114064612A (en) 2021-11-22 2021-11-22 Data processing method and device in data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111386391.1A CN114064612A (en) 2021-11-22 2021-11-22 Data processing method and device in data warehouse

Publications (1)

Publication Number Publication Date
CN114064612A true CN114064612A (en) 2022-02-18

Family

ID=80278852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111386391.1A Pending CN114064612A (en) 2021-11-22 2021-11-22 Data processing method and device in data warehouse

Country Status (1)

Country Link
CN (1) CN114064612A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination