CN111694810B - Data warehouse creation method and device, electronic equipment and readable storage medium

Info

Publication number: CN111694810B
Application number: CN201910191102.9A
Authority: CN (China)
Other versions: CN111694810A
Other languages: Chinese (zh)
Inventor: 康进
Original and current assignee: Alibaba Group Holding Ltd
Legal status: Active
Prior art keywords: data, processing, warehouse, domain, data warehouse

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Abstract

The invention discloses a method and a device for creating a data warehouse, electronic equipment, and a computer readable storage medium. The creation method comprises the following steps: acquiring raw data; preprocessing the raw data to obtain preprocessed data; and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying.

Description

Data warehouse creation method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of data warehouse technology, and more particularly, to a data warehouse creation method, a data warehouse creation apparatus, an electronic device, and a readable storage medium.
Background
A data warehouse is a strategic collection that provides all types of data support for decision-making processes at all levels of an enterprise. It is a single data store created for analytical reporting and decision support. For businesses that need business intelligence, it guides business process improvement and the monitoring of time, cost, quality, and control. The data warehouse can screen and integrate various business data and can be used for data analysis, data mining, and data reporting.
However, the prior art cannot create a data warehouse from a smaller amount of raw data, so existing data warehouses have poor timeliness.
Disclosure of Invention
It is an object of the present invention to provide a new solution for creating a data warehouse.
According to a first aspect of the present invention, there is provided a method of creating a data warehouse, comprising:
acquiring raw data;
preprocessing the raw data to obtain preprocessed data;
and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying.
Optionally, the raw data comprises unstructured data,
and the step of preprocessing the raw data to obtain preprocessed data comprises the following steps:
structuring the unstructured data to obtain structured data;
and cleaning the structured data to obtain the preprocessed data.
Optionally, the step of processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying includes:
extracting, according to the domain model, the business concept corresponding to each label contained in the preprocessed data;
determining the association relationships between the business concepts according to the domain model;
and creating the data warehouse according to the domain model and the association relationships, so that it is available for querying.
Optionally, the label includes a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and/or a personalized indicator corresponding to the data application layer.
Optionally, the label includes a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and a personalized indicator corresponding to the data application layer;
the step of processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying comprises the following steps:
processing the preprocessed data at the data detail layer according to the subject and the domain model to obtain first data;
processing the first data at the data summary layer according to the common indicator and the domain model to obtain second data;
and processing the second data at the data application layer according to the personalized indicator and the domain model to obtain the data warehouse available for querying.
Optionally, before the step of processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying, the method further includes:
loading the preprocessed data, incrementally or in full, into the data detail layer, so as to execute the step of processing the preprocessed data at the data detail layer according to the subject and the domain model to obtain first data.
Optionally, the creation method further includes:
acquiring business use cases;
extracting the business concepts contained in the business use cases;
determining the association relationships between the business concepts according to the business use cases;
dividing the business concepts into domains according to preset domain labels to obtain a domain division result;
and obtaining the domain model according to the domain division result and the association relationships.
Optionally, the creation method further includes:
exposing the data warehouse in response to a query request for the data warehouse.
According to a second aspect of the present invention, there is provided a creation apparatus of a data warehouse, comprising:
a data acquisition module for acquiring raw data;
a preprocessing module for preprocessing the raw data to obtain preprocessed data;
and a data warehouse creation module for processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for querying.
Optionally, the raw data comprises unstructured data,
and the preprocessing module is further configured to:
structure the unstructured data to obtain structured data;
and clean the structured data to obtain the preprocessed data.
Optionally, the data warehouse creation module is further configured to:
extract, according to the domain model, the business concept corresponding to each label contained in the preprocessed data;
determine the association relationships between the business concepts according to the domain model;
and create the data warehouse according to the domain model and the association relationships, so that it is available for querying.
Optionally, the label includes a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and/or a personalized indicator corresponding to the data application layer.
Optionally, the label includes a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and a personalized indicator corresponding to the data application layer;
the data warehouse creation module is further configured to:
process the preprocessed data at the data detail layer according to the subject and the domain model to obtain first data;
process the first data at the data summary layer according to the common indicator and the domain model to obtain second data;
and process the second data at the data application layer according to the personalized indicator and the domain model to obtain the data warehouse available for querying.
Optionally, the creation apparatus further includes:
a module for loading the preprocessed data, incrementally or in full, into the data detail layer, so that the data warehouse creation module can execute the step of processing the preprocessed data at the data detail layer according to the subject and the domain model to obtain first data.
Optionally, the creation apparatus further includes:
a module for acquiring business use cases;
a module for extracting the business concepts contained in the business use cases;
a module for determining the association relationships between the business concepts according to the business use cases;
a module for dividing the business concepts into domains according to preset domain labels to obtain a domain division result;
and a module for obtaining the domain model according to the domain division result and the association relationships.
Optionally, the creation apparatus further includes:
a module for exposing the data warehouse in response to a query request for the data warehouse.
According to a third aspect of the present invention, there is provided an electronic device comprising:
a creation apparatus according to the second aspect of the present invention; or,
a processor and a memory for storing instructions for controlling the processor to perform the creation method according to the first aspect of the invention.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the creation method according to the first aspect of the present invention.
In the embodiment of the invention, the data warehouse is obtained by processing, according to the preset label and the domain model, the preprocessed data obtained by preprocessing the raw data. Thus, by combining the domain model with the data warehouse, the data warehouse can deliver hourly reports, minute-level reports, and even real-time reports. In addition, the creation efficiency of the data warehouse can be improved, its development cost reduced, and its scalability and stability improved.
Other features of the present invention and its advantages will become apparent from the following detailed description of exemplary embodiments of the invention, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram of one example of a hardware configuration of an electronic device that may be used to implement an embodiment of the invention;
FIG. 2 is a block diagram of another example of a hardware configuration of an electronic device that may be used to implement an embodiment of the invention;
FIG. 3 is a flow chart of a method for creating a data warehouse according to a first embodiment of the present invention;
FIG. 4 is a flow chart of a method of creating a data warehouse according to a first embodiment of the present invention;
FIG. 5 is a flow chart of a method of creating a data warehouse according to a first embodiment of the present invention;
FIG. 6 is a flow chart of a method of creating a data warehouse according to a first embodiment of the present invention;
FIG. 7 shows a functional block diagram of a creation means of a data warehouse provided by an embodiment of the present invention;
FIG. 8 is a functional block diagram of an electronic device provided according to a first embodiment of the present invention;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to a second embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
The following description of at least one exemplary embodiment is merely exemplary in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
< hardware configuration >
Fig. 1 and 2 are block diagrams of hardware configurations of an electronic device 1000 that may be used to implement the method of creating a data warehouse of any embodiment of the present invention.
In one embodiment, as shown in FIG. 1, electronic device 1000 may be a server 1100.
The server 1100 provides a service point for processing, databases, and communication facilities. The server 1100 may be a monolithic server or a distributed server spanning multiple computers or computer data centers. The server may be of various types such as, but not limited to, a web server, news server, mail server, message server, advertisement server, file server, application server, interaction server, database server, or proxy server. In some embodiments, each server may include hardware, software, or embedded logic components, or a combination of two or more such components, for performing the appropriate functions supported by or implemented by the server. For example, the server may be a blade server, a cloud server, or the like, or may be a server group consisting of multiple servers, and may include one or more of the server types described above.
In this embodiment, the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160, as shown in fig. 1.
In this embodiment, the server 1100 may also include a speaker, microphone, etc., without limitation.
The processor 1110 may be a dedicated server processor, or may be a desktop processor, a mobile processor, or the like that meets performance requirements, which is not limited herein. The memory 1120 includes, for example, ROM (read only memory), RAM (random access memory), nonvolatile memory such as a hard disk, and the like. The interface device 1130 includes, for example, various bus interfaces such as a serial bus interface (including a USB interface), a parallel bus interface, and the like. The communication device 1140 can perform wired or wireless communication, for example. The display device 1150 is, for example, a liquid crystal display, an LED display, a touch display, or the like. The input device 1160 may include, for example, a touch screen, a keyboard, and the like.
In this embodiment, the memory 1120 of the server 1100 is used to store instructions for controlling the processor 1110 to operate at least to perform a method of creating a data warehouse according to any embodiment of the present invention. The skilled person can design instructions according to the disclosed solution. How the instructions control the processor to operate is well known in the art and will not be described in detail here.
Although a plurality of devices of the server 1100 are shown in fig. 1, the present invention may relate to only some of the devices, for example, the server 1100 may relate to only the memory 1120 and the processor 1110.
In one embodiment, the electronic device 1000 may be a terminal device 1200 such as a PC, a notebook computer, etc. used by an operator, which is not limited herein.
In this embodiment, referring to fig. 2, the terminal apparatus 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, a speaker 1270, a microphone 1280, and the like.
The processor 1210 may be a mobile processor. The memory 1220 includes, for example, ROM (read only memory), RAM (random access memory), nonvolatile memory such as a hard disk, and the like. The interface device 1230 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1240 may be, for example, a wired or wireless communication device. The communication device 1240 may include a short-range communication device, for example, any device that performs short-range wireless communication based on a short-range wireless communication protocol such as HiLink, WiFi (IEEE 802.11 protocol), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, or LiFi. The communication device 1240 may also include a remote communication device, for example, any device that performs WLAN, GPRS, or 2G/3G/4G/5G remote communication. The display device 1250 is, for example, a liquid crystal display, a touch display, or the like. The input device 1260 may include, for example, a touch screen, a keyboard, and the like. A user may input and output voice information through the microphone 1280 and the speaker 1270.
In this embodiment, the memory 1220 of the terminal device 1200 is used to store instructions for controlling the processor 1210 to operate to perform at least the method of creating a data warehouse according to any embodiment of the present invention. The skilled person can design instructions according to the disclosed solution. How the instructions control the processor to operate is well known in the art and will not be described in detail here.
Although a plurality of devices of the terminal apparatus 1200 are shown in fig. 2, the present invention may relate to only some of the devices, for example, the terminal apparatus 1200 may relate to only the memory 1220 and the processor 1210 and the display device 1250.
< method >
Fig. 3 is a flow diagram of a method of creating a data warehouse, which may be implemented by an electronic device, in accordance with an embodiment of the present invention. The electronic device may be a server 1100 as shown in fig. 1 or a terminal device 1200 as shown in fig. 2.
As shown in fig. 3, the method for creating a data warehouse of the present embodiment may include the following steps S3100 to S3300:
Step S3100, raw data is acquired.
In one embodiment of the invention, the raw data may be obtained through an operational data store (ODS) layer.
Specifically, the operational data store layer may extract specified data from a data source as the raw data.
Because the data produced by some sources has no analytical value, or has value far lower than the implementation and performance cost the data warehouse would incur to store it, only specified data may be extracted.
The raw data may be extracted from a specified client or from an open data processing service (Open Data Processing Service, ODPS for short), or may be extracted from generated logs or business data.
Step S3200, the raw data is preprocessed to obtain preprocessed data.
The data acquired in step S3100 may include structured data or unstructured data. Structured data, also called row data, is data that is logically expressed and implemented by a two-dimensional table structure, strictly follows data format and length specifications, and is mainly stored and managed by a relational database. Its opposite is unstructured data, which is not suited to representation in a two-dimensional database table and includes office documents in all formats, XML, HTML, various types of reports, images, audio, video, and the like.
In embodiments where the raw data includes structured data, the step of preprocessing the structured data may include a data cleaning process to obtain the preprocessed data. Specifically, the cleaning process may remove dirty data such as incomplete data, erroneous data, and repeated data from the structured data. The cleaning process in this embodiment is a process of rechecking and verifying the structured data, with the aim of deleting duplicate information, correcting existing errors, and ensuring data consistency.
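The cleaning described above can be illustrated with a minimal Python sketch. The record fields (`advertiser_id`, `clicks`) and the rule that a negative click count is an error to be clamped to zero are hypothetical; the disclosure does not specify a schema or concrete cleaning rules.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[int]]

# Hypothetical schema: fields every record must carry.
REQUIRED_FIELDS = ("advertiser_id", "clicks")


def clean(records: List[Record]) -> List[Record]:
    """Drop incomplete and repeated records and correct erroneous values."""
    seen = set()
    cleaned = []
    for rec in records:
        # Incomplete data: a required field is missing or None.
        if any(rec.get(name) is None for name in REQUIRED_FIELDS):
            continue
        # Erroneous data: assume a negative click count is an error; clamp to 0.
        fixed = dict(rec, clicks=max(rec["clicks"], 0))
        # Repeated data: keep only the first occurrence of an identical record.
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


raw = [
    {"advertiser_id": 1, "clicks": 10},
    {"advertiser_id": 1, "clicks": 10},    # repeated
    {"advertiser_id": 2, "clicks": None},  # incomplete
    {"advertiser_id": 3, "clicks": -5},    # erroneous
]
cleaned_records = clean(raw)
```

In this sketch the repeated and incomplete records are dropped, while the erroneous record is corrected rather than discarded.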
In embodiments where the raw data includes unstructured data, the step of preprocessing the unstructured data may include: structuring the unstructured data to obtain structured data; and cleaning the structured data to obtain the preprocessed data. Structuring converts the unstructured data into structured data.
Further, the manner in which unstructured data is structured may include, for example, transcoding (e.g., m/f -> male/female), field conversion (e.g., balance -> bal), conversion of units of measure (e.g., cm -> m), and conversion of data granularity. Business systems store very fine-grained data, whereas data in the data warehouse is used for analysis and does not need that level of detail, so business system data is aggregated to the granularity of the data warehouse.
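The four conversions listed above can be sketched in Python as follows. The field names (`gender`, `height_cm`, `timestamp`, `amount`) and the choice of daily granularity are assumptions made for illustration only.

```python
from collections import defaultdict

GENDER_CODES = {"m": "male", "f": "female"}  # transcoding: m/f -> male/female
FIELD_RENAMES = {"balance": "bal"}           # field conversion: balance -> bal


def transform(record):
    """Apply transcoding, field conversion, and unit conversion to one record."""
    out = {}
    for name, value in record.items():
        if name == "gender":
            value = GENDER_CODES.get(value, value)
        elif name == "height_cm":
            # Unit conversion: centimetres -> metres.
            name, value = "height_m", value / 100
        out[FIELD_RENAMES.get(name, name)] = value
    return out


def aggregate_by_day(events):
    """Granularity conversion: roll per-event amounts up to daily totals."""
    totals = defaultdict(int)
    for event in events:
        day = event["timestamp"][:10]  # assumes ISO 8601 timestamps
        totals[day] += event["amount"]
    return dict(totals)


row = transform({"gender": "m", "balance": 250, "height_cm": 180})
daily = aggregate_by_day([
    {"timestamp": "2019-03-13T09:00:00", "amount": 5},
    {"timestamp": "2019-03-13T17:30:00", "amount": 7},
    {"timestamp": "2019-03-14T08:15:00", "amount": 3},
])
```

Here `row` carries the converted fields and `daily` holds one total per day, which is the coarser granularity a warehouse would typically store.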
Step S3300, the preprocessed data is processed according to the preset label and the domain model to obtain a data warehouse available for querying.
The data warehouse in this embodiment may be a data report. The labels can be preset according to application scenes or specific requirements.
In one embodiment of the present invention, the domain model may be specifically obtained according to steps S4100 to S4500 shown in fig. 4:
Step S4100, business use cases are acquired.
The business use cases can be written in advance according to the application scenario or specific requirements, and they reflect product requirements.
Step S4200, the business concepts contained in the business use cases are extracted, wherein the business concepts include the labels.
Specifically, a business concept may be a specific field contained in a business use case, or a concept corresponding to such a field. For example, for the business use case "a user clicks an advertisement and enters the landing page", the business concepts it contains may include "user click behavior", "advertisement", and "landing page".
Step S4300, the association relationships between the business concepts are determined according to the business use cases.
Specifically, business concepts contained in the same use case can be associated with each other. For example, the business concepts extracted from the business use cases may include "user click behavior", "advertisement", "landing page", "advertiser", and "placement". Then, from the business use case "a user clicks an advertisement and enters the landing page", an association between the business concepts "user click behavior", "advertisement", and "landing page" may be determined. From the business use case "an advertiser places an advertisement", an association between the business concepts "advertiser", "placement", and "advertisement" may be determined.
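As a hedged illustration of steps S4200 and S4300, the Python sketch below associates every pair of business concepts that appear in the same use case. It assumes each use case has already been annotated with its concepts; how concepts are extracted from use-case text is not specified here, and the dictionary layout is hypothetical.

```python
from itertools import combinations

# Hypothetical input: each business use case is already annotated with the
# business concepts it contains.
use_cases = {
    "a user clicks an advertisement and enters the landing page": [
        "user click behavior", "advertisement", "landing page"],
    "an advertiser places an advertisement": [
        "advertiser", "placement", "advertisement"],
}


def associate(cases):
    """Associate every pair of concepts that co-occur in one use case."""
    pairs = set()
    for concepts in cases.values():
        # Sorting makes each undirected pair appear in one canonical order.
        for first, second in combinations(sorted(concepts), 2):
            pairs.add((first, second))
    return pairs


associations = associate(use_cases)
```

With these two use cases, "advertisement" is associated with both "landing page" and "advertiser", while "advertiser" and "landing page" are not directly associated because they never share a use case.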
Step S4400, the business concepts are divided into domains according to preset domain labels to obtain a domain division result.
The domain labels can be set in advance according to the application scenario or specific requirements. Each domain label uniquely identifies a corresponding business domain. Dividing the business concepts according to the preset domain labels places each business concept in the business domain under its corresponding domain label, so that every business concept has a domain label; the resulting assignment is the domain division result.
Dividing the business concepts into business domains keeps the coupling between business domains low and the cohesion among the business concepts within each domain high.
Step S4500, a domain model is obtained according to the domain division result and the association relationships.
That is, the domain model is obtained from the domain division result and the association relationships between the business concepts.
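One possible in-memory shape for such a domain model is sketched below in Python. The `DomainModel` class, the domain labels ("customer", "placement", "traffic"), and the concept-to-domain assignment are all hypothetical; the disclosure does not prescribe a representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class DomainModel:
    # Domain label -> business concepts assigned to that business domain.
    domains: Dict[str, Set[str]] = field(default_factory=dict)
    # Undirected associations between business concepts.
    associations: Set[Tuple[str, str]] = field(default_factory=set)


def build_domain_model(concept_to_domain, associations):
    """Combine a domain division result with concept associations."""
    model = DomainModel(associations=set(associations))
    for concept, label in concept_to_domain.items():
        model.domains.setdefault(label, set()).add(concept)
    return model


# Hypothetical domain division result: every concept has one domain label.
concept_to_domain = {
    "advertiser": "customer",
    "advertisement": "placement",
    "landing page": "placement",
    "user click behavior": "traffic",
}
associations = {
    ("advertisement", "landing page"),
    ("advertisement", "advertiser"),
}
model = build_domain_model(concept_to_domain, associations)
```

Keeping the division result and the associations in one object means later steps can look up both the domain of a concept and its related concepts from a single place.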
In one embodiment of the present invention, the step of processing the preprocessed data according to the preset label and the domain model to obtain the data warehouse available for querying may further include steps S3311 to S3313 as shown in fig. 5:
Step S3311, the business concept corresponding to each label is extracted from the preprocessed data according to the preset labels and the domain model.
Specifically, the business concept corresponding to each label may be defined in advance through the domain model. Then, according to the domain model, the business concept corresponding to each label contained in the preprocessed data can be extracted.
Step S3312, the association relationships between the business concepts are determined according to the domain model.
Since the association relationships between the business concepts are predefined in the domain model, the association relationships between the business concepts corresponding to the labels contained in the preprocessed data can be determined according to the domain model.
Step S3313, a data warehouse available for querying is created according to the domain model and the determined association relationships.
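Steps S3311 to S3313 can be sketched as three small Python functions. This is only one reading of the steps: the label names (`adv_id`, `ad_id`, `clicks`), the mapping from labels to business concepts, and the dictionary used to stand in for the created warehouse are all assumptions.

```python
# Hypothetical domain model content: label -> business concept, plus the
# predefined associations between business concepts.
label_to_concept = {
    "adv_id": "advertiser",
    "ad_id": "advertisement",
    "clicks": "user click behavior",
}
associations = {
    ("advertisement", "advertiser"),
    ("advertisement", "user click behavior"),
}

preprocessed = [
    {"adv_id": 1, "ad_id": 10, "clicks": 3, "debug_flag": "x"},
    {"adv_id": 1, "ad_id": 11, "clicks": 5, "debug_flag": "y"},
]


def extract_concepts(rows, mapping):
    """S3311: keep only labelled fields, keyed by their business concept."""
    return [
        {mapping[name]: value for name, value in row.items() if name in mapping}
        for row in rows
    ]


def related(concepts, pairs):
    """S3312: select the predefined associations among the given concepts."""
    present = set(concepts)
    return {pair for pair in pairs if set(pair) <= present}


def create_warehouse(rows, mapping, pairs):
    """S3313: assemble the queryable result from concepts and associations."""
    return {
        "rows": extract_concepts(rows, mapping),
        "associations": related(mapping.values(), pairs),
    }


warehouse = create_warehouse(preprocessed, label_to_concept, associations)
```

Fields without a preset label, such as the hypothetical `debug_flag`, are left out of the result.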
Further, the label in this embodiment may include a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and/or a personalized indicator corresponding to the data application layer.
The data detail layer (Data Warehouse Detail, DWD layer for short) is used for storing detail data and dimension table data. The data summary layer (Data Warehouse Summary, DWS layer for short) is used for storing the common indicators. The data application layer (Application Data Store, ADS layer for short) is used for storing the personalized indicators.
For example, the common indicator may be revenue, in which case financial revenue, reported revenue, and the like all belong to the common indicator of revenue. The personalized indicator may be, for example, the unit of revenue, specifically yuan or fen.
In one embodiment of the invention, the label includes a subject corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and a personalized indicator corresponding to the data application layer. In that case, the step of processing the preprocessed data according to the preset label and the domain model to obtain the data warehouse available for querying may include steps S3321 to S3323 shown in fig. 6:
Step S3321, the preprocessed data is processed at the data detail layer according to the subject corresponding to the data detail layer and the domain model to obtain first data.
The first data may be a data report.
For example, the preprocessed data contains multiple tables relevant to advertisers: table a contains the advertisers' company information, table b their placement information, table c their bid information, and table d their impression, click, and consumption information. Processing the preprocessed data at the data detail layer according to the "advertiser" subject may consist of joining tables a, b, c, and d into a single detail table so that queries for all kinds of advertiser information can be served.
Specifically, for the step of processing the preprocessed data at the data detail layer according to the subject corresponding to the data detail layer and the domain model to obtain the first data, reference may be made to the descriptions of the foregoing steps S3311 to S3313, which are not repeated herein.
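The joining of tables a, b, c, and d under the "advertiser" subject might look like the following Python sketch. The join key `advertiser_id` and all column names and values are hypothetical; the disclosure only says that the tables are joined into one detail table.

```python
# Hypothetical preprocessed tables, each keyed by advertiser_id.
table_a = {1: {"company": "Acme"}, 2: {"company": "Globex"}}
table_b = {1: {"placement": "search"}, 2: {"placement": "feed"}}
table_c = {1: {"bid": 0.5}, 2: {"bid": 0.8}}
table_d = {
    1: {"impressions": 1000, "clicks": 40, "consumption": 20},
    2: {"impressions": 500, "clicks": 10, "consumption": 5},
}


def build_detail_table(*tables):
    """Join per-advertiser tables into one wide detail row per advertiser."""
    detail = {}
    for table in tables:
        for advertiser_id, columns in table.items():
            row = detail.setdefault(
                advertiser_id, {"advertiser_id": advertiser_id})
            row.update(columns)
    return list(detail.values())


detail_table = build_detail_table(table_a, table_b, table_c, table_d)
```

Each row of `detail_table` then answers company, placement, bid, and performance questions about one advertiser without further joins.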
Further, the subject in the present embodiment may be the domain label in step S4400.
Step S3322, the first data is processed at the data summary layer according to the common indicator corresponding to the data summary layer and the domain model to obtain second data.
The second data may be a data report.
The data summary layer typically performs summarization on the first data of the data detail layer. For example, the common indicator "agent" is summarized over the detail table corresponding to the "advertiser" subject of the data detail layer, yielding a lightly aggregated summary table of the agent's common information such as impressions, clicks, and consumption.
Specifically, for the step of processing the first data at the data summary layer according to the common indicator corresponding to the data summary layer and the domain model to obtain the second data, reference may be made to the descriptions of the foregoing steps S3311 to S3313, which are not repeated herein.
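A light aggregation of this kind can be sketched in Python as a group-by over the detail rows. The detail rows, the `agent` column, and the metric names below are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical detail-layer rows under the "advertiser" subject.
detail_table = [
    {"advertiser_id": 1, "agent": "agent_x",
     "impressions": 1000, "clicks": 40, "consumption": 20},
    {"advertiser_id": 2, "agent": "agent_x",
     "impressions": 500, "clicks": 10, "consumption": 5},
    {"advertiser_id": 3, "agent": "agent_y",
     "impressions": 200, "clicks": 8, "consumption": 4},
]
METRICS = ("impressions", "clicks", "consumption")


def summarize(rows, dimension, metrics=METRICS):
    """Sum each metric over all detail rows sharing the same dimension value."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        for metric in metrics:
            totals[row[dimension]][metric] += row[metric]
    return {key: dict(values) for key, values in totals.items()}


summary_table = summarize(detail_table, "agent")
```

The result has one row per agent, which is coarser than the per-advertiser detail table but still general enough to serve several downstream reports.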
Step S3323, the second data is processed at the data application layer according to the personalized indicator corresponding to the data application layer and the domain model to obtain a data warehouse available for querying.
The data warehouse may be a data report. The data application layer may perform finer-grained operations on the second data of the data summary layer. For example, from the lightly aggregated summary table for the common indicator "agent" under the "advertiser" subject, the personalized indicator "consumption" may be summarized to obtain a data warehouse of consumption information.
Specifically, for the step of processing the second data at the data application layer according to the personalized indicator corresponding to the data application layer and the domain model to obtain the data warehouse, reference may be made to the descriptions of the foregoing steps S3311 to S3313, which are not repeated herein.
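As a small Python sketch of the application-layer step, the function below selects one personalized indicator from a summary table and optionally rescales its unit. The summary rows are hypothetical, and the assumption that consumption is stored in yuan and may be reported in fen (1 yuan = 100 fen) is made only for illustration.

```python
# Hypothetical summary-layer table: one row of common indicators per agent.
summary_table = {
    "agent_x": {"impressions": 1500, "clicks": 50, "consumption": 25},
    "agent_y": {"impressions": 200, "clicks": 8, "consumption": 4},
}


def application_report(summary, indicator, scale=1):
    """Select one indicator per key, optionally rescaling its unit."""
    return {key: row[indicator] * scale for key, row in summary.items()}


# Assumes consumption is stored in yuan; scale=100 reports it in fen.
consumption_yuan = application_report(summary_table, "consumption")
consumption_fen = application_report(summary_table, "consumption", scale=100)
```

The two reports contain the same information for different consumers, which is the kind of per-application tailoring the application layer is described as providing.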
Thus, there are multiple levels of reporting (data detail layer, data summary layer, data application layer) under a topic. The method comprises the steps of providing a public label, wherein the theme is a specific field in first data of a data detail layer, the public label is a specific field in second data of data summarization, and the personalized label is a specific field of a data application layer data warehouse.
On this basis, the creation method may further include: loading the preprocessed data into the data detail layer incrementally or in full, so that the step of processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain the first data can be executed.
Full loading may specifically mean loading all of the preprocessed data at once.
The first load is typically a full load, but repeating a full load in the second or third cycle would consume considerable hardware resources and time. Some data sources may not have changed at all, while others may have gained only a small amount of data. Incremental loading considers only those records in the data source that have been newly modified or newly inserted.
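The difference between the two loading modes can be illustrated with a short sketch. It assumes, purely for illustration, that every record carries a unique `id` and an `updated_at` value that increases whenever the record is inserted or modified; the text does not specify how changed records are detected, and the function names are hypothetical.

```python
def full_load(source):
    """Full loading: copy every source record into a fresh target, keyed by id."""
    return {r["id"]: dict(r) for r in source}


def incremental_load(target, source, last_loaded_at):
    """Incremental loading: apply only records inserted or modified after the previous load."""
    changed = [r for r in source if r["updated_at"] > last_loaded_at]
    for r in changed:
        target[r["id"]] = dict(r)  # insert new records, overwrite modified ones
    return len(changed)


# Cycle 1: the first load is a full load.
source = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "b", "updated_at": 1},
]
target = full_load(source)

# Cycle 2: record 2 was modified and record 3 was inserted; record 1 is unchanged.
source = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "b2", "updated_at": 2},
    {"id": 3, "value": "c", "updated_at": 2},
]
loaded = incremental_load(target, source, last_loaded_at=1)
```

Only two of the three source records are touched in the second cycle, which is the saving the incremental mode is meant to provide.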
In one embodiment of the present invention, the method for creating a data warehouse may further include, after step S3300 is performed: presenting the data warehouse created in step S3300 in response to a query request for the data warehouse.
Specifically, the data warehouse may be presented to the user through MySQL or through a specific application. MySQL is a relational database management system.
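As a rough sketch of presenting a report in response to a query, the example below stores a small report in a relational table and answers a lookup against it. Note the substitution: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL so that it runs without a server. The table name `agent_consumption`, its columns and both function names are hypothetical.

```python
import sqlite3


def publish_report(conn, rows):
    """Write (agent, consumption) rows into a report table that can be queried."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_consumption "
        "(agent TEXT PRIMARY KEY, consumption REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO agent_consumption VALUES (?, ?)", rows)
    conn.commit()


def handle_query(conn, agent):
    """Answer a query request: return the agent's consumption, or None if unknown."""
    row = conn.execute(
        "SELECT consumption FROM agent_consumption WHERE agent = ?", (agent,)
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL instance
publish_report(conn, [("agentA", 12.5), ("agentB", 2.0)])
result = handle_query(conn, "agentA")
```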
In the embodiment of the invention, the data warehouse is obtained by processing the preprocessed data, which are obtained by preprocessing the original data, according to the preset tag and the domain model. By combining the domain model with the data warehouse in this way, the data warehouse can support hour-level reports, minute-level reports and even real-time reports. Moreover, the creation efficiency of the data warehouse can be improved, its development cost reduced, and its extensibility and stability improved.
< device >
In the present embodiment, there is provided a creation apparatus 7000 of a data warehouse, which includes a data acquisition module 7100, a preprocessing module 7200, and a data warehouse creation module 7300, as shown in fig. 7. The data acquisition module 7100 is used for acquiring original data; the preprocessing module 7200 is used for preprocessing the original data to obtain preprocessed data; the data warehouse creating module 7300 is configured to process the pre-processed data according to a preset label and domain model to obtain a data warehouse for query and acquisition.
In one embodiment of the invention, the raw data may include unstructured data, in which case the preprocessing module 7200 may also be used for:
structuring the unstructured data to obtain structured data;
and cleaning the structured data to obtain the preprocessed data.
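The two preprocessing operations listed above can be sketched as follows. The input format (free-text lines of `key=value` pairs) and the cleaning rules (drop records with missing fields, drop malformed values, drop duplicates) are assumptions made for the example; this passage does not prescribe them.

```python
import re

PAIR = re.compile(r"(\w+)=(\S+)")
REQUIRED = ("advertiser", "agent", "clicks")  # hypothetical required fields


def structure(lines):
    """Structuring: turn each free-text line into a dict of its key=value pairs."""
    return [dict(PAIR.findall(line)) for line in lines]


def clean(records):
    """Cleaning: drop incomplete, malformed and duplicate records; cast clicks to int."""
    seen = set()
    cleaned = []
    for r in records:
        if any(k not in r for k in REQUIRED):
            continue  # a required field is missing
        if not r["clicks"].isdigit():
            continue  # clicks is not a non-negative integer
        key = tuple(r[k] for k in REQUIRED)
        if key in seen:
            continue  # same record already kept
        seen.add(key)
        cleaned.append({**r, "clicks": int(r["clicks"])})
    return cleaned


RAW = [
    "advertiser=adv1 agent=agentA clicks=10",
    "advertiser=adv1 agent=agentA clicks=10",   # duplicate
    "advertiser=adv2 clicks=3",                 # missing agent
    "advertiser=adv3 agent=agentB clicks=n/a",  # malformed value
    "advertiser=adv4 agent=agentB clicks=7",
]
preprocessed = clean(structure(RAW))
```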
In one embodiment of the invention, the data warehouse creation module 7300 may also be configured to:
extracting, according to the domain model, the business concept corresponding to each tag contained in the preprocessed data;
determining the association relations among the business concepts according to the domain model;
and creating, according to the domain model and the association relations, a data warehouse available for query.
In one embodiment of the invention, the tag includes a topic corresponding to the data detail layer, a common indicator corresponding to the data summary layer, and/or a personalized indicator corresponding to the data application layer.
Further, where the tag includes a topic corresponding to the data detail layer, a common indicator corresponding to the data summary layer and a personalized indicator corresponding to the data application layer, the data warehouse creation module 7300 may also be configured for:
processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain first data;
processing the first data at the data summary layer according to the common indicator and the domain model to obtain second data;
and processing the second data at the data application layer according to the personalized indicator and the domain model to obtain a data warehouse available for query.
In one embodiment of the present invention, the creating means 7000 may further include:
a module for loading the preprocessed data into the data detail layer incrementally or in full, so that the data warehouse creation module 7300 can perform the step of processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain the first data.
In one embodiment of the present invention, the creating means 7000 further comprises:
a module for acquiring business use cases;
a module for extracting the business concepts contained in the business use cases;
a module for determining the association relations among the business concepts according to the business use cases;
a module for dividing the business concepts into domains according to preset domain tags to obtain a domain division result;
and a module for obtaining a domain model according to the domain division result and the association relations.
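The modules listed above (acquire use cases, extract concepts, determine associations, divide into domains, build the domain model) can be sketched in a few functions. This is a hedged illustration only: the use cases, concept names and domain tags are invented, and treating two concepts as associated whenever they appear in the same use case is an assumption of the example, not a rule stated in the text.

```python
from itertools import combinations

# Hypothetical business use cases, each listing the concepts it mentions.
USE_CASES = {
    "place_ad": ["advertiser", "agent", "campaign"],
    "bill_ad": ["advertiser", "invoice"],
}

# Hypothetical preset domain tags: concept -> domain.
DOMAIN_TAGS = {
    "advertiser": "customer",
    "agent": "customer",
    "campaign": "delivery",
    "invoice": "finance",
}


def extract_concepts(use_cases):
    """Collect the distinct business concepts mentioned in the use cases."""
    return sorted({c for concepts in use_cases.values() for c in concepts})


def determine_associations(use_cases):
    """Assumption: two concepts are associated if they occur in the same use case."""
    pairs = set()
    for concepts in use_cases.values():
        pairs.update(combinations(sorted(set(concepts)), 2))
    return pairs


def divide_domains(concepts, domain_tags):
    """Group the concepts by their preset domain tag."""
    domains = {}
    for c in concepts:
        domains.setdefault(domain_tags[c], []).append(c)
    return domains


def build_domain_model(use_cases, domain_tags):
    """Combine the domain division result and the associations into one model."""
    concepts = extract_concepts(use_cases)
    return {
        "domains": divide_domains(concepts, domain_tags),
        "associations": determine_associations(use_cases),
    }


model = build_domain_model(USE_CASES, DOMAIN_TAGS)
```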
In one embodiment of the present invention, the creating means 7000 further comprises:
a module for presenting the data warehouse in response to a query request for the data warehouse.
It will be appreciated by those skilled in the art that the creation apparatus 7000 of the data warehouse may be implemented in various ways. For example, it may be implemented by configuring a processor with instructions: the instructions may be stored in a ROM and, when the device starts, read from the ROM into a programmable device to implement the creation apparatus 7000. As another example, the creation apparatus 7000 may be built into a dedicated device (e.g. an ASIC). The creation apparatus 7000 may be divided into mutually independent units, or the units may be implemented together. It may be implemented in any one of the ways described above, or in a combination of two or more of them.
In this embodiment, the creation means 7000 of the data warehouse may have various implementation forms, for example, the creation means 7000 of the data warehouse may be any functional module running in a software product or an application program providing the network access service, or an external embedded part, a plug-in part, a patch part, etc. of the software product or the application program, or may be the software product or the application program itself.
< electronic device >
In this embodiment, there is also provided an electronic device 1000, where the electronic device 1000 may be the server 1100 shown in fig. 1 or the terminal device 1200 shown in fig. 2.
As shown in fig. 8, the electronic device 1000 may include a creation means 7000 of a data warehouse according to any embodiment of the invention for implementing the creation method of the data warehouse of any embodiment of the invention.
In another embodiment, as shown in fig. 9, the electronic device 1000 may include a processor 1300 and a memory 1400, the memory 1400 being used to store executable instructions; the processor 1300 is configured to operate the electronic device 1000, under the control of the instructions, to perform the method of creating a data warehouse according to any embodiment of the present invention.
< computer-readable storage medium >
In this embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of creating a data warehouse as in any of the embodiments of the present invention.
The present invention may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber optic cable), or electrical signals transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs) is personalized using state information of the computer readable program instructions; the electronic circuitry can then execute the computer readable program instructions, thereby implementing aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (7)

1. A method of creating a data warehouse, comprising:
acquiring original data;
preprocessing the original data to obtain preprocessed data;
processing the preprocessed data according to a preset tag and a domain model to obtain a data warehouse available for query;
wherein the tag comprises a topic corresponding to a data detail layer, a common indicator corresponding to a data summary layer and a personalized indicator corresponding to a data application layer;
the step of processing the preprocessed data according to the preset tag and the domain model to obtain a data warehouse available for query comprises:
processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain first data;
processing the first data at the data summary layer according to the common indicator and the domain model to obtain second data;
processing the second data at the data application layer according to the personalized indicator and the domain model to obtain the data warehouse available for query;
the creation method further comprises the following steps:
acquiring business use cases;
extracting the business concepts contained in the business use cases;
determining the association relations among the business concepts according to the business use cases;
dividing the business concepts into domains according to preset domain tags to obtain a domain division result;
and obtaining the domain model according to the domain division result and the association relations.
2. The creation method of claim 1, wherein the original data comprises unstructured data,
the step of preprocessing the original data to obtain preprocessed data comprises:
structuring the unstructured data to obtain structured data;
and cleaning the structured data to obtain the preprocessed data.
3. The creation method according to claim 1, wherein, before the step of processing the preprocessed data according to the preset tag and the domain model to obtain a data warehouse available for query, the creation method further comprises:
loading the preprocessed data into the data detail layer incrementally or in full, so as to execute the step of processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain first data.
4. The creation method of claim 1, wherein the creation method further comprises:
presenting the data warehouse in response to a query request for the data warehouse.
5. A data warehouse creation apparatus, comprising:
a data acquisition module for acquiring original data;
a preprocessing module for preprocessing the original data to obtain preprocessed data;
a data warehouse creation module for processing the preprocessed data according to a preset tag and a domain model to obtain a data warehouse available for query;
wherein the tag comprises a topic corresponding to a data detail layer, a common indicator corresponding to a data summary layer and a personalized indicator corresponding to a data application layer;
the data warehouse creation module is further configured for:
processing the preprocessed data at the data detail layer according to the topic and the domain model to obtain first data;
processing the first data at the data summary layer according to the common indicator and the domain model to obtain second data;
processing the second data at the data application layer according to the personalized indicator and the domain model to obtain the data warehouse available for query;
the creation apparatus further comprises:
a module for acquiring business use cases;
a module for extracting the business concepts contained in the business use cases;
a module for determining the association relations among the business concepts according to the business use cases;
a module for dividing the business concepts into domains according to preset domain tags to obtain a domain division result;
and a module for obtaining the domain model according to the domain division result and the association relations.
6. An electronic device, comprising:
the creation apparatus of claim 5; or,
a processor and a memory for storing instructions for controlling the processor to perform the creation method according to any of claims 1 to 4.
7. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the creation method of any of claims 1 to 4.
CN201910191102.9A 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium Active CN111694810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910191102.9A CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910191102.9A CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111694810A CN111694810A (en) 2020-09-22
CN111694810B true CN111694810B (en) 2024-04-05

Family

ID=72475056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910191102.9A Active CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111694810B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380852A (en) * 2020-11-12 2021-02-19 沃民高新科技(北京)股份有限公司 Public opinion data processing system
CN115858691A (en) * 2022-11-17 2023-03-28 北京白龙马云行科技有限公司 Report creation method and device, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718565A (en) * 2016-01-20 2016-06-29 北京京东尚科信息技术有限公司 Data warehouse model construction method and construction apparatus
CN108268565A (en) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 Method and system based on data warehouse processing user browsing behavior data
CN108520008A (en) * 2018-03-15 2018-09-11 链家网(北京)科技有限公司 The construction method and construction device of data warehouse model
CN108763278A (en) * 2018-04-11 2018-11-06 口碑(上海)信息技术有限公司 The statistical method and device of user characteristics label
CN109189764A (en) * 2018-09-20 2019-01-11 北京桃花岛信息技术有限公司 A kind of colleges and universities' data warehouse layered design method based on Hive

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9778973B2 (en) * 2015-10-28 2017-10-03 International Business Machines Corporation Early diagnosis of hardware, software or configuration problems in data warehouse system utilizing grouping of queries based on query parameters

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
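Data warehouse creation, design and development; 谷和启; 中文信息 (Chinese Information); 20030401(04); full text *
Application of data warehouses in the field of taxation; 许合利; 王慧林; 电脑开发与应用 (Computer Development and Applications); 20100405 (Issue 04); full text *
许合利; 王慧林. Application of data warehouses in the field of taxation. 电脑开发与应用 (Computer Development and Applications). 2010, (04), full text. *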

Also Published As

Publication number Publication date
CN111694810A (en) 2020-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant