CN111694810A - Data warehouse creation method and device, electronic equipment and readable storage medium


Info

Publication number
CN111694810A
Authority
CN
China
Prior art keywords
data
processing
warehouse
preprocessing
data warehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910191102.9A
Other languages
Chinese (zh)
Other versions
CN111694810B (en)
Inventor
康进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data warehouse creation method, a data warehouse creation apparatus, an electronic device, and a computer-readable storage medium. The creation method comprises: acquiring raw data; preprocessing the raw data to obtain preprocessed data; and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse for query and retrieval.

Description

Data warehouse creation method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of data warehouse technology, and more particularly, to a method and an apparatus for creating a data warehouse, an electronic device, and a readable storage medium.
Background
A data warehouse is a strategic collection of data that supports decision making at all levels of an enterprise with all types of data. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for improving business processes and for monitoring time, cost, quality, and control. A data warehouse can be used to screen and integrate various kinds of service data, and to support data analysis, data mining, and data reporting.
However, in the prior art, a data warehouse cannot be created from a small amount of raw data, so existing data warehouses have poor timeliness.
Disclosure of Invention
It is an object of the present invention to provide a new technical solution for creating a data warehouse.
According to a first aspect of the present invention, there is provided a method for creating a data warehouse, including:
acquiring raw data;
preprocessing the raw data to obtain preprocessed data;
and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse for query and retrieval.
Optionally, the raw data comprises unstructured data,
and preprocessing the raw data to obtain preprocessed data comprises:
structuring the unstructured data to obtain structured data;
and cleaning the structured data to obtain the preprocessed data.
Optionally, the step of processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse for query and retrieval includes:
extracting, according to the domain model, the business concept corresponding to each label contained in the preprocessed data;
determining the association relationships between the business concepts according to the domain model;
and creating the data warehouse according to the domain model and the association relationships, for query and retrieval.
Optionally, the label includes a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and/or a personalized index corresponding to the data application layer.
Optionally, the label includes a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and a personalized index corresponding to the data application layer;
and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse for query and retrieval includes:
processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data;
processing the first data at the data summary layer according to the common index and the domain model to obtain second data;
and processing the second data at the data application layer according to the personalized index and the domain model to obtain the data warehouse for query and retrieval.
Optionally, the creation method further includes:
loading the preprocessed data into the data detail layer incrementally or in full, so as to execute the step of processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data.
Optionally, the creation method further includes:
acquiring a business use case;
extracting the business concepts contained in the business use case;
determining the association relationships between the business concepts according to the business use case;
performing domain division on the business concepts according to preset domain labels to obtain a domain division result;
and obtaining the domain model according to the domain division result and the association relationships.
Optionally, the creation method further includes:
presenting the data warehouse in response to a query request for the data warehouse.
According to a second aspect of the present invention, there is provided a data warehouse creation apparatus, including:
a data acquisition module, configured to acquire raw data;
a preprocessing module, configured to preprocess the raw data to obtain preprocessed data;
and a data warehouse creation module, configured to process the preprocessed data according to a preset label and a domain model to obtain a data warehouse for query and retrieval.
Optionally, the raw data comprises unstructured data,
and the preprocessing module is further configured to:
structure the unstructured data to obtain structured data;
and clean the structured data to obtain the preprocessed data.
Optionally, the data warehouse creation module is further configured to:
extract, according to the domain model, the business concept corresponding to each label contained in the preprocessed data;
determine the association relationships between the business concepts according to the domain model;
and create the data warehouse according to the domain model and the association relationships, for query and retrieval.
Optionally, the label includes a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and/or a personalized index corresponding to the data application layer.
Optionally, the label includes a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and a personalized index corresponding to the data application layer;
and the data warehouse creation module is further configured to:
process the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data;
process the first data at the data summary layer according to the common index and the domain model to obtain second data;
and process the second data at the data application layer according to the personalized index and the domain model to obtain the data warehouse for query and retrieval.
Optionally, the creation apparatus further includes:
a module for loading the preprocessed data into the data detail layer incrementally or in full, so that the data warehouse creation module executes the step of processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data.
Optionally, the creation apparatus further includes:
a module for acquiring a business use case;
a module for extracting the business concepts contained in the business use case;
a module for determining the association relationships between the business concepts according to the business use case;
a module for performing domain division on the business concepts according to preset domain labels to obtain a domain division result;
and a module for obtaining the domain model according to the domain division result and the association relationships.
Optionally, the creation apparatus further includes:
a module for presenting the data warehouse in response to a query request for the data warehouse.
According to a third aspect of the present invention, there is provided an electronic apparatus comprising:
a creation apparatus according to the second aspect of the present invention; or
a processor and a memory for storing instructions for controlling the processor to perform the creation method according to the first aspect of the invention.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the creation method according to the first aspect of the present invention.
In the embodiments of the invention, the preprocessed data obtained by preprocessing the raw data is processed according to a preset label and a domain model to obtain a data warehouse. By combining the domain model with the data warehouse in this way, the data warehouse can deliver hour-level reports, minute-level reports, and even real-time reports. Moreover, the creation efficiency of the data warehouse can be improved, its development cost reduced, and its extensibility and stability improved.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram of one example of a hardware configuration of an electronic device that may be used to implement an embodiment of the invention;
FIG. 2 is a block diagram of another example of a hardware configuration of an electronic device that may be used to implement an embodiment of the invention;
FIG. 3 is a flowchart illustrating a method for creating a data warehouse according to a first embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for creating a data warehouse according to a first embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for creating a data warehouse according to a first embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method for creating a data warehouse according to a first embodiment of the present invention;
FIG. 7 is a schematic block diagram of an apparatus for creating a data warehouse according to an embodiment of the present invention;
FIG. 8 is a functional block diagram of an electronic device provided in accordance with a first embodiment of the present invention;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device according to a second embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
<Hardware configuration>
FIGS. 1 and 2 are block diagrams of hardware configurations of an electronic device 1000 that can be used to implement the data warehouse creation method according to any embodiment of the present invention.
In one embodiment, as shown in FIG. 1, the electronic device 1000 may be a server 1100.
The server 1100 provides a service point for processes, databases, and communications facilities. The server 1100 can be a unitary server or a distributed server spanning multiple computers or computer data centers. The server may be of various types, such as, but not limited to, a web server, a news server, a mail server, a message server, an advertisement server, a file server, an application server, an interaction server, a database server, or a proxy server. In some embodiments, each server may include hardware, software, or embedded logic components, or a combination of two or more such components, for performing the appropriate functions supported or implemented by the server. For example, the server may be a blade server, a cloud server, or the like, or may be a server group consisting of a plurality of servers, which may include one or more of the above types of servers.
In this embodiment, the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160, as shown in fig. 1.
In this embodiment, the server 1100 may also include a speaker, a microphone, and the like, which are not limited herein.
The processor 1110 may be a dedicated server processor, or may be a desktop processor, a mobile processor, or the like that meets the performance requirements, and is not limited herein. The memory 1120 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1130 includes various bus interfaces such as a serial bus interface (including a USB interface), a parallel bus interface, and the like. The communication device 1140 is capable of wired or wireless communication, for example. The display device 1150 is, for example, a liquid crystal display panel, an LED display panel, a touch display panel, or the like. The input device 1160 may include, for example, a touch screen, a keyboard, and the like.
In this embodiment, memory 1120 of server 1100 is configured to store instructions for controlling processor 1110 to operate at least to perform a method of data warehouse creation in accordance with any of the embodiments of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
Although multiple devices are shown in FIG. 1, the present invention may relate to only some of them; for example, the server 1100 may relate only to the memory 1120 and the processor 1110.
In one embodiment, the electronic device 1000 may be a terminal device 1200 such as a PC, a notebook computer, or the like used by an operator, which is not limited herein.
In this embodiment, referring to FIG. 2, the terminal device 1200 may include a processor 1210, a memory 1220, an interface device 1230, a communication device 1240, a display device 1250, an input device 1260, a speaker 1270, a microphone 1280, and the like.
The processor 1210 may be a mobile processor. The memory 1220 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1230 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1240 is capable of wired or wireless communication, for example. The communication device 1240 may include a short-range communication device, such as any device that performs short-range wireless communication based on a short-range wireless communication protocol such as the Hilink protocol, WiFi (IEEE 802.11 protocol), Mesh, Bluetooth, ZigBee, Thread, Z-Wave, NFC, UWB, or LiFi. The communication device 1240 may also include a long-range communication device, such as any device that performs WLAN, GPRS, or 2G/3G/4G/5G long-range communication. The display device 1250 is, for example, a liquid crystal display, a touch display, or the like. The input device 1260 may include, for example, a touch screen, a keyboard, and the like. A user can output voice information through the speaker 1270 and input voice information through the microphone 1280.
In this embodiment, memory 1220 of terminal device 1200 is used to store instructions for controlling processor 1210 to operate at least to perform a method of data warehouse creation in accordance with any of the embodiments of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
Although a plurality of devices of the terminal device 1200 are shown in FIG. 2, the present invention may relate to only some of them; for example, the terminal device 1200 may relate only to the memory 1220, the processor 1210, and the display device 1250.
<Method>
FIG. 3 is a flowchart illustrating a method of creating a data warehouse according to an embodiment of the present invention; the method may be implemented by an electronic device. The electronic device may be the server 1100 shown in FIG. 1 or the terminal device 1200 shown in FIG. 2.
As shown in FIG. 3, the method for creating a data warehouse of the present embodiment may include the following steps S3100 to S3300:
Step S3100, acquiring raw data.
In one embodiment of the present invention, the raw data may be obtained through an operational data store (ODS).
Specifically, the operational data store layer may extract specified data from a data source as the raw data.
Only the specified data is extracted because the data generated by some sources has little analytical value, or its value may be far lower than the implementation and performance cost that the data warehouse would incur to store it.
The raw data may be extracted from a specified client or from the Open Data Processing Service (ODPS), or may be extracted from generated logs or service data.
Step S3200, preprocessing the raw data to obtain preprocessed data.
The data acquired in step S3100 may include structured data or unstructured data. Structured data, also called row data, is logically expressed and implemented as a two-dimensional table structure, strictly follows data format and length specifications, and is mainly stored and managed in a relational database. In contrast, unstructured data is data that is not suitable for representation in a two-dimensional database table, and includes office documents of all formats, XML, HTML, various types of reports, pictures, video information, and the like.
In embodiments where the raw data comprises structured data, preprocessing the structured data may comprise cleaning it to obtain the preprocessed data. Specifically, the cleaning may remove dirty data such as incomplete data, erroneous data, and duplicate data from the structured data. Cleaning in this embodiment is a process of reviewing and verifying the structured data, with the aim of deleting duplicate information, correcting existing errors, and ensuring data consistency.
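As a non-limiting illustration, the cleaning described above might be sketched in Python as follows. The field names, the sample rows, and the validity rule (negative counts are treated as errors) are assumptions made for this sketch and are not part of the disclosure; erroneous rows are simply dropped here rather than corrected.

```python
# Hypothetical structured rows; all field names are illustrative only.
rows = [
    {"advertiser_id": 1, "clicks": 10, "cost": 2.5},
    {"advertiser_id": 1, "clicks": 10, "cost": 2.5},    # duplicate data
    {"advertiser_id": 2, "clicks": None, "cost": 1.0},  # incomplete data
    {"advertiser_id": 3, "clicks": -4, "cost": 0.8},    # erroneous data
    {"advertiser_id": 4, "clicks": 7, "cost": 1.2},
]


def clean(rows):
    """Drop incomplete, erroneous, and duplicate rows, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # incomplete: at least one field is missing
        if row["clicks"] < 0 or row["cost"] < 0:
            continue  # erroneous under the assumed validity rule
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return cleaned


preprocessed = clean(rows)
```

In this sketch only the rows for advertisers 1 and 4 survive, with the duplicate of advertiser 1 removed.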
In embodiments where the raw data comprises unstructured data, preprocessing the unstructured data may comprise: structuring the unstructured data to obtain structured data; and cleaning the structured data to obtain the preprocessed data. Structuring converts the unstructured data into structured data.
Further, structuring the unstructured data may include, for example, code conversion (e.g., m/f -> male/female), field conversion (e.g., balance -> bal), conversion of units of measurement (e.g., cm -> m), and conversion of data granularity. Business systems store very detailed data, whereas the data in a data warehouse is used for analysis and does not need to be as detailed, so the business system data can be aggregated to the granularity of the data warehouse.
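The four conversions named above might be sketched as follows. This is only an illustrative sketch: the record layout, the field names `gender`, `height_cm`, `timestamp`, and `amount`, and the choice of one day as the warehouse granularity are assumptions, not part of the disclosure.

```python
from collections import defaultdict

CODE_MAP = {"m": "male", "f": "female"}   # code conversion: m/f -> male/female
FIELD_MAP = {"balance": "bal"}            # field conversion: balance -> bal


def convert(record):
    """Apply code, field, and unit conversions to one record."""
    out = {}
    for field, value in record.items():
        name = FIELD_MAP.get(field, field)
        if field == "gender":
            value = CODE_MAP.get(value, value)
        elif field == "height_cm":
            # unit conversion: centimetres -> metres
            name, value = "height_m", value / 100
        out[name] = value
    return out


def aggregate_by_day(records):
    """Granularity conversion: sum detailed records up to one total per day."""
    totals = defaultdict(int)
    for record in records:
        day = record["timestamp"][:10]  # assumes 'YYYY-MM-DD hh:mm:ss' strings
        totals[day] += record["amount"]
    return dict(totals)


converted = convert({"gender": "m", "balance": 120.0, "height_cm": 180})
daily = aggregate_by_day([
    {"timestamp": "2019-03-13 10:05:00", "amount": 3},
    {"timestamp": "2019-03-13 18:30:00", "amount": 4},
    {"timestamp": "2019-03-14 09:00:00", "amount": 5},
])
```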
Step S3300, processing the preprocessed data according to the preset label and the domain model to obtain a data warehouse for query and retrieval.
The data warehouse in this embodiment may be a data report. The label can be preset according to the application scenario or specific requirements.
In an embodiment of the present invention, the domain model may be obtained through steps S4100 to S4500 shown in FIG. 4:
Step S4100, acquiring a business use case.
The business use case can be written in advance according to the application scenario or specific requirements, and reflects the product requirements.
Step S4200, extracting the business concepts contained in the business use case, wherein the business concepts include labels.
Specifically, a business concept may be a specific field contained in the business use case, or a concept corresponding to such a field. For example, for the business use case "user clicks advertisement and enters landing page", the business concepts contained therein may include "user click behavior", "advertisement", and "landing page".
Step S4300, determining the association relationships between the business concepts according to the business use case.
Specifically, business concepts contained in the same use case may be associated with one another. For example, suppose the business concepts extracted from the business use cases include "user click behavior", "advertisement", "landing page", "advertiser", and "placement". Then, from the business use case "user clicks advertisement and enters landing page", the association among the business concepts "user click behavior", "advertisement", and "landing page" can be determined, and from the business use case "advertiser places advertisement", the association among the business concepts "advertiser", "placement", and "advertisement" can be determined.
Step S4400, performing domain division on the business concepts according to preset domain labels to obtain a domain division result.
The domain labels can be set in advance according to the application scenario or specific requirements. Each domain label uniquely identifies a corresponding business domain. Dividing the business concepts according to the preset domain labels assigns each business concept to the business domain under its corresponding domain label, so that every business concept has one corresponding domain label; the resulting assignment is the domain division result.
Dividing the business concepts into business domains keeps the coupling between business domains low and the cohesion among the business concepts within each business domain high. That is, low coupling between business domains and high cohesion within them is achieved.
Step S4500, obtaining the domain model according to the domain division result and the association relationships.
The domain model is obtained from the domain division result and the association relationships between the business concepts.
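Steps S4100 to S4500 might be sketched as follows, using the two advertising use cases from the example above. The data structures, the function name `build_domain_model`, and the domain labels "traffic" and "advertising" are hypothetical choices for this sketch; the disclosure does not prescribe a representation for the domain model.

```python
from itertools import combinations

# S4100/S4200: business use cases and the business concepts extracted from them.
use_cases = {
    "user clicks advertisement and enters landing page": [
        "user click behavior", "advertisement", "landing page"],
    "advertiser places advertisement": [
        "advertiser", "placement", "advertisement"],
}

# Preset domain labels (hypothetical): one label per business concept.
domain_labels = {
    "user click behavior": "traffic",
    "landing page": "traffic",
    "advertisement": "advertising",
    "advertiser": "advertising",
    "placement": "advertising",
}


def build_domain_model(use_cases, domain_labels):
    # S4300: concepts that appear in the same use case are associated.
    associations = set()
    for concepts in use_cases.values():
        for a, b in combinations(sorted(set(concepts)), 2):
            associations.add((a, b))
    # S4400: assign each concept to the business domain under its label.
    domains = {}
    for concepts in use_cases.values():
        for concept in concepts:
            domains.setdefault(domain_labels[concept], set()).add(concept)
    # S4500: the domain model combines the division result and the associations.
    return {"domains": domains, "associations": associations}


domain_model = build_domain_model(use_cases, domain_labels)
```

Storing each association as a sorted pair keeps the relation symmetric without recording both directions.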
In an embodiment of the present invention, the step of processing the preprocessed data according to the preset label and the domain model to obtain the data warehouse for query and retrieval may further include steps S3311 to S3313 shown in FIG. 5:
Step S3311, extracting the business concept corresponding to each label from the preprocessed data according to the preset label and the domain model.
Specifically, the business concept corresponding to each label may be defined in advance in the domain model. The business concept corresponding to each label contained in the preprocessed data can then be extracted according to the domain model.
Step S3312, determining the association relationships between the business concepts according to the domain model.
Since the association relationships between the business concepts are predefined in the domain model, the association relationships between the business concepts corresponding to the labels contained in the preprocessed data can be determined according to the domain model.
Step S3313, creating the data warehouse according to the domain model and the determined association relationships, for query and retrieval.
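One possible reading of steps S3311 to S3313 is sketched below. Everything concrete here is an assumption made for illustration: labels are modelled as field names, the domain model as a dictionary mapping labels to business concepts plus a set of associated concept pairs, and the resulting "data warehouse" as one value list per concept together with the applicable relations.

```python
# Hypothetical domain model: label -> business concept, plus predefined associations.
domain_model = {
    "concept_of_label": {
        "ad_id": "advertisement",
        "advertiser_id": "advertiser",
        "page_url": "landing page",
    },
    "associations": {
        ("advertisement", "advertiser"),
        ("advertisement", "landing page"),
    },
}

preprocessed = [
    {"ad_id": 7, "advertiser_id": 1, "clicks": 3},
    {"ad_id": 8, "advertiser_id": 1, "clicks": 5},
]
labels = ["ad_id", "advertiser_id", "page_url"]


def create_warehouse(preprocessed, labels, model):
    # S3311: find the business concept for each label present in the data.
    present = [l for l in labels if any(l in row for row in preprocessed)]
    concepts = {l: model["concept_of_label"][l] for l in present}
    # S3312: keep only associations whose two concepts were both found.
    found = set(concepts.values())
    relations = {pair for pair in model["associations"] if set(pair) <= found}
    # S3313: build one table of distinct values per concept, plus the relations.
    tables = {
        concept: sorted({row[label] for row in preprocessed if label in row})
        for label, concept in concepts.items()
    }
    return {"tables": tables, "relations": relations}


warehouse = create_warehouse(preprocessed, labels, domain_model)
```

Because no record carries the `page_url` label, the "landing page" concept and its association are left out of the result.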
Further, the label in this embodiment may include a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and/or a personalized index corresponding to the data application layer.
The data detail layer (Data Warehouse Detail layer, DWD layer for short) stores detail data and dimension table data. The data summary layer (Data Warehouse Summary layer, DWS layer for short) stores the common indexes. The data application layer (ADS layer for short) stores the personalized indexes.
For example, the common index may be income; financial income, statement income, and the like then all belong to the common index of income. The personalized index may be, for example, the unit of income, specifically yuan or fen.
In an embodiment of the present invention, the label includes a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and a personalized index corresponding to the data application layer. In this case, the step of processing the preprocessed data according to the preset label and the domain model to obtain a data warehouse for query and retrieval may include steps S3321 to S3323 shown in FIG. 6:
Step S3321, processing the preprocessed data at the data detail layer according to the theme corresponding to the data detail layer and the domain model to obtain first data.
The first data may be a data report.
For example, suppose the preprocessed data contains multiple tables relevant to advertisers: table a contains advertiser company information, table b contains advertiser placement information, table c contains advertiser bid information, and table d contains advertiser impression, click, and consumption information. Processing the preprocessed data at the data detail layer according to the theme "advertiser" may then consist of joining tables a, b, c, and d into one detail table that can serve queries for all kinds of advertiser information.
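The joining of tables a, b, c, and d in this example might be sketched as follows. The join key `advertiser_id`, the column names, and the sample values are assumptions for illustration; the disclosure does not specify how the tables are keyed.

```python
# Hypothetical source tables, each keyed by advertiser_id.
a = {1: {"company": "Acme"}, 2: {"company": "Globex"}}               # company info
b = {1: {"placement": "homepage"}, 2: {"placement": "search"}}       # placement info
c = {1: {"bid": 0.5}, 2: {"bid": 0.8}}                               # bid info
d = {1: {"impressions": 1000, "clicks": 40, "cost": 20},             # impression,
     2: {"impressions": 500, "clicks": 10, "cost": 5}}               # click, consumption


def build_detail_table(*tables):
    """Join the tables on advertiser_id into one wide detail table."""
    detail = {}
    for table in tables:
        for advertiser_id, fields in table.items():
            row = detail.setdefault(advertiser_id, {"advertiser_id": advertiser_id})
            row.update(fields)
    return [detail[key] for key in sorted(detail)]


detail_table = build_detail_table(a, b, c, d)
```

Each row of `detail_table` then carries every advertiser attribute, so a single lookup can answer queries that would otherwise touch four tables.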
Specifically, for the step of processing the preprocessed data at the data detail layer according to the theme corresponding to the data detail layer and the domain model to obtain the first data, reference may be made to the description of steps S3311 to S3313 above; details are not repeated here.
Further, the theme in the present embodiment may be the domain label in step S4400.
Step S3322, processing the first data at the data summary layer according to the common index corresponding to the data summary layer and the domain model to obtain second data.
The second data may be a data report.
The data summary layer typically performs a summarization operation on the first data of the data detail layer. For example, the common indexes frequently used for agents are summarized from the detail table corresponding to the theme "advertiser" in the data detail layer, yielding a light summary table of commonly used agent information such as impressions, clicks, and consumption.
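The summarization in this example might be sketched as a group-by over the detail table. The `agent` column, the metric names, and the sample rows are hypothetical; the sketch only shows the general shape of producing a light summary table.

```python
from collections import defaultdict

# Hypothetical detail-layer rows under the "advertiser" theme.
detail_rows = [
    {"advertiser_id": 1, "agent": "agent_x", "impressions": 1000, "clicks": 40, "cost": 20},
    {"advertiser_id": 2, "agent": "agent_x", "impressions": 500, "clicks": 10, "cost": 5},
    {"advertiser_id": 3, "agent": "agent_y", "impressions": 200, "clicks": 8, "cost": 4},
]


def summarize_by_agent(rows, metrics=("impressions", "clicks", "cost")):
    """Sum the common indexes per agent to form a light summary table."""
    summary = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        for metric in metrics:
            summary[row["agent"]][metric] += row[metric]
    return dict(summary)


light_summary = summarize_by_agent(detail_rows)
```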
Specifically, for the step of processing the first data at the data summary layer according to the common index corresponding to the data summary layer and the domain model to obtain the second data, reference may be made to the description of steps S3311 to S3313 above; details are not repeated here.
Step S3323, processing the second data at the data application layer according to the personalized index corresponding to the data application layer and the domain model to obtain the data warehouse for query and retrieval.
The data warehouse may be a data report. The data application layer may perform a finer-grained operation on the second data of the data summary layer. For example, from the light summary table of the common index "agent" under the theme "advertiser", the personalized index "consumption" can be summarized to obtain a data warehouse of consumption information.
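The application-layer step in this example might be sketched as follows. The shape of the light summary table and the decision to rank agents by consumption are assumptions made for the sketch; the disclosure only states that the personalized index "consumption" is summarized.

```python
# Hypothetical light summary table from the data summary layer.
light_summary = {
    "agent_x": {"impressions": 1500, "clicks": 50, "cost": 25},
    "agent_y": {"impressions": 200, "clicks": 8, "cost": 4},
}


def consumption_report(summary):
    """Extract the personalized index 'consumption' per agent, highest first."""
    pairs = ((agent, metrics["cost"]) for agent, metrics in summary.items())
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


report = consumption_report(light_summary)
```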
Specifically, according to the personalized index and the domain model corresponding to the data application layer, the step of processing the second data at the data application layer to obtain the data warehouse may refer to the description of the foregoing steps S3311 to S3313, and will not be described herein again.
Thus, under one theme there are multiple levels of reports (data detail layer, data summary layer, data application layer). The subject is a specific field in first data of a data detail layer, the public label is a specific field in second data of data summarization, and the personalized label is a specific field in a data application layer data warehouse.
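By way of illustration only, the three-layer processing described above can be sketched as follows. The record fields, the sample values, and the function names (detail_layer, summary_layer, application_layer) are assumptions made for this sketch and do not appear in the embodiment; the domain model is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical preprocessed records: one row per advertising event.
PREPROCESSED = [
    {"advertiser": "A1", "agent": "G1", "impressions": 100, "clicks": 7, "consumption": 12.5},
    {"advertiser": "A1", "agent": "G1", "impressions": 50, "clicks": 3, "consumption": 4.0},
    {"advertiser": "A1", "agent": "G2", "impressions": 80, "clicks": 2, "consumption": 6.0},
    {"advertiser": "A2", "agent": "G1", "impressions": 10, "clicks": 1, "consumption": 1.5},
]

def detail_layer(rows, theme_field, theme_value):
    """Keep the detail rows that belong to one theme (the first data)."""
    return [r for r in rows if r[theme_field] == theme_value]

def summary_layer(first_data, group_field, common_indexes):
    """Sum the common indexes per group into a light summary table (the second data)."""
    totals = defaultdict(lambda: dict.fromkeys(common_indexes, 0))
    for row in first_data:
        for index in common_indexes:
            totals[row[group_field]][index] += row[index]
    return {group: dict(values) for group, values in totals.items()}

def application_layer(second_data, personalized_index):
    """Project one personalized index out of the summary table (the final report)."""
    return {group: values[personalized_index] for group, values in second_data.items()}

first = detail_layer(PREPROCESSED, "advertiser", "A1")
second = summary_layer(first, "agent", ["impressions", "clicks", "consumption"])
report = application_layer(second, "consumption")
```

In this sketch each layer consumes only the output of the layer below it, which mirrors the order of steps S3321 to S3323.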
On this basis, the creation method may further include: loading the preprocessed data into the data detail layer incrementally or in full, so as to execute the step of processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain the first data.
Full loading may specifically mean loading all of the preprocessed data at once.
Incremental loading generally still requires a full load the first time. Performing a full load again in the second or third cycle, however, would consume significant physical and time resources, since some data sources may be unchanged and others may have gained only a small amount of data. Incremental loading therefore considers only the newly modified records and the newly inserted records in the data source.
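The difference between the two loading modes can be sketched as below. This is a minimal illustration that assumes each source record carries an id and an updated_at value; the watermark mechanism and the function names are hypothetical and are not specified by the embodiment.

```python
def full_load(source_rows):
    """Copy every source row into a fresh detail-layer table keyed by id."""
    return {row["id"]: dict(row) for row in source_rows}

def incremental_load(detail_table, source_rows, last_loaded_at):
    """Upsert only the rows inserted or modified after the previous load.

    Returns the new watermark to pass to the next load cycle.
    """
    watermark = last_loaded_at
    for row in source_rows:
        if row["updated_at"] > last_loaded_at:
            detail_table[row["id"]] = dict(row)
            watermark = max(watermark, row["updated_at"])
    return watermark

source = [
    {"id": 1, "clicks": 3, "updated_at": 10},
    {"id": 2, "clicks": 5, "updated_at": 11},
]
detail = full_load(source)  # first cycle: load everything
watermark = max(row["updated_at"] for row in source)

source[0] = {"id": 1, "clicks": 4, "updated_at": 20}     # a modified record
source.append({"id": 3, "clicks": 1, "updated_at": 21})  # a newly inserted record
watermark = incremental_load(detail, source, watermark)  # later cycle: changes only
```

Record 2 is untouched in the second cycle, which is where the saving over a repeated full load comes from.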
In an embodiment of the present invention, the method for creating a data warehouse may further include, after step S3300 is performed: presenting the data warehouse created in step S3300 in response to a query request for the data warehouse.
Specifically, the data warehouse can be presented to the user through MySQL or a specified application program. MySQL is a relational database management system.
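As a rough illustration of answering a query request from a relational store, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server. The table name consumption_report, its columns, and the function expose_report are assumptions made for this example only.

```python
import sqlite3

def expose_report(conn, agent):
    """Answer a query request by reading the application-layer report table."""
    cursor = conn.execute(
        "SELECT agent, consumption FROM consumption_report WHERE agent = ?",
        (agent,),
    )
    return cursor.fetchall()

# An in-memory database stands in for the store that serves the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE consumption_report (agent TEXT PRIMARY KEY, consumption REAL)")
conn.executemany(
    "INSERT INTO consumption_report VALUES (?, ?)",
    [("G1", 16.5), ("G2", 6.0)],
)
conn.commit()

rows = expose_report(conn, "G1")
```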
In the embodiment of the invention, the preprocessed data obtained by preprocessing the original data is processed according to a preset label and a domain model to obtain a data warehouse. By combining the domain model with the data warehouse in this way, the data warehouse can support hour-level reports, minute-level reports, and even real-time reports. Moreover, the efficiency of creating the data warehouse can be improved, its development cost reduced, and its extensibility and stability improved.
< apparatus >
In this embodiment, a data warehouse creation apparatus 7000 is provided. As shown in fig. 7, it includes a data acquisition module 7100, a preprocessing module 7200, and a data warehouse creation module 7300. The data acquisition module 7100 is used for acquiring original data; the preprocessing module 7200 is configured to preprocess the original data to obtain preprocessed data; the data warehouse creation module 7300 is configured to process the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for query.
In one embodiment of the invention, the original data may include unstructured data; in that case, the preprocessing module 7200 may be further configured to:
structure the unstructured data to obtain structured data;
and clean the structured data to obtain the preprocessed data.
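The two preprocessing operations can be sketched as follows. The raw log format, the regular expression, and the cleaning rules (dropping negative counts and exact duplicates) are assumptions chosen for illustration; the embodiment does not prescribe any particular format or rule.

```python
import re

# Hypothetical raw log format: "<timestamp> agent=<id> clicks=<n>"
LINE_PATTERN = re.compile(r"^(?P<ts>\S+) agent=(?P<agent>\w+) clicks=(?P<clicks>-?\d+)$")

def structure(raw_lines):
    """Turn free-form log lines into dict records; unparseable lines are skipped."""
    records = []
    for line in raw_lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            record = match.groupdict()
            record["clicks"] = int(record["clicks"])
            records.append(record)
    return records

def clean(records):
    """Drop records with negative counts and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for record in records:
        key = (record["ts"], record["agent"], record["clicks"])
        if record["clicks"] < 0 or key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned

raw = [
    "2019-03-12T10:00 agent=G1 clicks=3",
    "2019-03-12T10:00 agent=G1 clicks=3",   # duplicate of the previous line
    "garbled line",                          # cannot be structured
    "2019-03-12T10:05 agent=G2 clicks=-1",  # invalid count
    "2019-03-12T10:10 agent=G2 clicks=4",
]
preprocessed = clean(structure(raw))
```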
In one embodiment of the present invention, data warehouse creation module 7300 may also be configured to:
extract the business concept corresponding to each label contained in the preprocessed data according to the domain model;
determine the association between the business concepts according to the domain model;
and create the data warehouse, available for query, according to the domain model and the associations.
In one embodiment of the invention, the label comprises a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and/or a personalized index corresponding to the data application layer.
Further, where the label comprises a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and a personalized index corresponding to the data application layer, the data warehouse creation module 7300 may be further configured to:
process the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data;
process the first data at the data summary layer according to the common index and the domain model to obtain second data;
and process the second data at the data application layer according to the personalized index and the domain model to obtain a data warehouse available for query.
In an embodiment of the present invention, the creating apparatus 7000 may further include:
a module for loading the preprocessed data into the data detail layer incrementally or in full, so that the data warehouse creation module 7300 can execute the step of processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain the first data.
In one embodiment of the present invention, the creating apparatus 7000 further includes:
a module for acquiring a business use case;
a module for extracting the business concepts contained in the business use case;
a module for determining the association between the business concepts according to the business use case;
a module for dividing the business concepts into domains according to preset domain labels to obtain a domain division result; and
a module for obtaining the domain model according to the domain division result and the associations.
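The construction of a domain model from business use cases can be illustrated with the toy sketch below. It assumes the business concepts have already been extracted from each use case, and it treats two concepts as associated whenever they occur in the same use case; that association rule, the function build_domain_model, and the sample labels are assumptions for this example rather than part of the embodiment.

```python
from itertools import combinations

def build_domain_model(use_cases, domain_labels):
    """Build a toy domain model.

    use_cases: list of concept lists, one per business use case.
    domain_labels: mapping from a concept to its preset domain label.
    """
    concepts = sorted({concept for case in use_cases for concept in case})

    # Assumed rule: concepts that appear in the same use case are associated.
    associations = set()
    for case in use_cases:
        for first, second in combinations(sorted(set(case)), 2):
            associations.add((first, second))

    # Divide the concepts into domains according to their preset labels.
    domains = {}
    for concept in concepts:
        label = domain_labels.get(concept, "unlabeled")
        domains.setdefault(label, []).append(concept)

    return {"domains": domains, "associations": sorted(associations)}

model = build_domain_model(
    use_cases=[["advertiser", "agent"], ["agent", "consumption"]],
    domain_labels={"advertiser": "customer", "agent": "customer", "consumption": "billing"},
)
```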
In one embodiment of the present invention, the creating apparatus 7000 further includes:
means for exposing a data warehouse in response to a query request directed to the data warehouse.
It will be clear to a person skilled in the art that the data warehouse creation apparatus 7000 can be implemented in various ways. For example, the data warehouse creation apparatus 7000 may be implemented by configuring a processor with instructions. For example, the instructions may be stored in a ROM and read from the ROM into a programmable device when the device starts up, to implement the data warehouse creation apparatus 7000. For example, the data warehouse creation apparatus 7000 may be solidified into a dedicated device (e.g., an ASIC). The data warehouse creation apparatus 7000 may be divided into mutually independent units, or its units may be combined together. The data warehouse creation apparatus 7000 may be implemented by one of the implementations described above, or by a combination of two or more of them.
In this embodiment, the data warehouse creating apparatus 7000 may have various implementation forms, for example, the data warehouse creating apparatus 7000 may be any functional module running in a software product or application providing the network access service, or a peripheral insert, a plug-in, a patch, etc. of the software product or application, and may also be the software product or application itself.
< electronic apparatus >
In this embodiment, an electronic device 1000 is also provided, where the electronic device 1000 may be the server 1100 shown in fig. 1, or may be the terminal device 1200 shown in fig. 2.
As shown in fig. 8, the electronic device 1000 may include a data warehouse creating apparatus 7000 according to any embodiment of the present invention, for implementing the data warehouse creating method according to any embodiment of the present invention.
In another embodiment, as shown in fig. 9, the electronic device 1000 may further comprise a processor 1300 and a memory 1400, the memory 1400 for storing executable instructions; the processor 1300 is configured to operate the electronic device 1000 according to the control of the instructions to execute the method for creating a data warehouse according to any embodiment of the present invention.
< computer-readable storage Medium >
In this embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of creating a data warehouse according to any of the embodiments of the present invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of the computer-readable program instructions, so that the electronic circuit can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (11)

1. A method for creating a data warehouse, comprising:
acquiring original data;
preprocessing the original data to obtain preprocessed data;
and processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for query.
2. The creation method of claim 1, wherein the original data comprises unstructured data,
and preprocessing the original data to obtain preprocessed data comprises:
structuring the unstructured data to obtain structured data;
and cleaning the structured data to obtain the preprocessed data.
3. The creation method according to claim 1, wherein the step of processing the preprocessed data according to the preset label and the domain model to obtain a data warehouse available for query comprises:
extracting the business concept corresponding to each label contained in the preprocessed data according to the domain model;
determining the association between the business concepts according to the domain model;
and creating the data warehouse, available for query, according to the domain model and the associations.
4. The creation method according to claim 3, wherein the label comprises a theme of the corresponding data detail layer, a common index of the corresponding data summary layer, and/or a personalized index of the corresponding data application layer.
5. The creation method according to claim 4, wherein the label comprises a theme corresponding to the data detail layer, a common index corresponding to the data summary layer, and a personalized index corresponding to the data application layer;
and the step of processing the preprocessed data according to the preset label and the domain model to obtain a data warehouse available for query comprises:
processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain first data;
processing the first data at the data summary layer according to the common index and the domain model to obtain second data;
and processing the second data at the data application layer according to the personalized index and the domain model to obtain the data warehouse available for query.
6. The creation method according to claim 5, wherein processing the preprocessed data according to the preset label and the domain model to obtain a data warehouse further comprises:
loading the preprocessed data into the data detail layer incrementally or in full, and executing the step of processing the preprocessed data at the data detail layer according to the theme and the domain model to obtain the first data.
7. The creation method according to claim 1, wherein the creation method further comprises:
acquiring a business use case;
extracting the business concepts contained in the business use case;
determining the association between the business concepts according to the business use case;
dividing the business concepts into domains according to preset domain labels to obtain a domain division result;
and obtaining the domain model according to the domain division result and the associations.
8. The creation method according to claim 1, wherein the creation method further comprises:
exposing the data warehouse in response to a query request for the data warehouse.
9. An apparatus for creating a data warehouse, comprising:
the data acquisition module is used for acquiring original data;
the preprocessing module is used for preprocessing the original data to obtain preprocessed data;
and the data warehouse creation module is used for processing the preprocessed data according to a preset label and a domain model to obtain a data warehouse available for query.
10. An electronic device, comprising:
the creation apparatus according to claim 9; or,
a processor and a memory for storing instructions for controlling the processor to perform the creation method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the creation method according to any one of claims 1 to 8.
CN201910191102.9A 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium Active CN111694810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910191102.9A CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910191102.9A CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111694810A true CN111694810A (en) 2020-09-22
CN111694810B CN111694810B (en) 2024-04-05

Family

ID=72475056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910191102.9A Active CN111694810B (en) 2019-03-12 2019-03-12 Data warehouse creation method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111694810B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380852A (en) * 2020-11-12 2021-02-19 沃民高新科技(北京)股份有限公司 Public opinion data processing system
CN115858691A (en) * 2022-11-17 2023-03-28 北京白龙马云行科技有限公司 Report creation method and device, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718565A (en) * 2016-01-20 2016-06-29 北京京东尚科信息技术有限公司 Data warehouse model construction method and construction apparatus
US20170123871A1 (en) * 2015-10-28 2017-05-04 International Business Machines Corporation Early diagnosis of hardware, software or configuration problems in data warehouse system utilizing grouping of queries based on query parameters
CN108268565A (en) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 Method and system based on data warehouse processing user browsing behavior data
CN108520008A (en) * 2018-03-15 2018-09-11 链家网(北京)科技有限公司 The construction method and construction device of data warehouse model
CN108763278A (en) * 2018-04-11 2018-11-06 口碑(上海)信息技术有限公司 The statistical method and device of user characteristics label
CN109189764A (en) * 2018-09-20 2019-01-11 北京桃花岛信息技术有限公司 A kind of colleges and universities' data warehouse layered design method based on Hive

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
许合利;王慧林;: "数据仓库在税收领域的应用", 电脑开发与应用, no. 04, 5 April 2010 (2010-04-05) *
谷和启;: "数据仓库创建、设计与开发", 中文信息, no. 04, 1 April 2003 (2003-04-01) *

Also Published As

Publication number Publication date
CN111694810B (en) 2024-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant