CN110297818B - Method and device for constructing data warehouse - Google Patents

Publication number
CN110297818B
Authority
CN
China
Prior art keywords
theme
specified
data
attribute
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910563806.4A
Other languages
Chinese (zh)
Other versions
CN110297818A (en
Inventor
王超群
林必红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The present disclosure provides a method of building a data warehouse comprising one or more theme libraries, the method comprising: setting a theme priority configuration table, which configures the priority of each specified theme attribute in each specified data source; determining, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the source of that theme data, to obtain a theme traceability table; and generating, according to the theme traceability table, a theme table representing the theme library. The method thereby supports longitudinal extension, lateral extension, and traceability of the data warehouse, and improves the reliability of constructing the data warehouse.

Description

Method and device for constructing data warehouse
Technical Field
The present disclosure relates to the field of computer communications technologies, and in particular, to a method and an apparatus for building a data warehouse.
Background
A data warehouse, abbreviated DW or DWH, is a strategic collection of data that supports decision-making at every level of an enterprise.
In the related art, a data warehouse is subject-oriented: its data is organized by subject domain. A subject here is a key aspect that a user cares about when making decisions with the data warehouse. For example, in a data warehouse of a public security system, data can be divided into people, places, things, cases, and so on. Building a person subject library can be divided into the following steps: (1) establish a wide table of people according to the characteristics of people and the business knowledge of public security; (2) sort out and extract the fields required by the wide table from the existing service data; (3) when multiple service tables contain the same field, select the data from the table with the highest reliability.
However, building a subject-oriented data warehouse in this way requires comparing the data in multiple tables and distinguishing every field in each table. If several tables contain fields with the same meaning, both the priority of each field and the validity of each piece of data must be determined. The process is therefore very complex, and it hinders longitudinal extension, lateral extension, and traceability of the data warehouse.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a method and apparatus for constructing a data warehouse.
According to a first aspect of embodiments of the present disclosure, there is provided a method of building a data warehouse comprising one or more topic libraries, the method comprising:
setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source;
according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table;
and generating a theme table for representing the theme base according to the theme tracing table.
According to a second aspect of embodiments of the present disclosure, there is provided an apparatus for building a data warehouse comprising one or more subject libraries, the apparatus comprising:
the setting module is configured to set a theme priority configuration table, and the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source;
the determining module is configured to determine the subject data corresponding to each specified subject attribute and the source tracing of the subject data according to the priority of each specified subject attribute in each specified data source, so as to obtain a subject source tracing table;
and the generating module is configured to generate a theme table for representing the theme base according to the theme tracing table.
According to a third aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program is configured to implement the method for building a data warehouse provided in the first aspect when executed by a processor.
According to a fourth aspect of embodiments of the present disclosure, there is provided an apparatus for building a data warehouse comprising one or more subject libraries, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source;
according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table;
and generating a theme table for representing the theme base according to the theme tracing table.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the method can be realized by setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source; according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table; and generating a theme table for representing the theme base according to the theme tracing table, so that longitudinal expansion, transverse expansion and tracing of the data warehouse are facilitated, and the reliability of constructing the data warehouse is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 is a flow diagram illustrating a method of building a data warehouse in accordance with an exemplary embodiment;
FIG. 2 is a flow diagram illustrating another method of building a data warehouse in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating another method of building a data warehouse in accordance with an exemplary embodiment;
FIG. 4 is a block diagram illustrating an apparatus for building a data warehouse, in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating another apparatus for building a data warehouse, in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating another apparatus for building a data warehouse, in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an apparatus for building a data warehouse, in accordance with an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
FIG. 1 is a flow chart illustrating a method of building a data warehouse according to an exemplary embodiment of the present disclosure. The data warehouse may include one or more topic libraries, and each topic library may be oriented to a different topic, such as people, places, things, or cases. As shown in FIG. 1, the method of building a data warehouse may include the following steps 110 to 130:
in step 110, a topic priority configuration table is set, wherein the topic priority configuration table is used for configuring the priority of each specified topic attribute in each specified data source.
In the embodiment of the present disclosure, when constructing a data warehouse, the topic table is not generated directly from the specified data sources. Instead, to facilitate longitudinal extension and tracing, intermediate tables are set up first, for example a theme priority configuration table and a theme traceability table.
In an embodiment, the specified topic attribute in step 110 may be a topic attribute extracted from each specified data source and used for describing the topic library; the specified data source may be a data source specified for building the topic library.
In an embodiment, the topic priority configuration table in step 110 may include a first type field for describing the specified data source, a second type field for describing the specified topic attribute, and a third type field for describing the priority of the specified topic attribute in each of the specified data sources.
In an embodiment, the theme priority configuration table in step 110 further includes a reserved field, where the reserved field is a field of a reserved data source and/or a reserved theme attribute for subsequent expansion.
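The three field types and the reserved field described above can be pictured as one row of the configuration table. The following is a minimal sketch under stated assumptions, not the implementation of this disclosure: the class name PriorityConfigRow, the attribute names, the source_field column, and the source names case_table and population_table are hypothetical. The priorities 90 and 95 are the values this description gives for the permanent address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PriorityConfigRow:
    """One row of a priority configuration table (hypothetical layout)."""
    data_source: str             # first-type field: the specified data source
    subject_attribute: str       # second-type field: the specified attribute
    source_field: Optional[str]  # assumed: column in the source holding the value
    priority: Optional[int]      # third-type field: larger number = higher priority
    reserved: bool = False       # True for a slot kept for later expansion


config: List[PriorityConfigRow] = [
    PriorityConfigRow("case_table", "permanent_address", "permanent_address_1", 90),
    PriorityConfigRow("population_table", "permanent_address", "permanent_address_2", 95),
    # Reserved row: no real source or attribute is bound to it yet.
    PriorityConfigRow("reserved_source_1", "reserved_attribute_1", None, None, reserved=True),
]

# Only non-reserved rows take part in priority comparison.
active = [row for row in config if not row.reserved]
best = max(active, key=lambda row: row.priority)
print(best.data_source)
```

Keeping reserved rows in the same structure means that enabling one later is a data change rather than a schema change, which is the motivation the description gives for reserved fields.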
For example, taking the person topic, the specified data sources are the case table shown in Table 1 and the permanent population table shown in Table 2 below.
TABLE 1
Suspect ID | Permanent address 1 | Native place 1 | Height 1 | Body weight 1
1000001 | A2 | Zhejiang | 1.70 | 68
1000002 | B2 | Guangdong | 1.72 | 60
1000003 | C2 | Henan | 1.80 | 77
1000004 | D2 | Shanghai | 1.55 | 44
TABLE 2
Person ID | Permanent address 2 | Native place 2
1000001 | A3 | Zhejiang
1000002 | (null) | Hubei
1000003 | C3 | Hunan
1000005 | (null) | Yunnan
The specified subject attributes are permanent address, native place, height, and weight. The theme priority configuration table is set as shown in Table 3 below.
TABLE 3
[Table 3 appears as images in the original publication (Figures BDA0002108991400000051 and BDA0002108991400000061).]
The person ID in Table 3 is the key field: it is the field shared by the source data tables and is used to integrate data from multiple data sources. Every field in Table 3 other than the key field has a corresponding priority. For example, permanent address 1 of Table 1 has priority 90 and permanent address 2 of Table 2 has priority 95; the larger the number, the higher the priority.
In Table 3, the minimum gap between configured priorities is 5. The gap reserves room in advance so that a new priority can be inserted easily when a data source with a similar priority is added later. The minimum gap may also be larger than 5, for example 10 or 100.
If the specified data sources are expanded, only the new specified data source needs to be added to Table 3, with its key field, value fields, and priorities filled in. Expanding the data sources therefore requires modifying only the configuration table, which makes horizontal expansion of the subject table easy.
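The effect of leaving a gap between priorities can be sketched as follows. This is an illustration only: the helper priority_between, the dictionary layout, and the source name table_new are hypothetical, and choosing the midpoint is just one way to pick a free value. The priorities 90 and 95 come from the description of Table 3.

```python
from typing import Dict


def priority_between(lower: int, upper: int) -> int:
    """Return an unused integer priority strictly between two configured ones."""
    if upper - lower < 2:
        raise ValueError(f"no free priority between {lower} and {upper}")
    return (lower + upper) // 2


# attribute -> {data source: priority}; larger number = higher priority
config: Dict[str, Dict[str, int]] = {
    "permanent_address": {"table_1": 90, "table_2": 95},
}

# Add a hypothetical new data source whose reliability lies between the two.
new_priority = priority_between(90, 95)
config["permanent_address"]["table_new"] = new_priority

# Rank the sources for this attribute from highest to lowest priority.
ranked = sorted(
    config["permanent_address"],
    key=config["permanent_address"].get,
    reverse=True,
)
print(new_priority, ranked)
```

With a gap of 1 between existing priorities the helper raises an error, which is the situation the minimum gap of 5 is meant to avoid: without the gap, existing rows would have to be renumbered before a new source could be slotted in.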
In step 120, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the source tracing of the theme data are determined, and a theme source tracing table is obtained.
In the embodiment of the present disclosure, when determining the theme data corresponding to each of the specified theme attributes, the valid source data with the highest priority is generally selected.
In an embodiment, the topic traceability table in step 120 may include a fourth field for describing topic data corresponding to each specified topic attribute, and a fifth field for describing traceability of the topic data.
In step 130, a topic table for characterizing the topic library is generated according to the topic traceability table.
In the embodiment of the present disclosure, since the topic table may not include the source of the topic data corresponding to each specified topic attribute, when the topic table for characterizing the topic library is generated according to the topic source table, the source of the topic data corresponding to each specified topic attribute may be directly removed.
As can be seen from the above embodiments, by setting a theme priority configuration table, the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source; according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table; and generating a theme table for representing the theme base according to the theme tracing table, so that longitudinal expansion, transverse expansion and tracing of the data warehouse are facilitated, and the reliability of constructing the data warehouse is improved.
FIG. 2 is a flowchart illustrating another method for building a data warehouse according to an exemplary embodiment of the present disclosure. The method builds on the method shown in FIG. 1; as shown in FIG. 2, performing step 120 may include the following steps 210 to 230:
in step 210, for any specified subject attribute, according to the priority of the specified subject attribute in each specified data source, the specified data source corresponding to the highest priority is selected.
In step 220, when the source data corresponding to the specified subject attribute is valid data in the specified data source corresponding to the highest priority, determining the source data corresponding to the specified subject attribute as the subject data corresponding to the specified subject attribute, and determining the specified data source corresponding to the highest priority as the source tracing of the subject data, thereby obtaining a subject tracing table.
In step 230, when the source data corresponding to the specified subject attribute is invalid data in the specified data source corresponding to the highest priority, the specified data source with the next highest priority for the specified subject attribute is selected, and so on, until valid source data corresponding to the specified subject attribute is found; the corresponding subject tracing table is then determined.
In an embodiment, the topic traceability table in the above step 220 and step 230 may include a fourth field for describing topic data corresponding to each specified topic attribute, and a fifth field for describing traceability of the topic data.
For example, taking the person topic, the specified data sources are the case table shown in Table 1 and the permanent population table shown in Table 2. The specified subject attributes are permanent address, native place, height, and weight. The theme priority configuration table is set as shown in Table 3 above, and the resulting topic traceability table is shown in Table 4 below.
TABLE 4
[Table 4 appears as an image in the original publication (Figure BDA0002108991400000081).]
The process for obtaining Table 4 is as follows:
(1) according to the specified data sources and the key field configured in Table 3, data with the same person ID is taken from the specified data sources, for example the person whose person ID is 1000001 in Table 1 and Table 2;
(2) in Table 3, the priority of the field permanent address 1 of Table 1 is 90, and the priority of the field permanent address 2 of Table 2 is 95;
(3) the permanent addresses of person ID 1000001 are taken from Table 1 and Table 2, and the valid value with the higher priority is filled into the corresponding permanent address field of Table 4 (the permanent address of 1000001 is A3); "Table 2" is written in the permanent address source field of Table 4, indicating that the value comes from Table 2;
(4) similarly, for the permanent address of person ID 1000002, the value in Table 2 is null and therefore invalid, so the valid value in Table 1 is used to fill the corresponding permanent address field of Table 4 (the permanent address of 1000002 is B2), and "Table 1" is written in the permanent address source field of Table 4, indicating that the value comes from Table 1;
(5) the remaining data is written in the same way.
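The walk-through above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the implementation of this disclosure: the function names and dictionary layout are hypothetical, the validity rule (a value is valid when it is neither null nor blank) is an assumption based on the "null and invalid" case in item (4), and only the permanent address rows for person IDs 1000001 to 1000004 are included.

```python
from typing import Dict, Optional, Tuple

# Permanent address per person ID; None marks a null value.
table_1: Dict[int, Optional[str]] = {
    1000001: "A2", 1000002: "B2", 1000003: "C2", 1000004: "D2",
}
table_2: Dict[int, Optional[str]] = {
    1000001: "A3", 1000002: None, 1000003: "C3",
}

sources = {"table 1": table_1, "table 2": table_2}
priorities = {"table 1": 90, "table 2": 95}  # larger number = higher priority


def is_valid(value: Optional[str]) -> bool:
    """Assumed validity rule: not null and not blank."""
    return value is not None and value.strip() != ""


def resolve(person_id: int) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source name), trying sources from highest priority down."""
    for name in sorted(priorities, key=priorities.get, reverse=True):
        value = sources[name].get(person_id)
        if is_valid(value):
            return value, name
    return None, None  # no source holds a valid value


# Traceability table: person ID -> (permanent address, where it came from)
trace_table = {
    pid: resolve(pid) for pid in sorted(set(table_1) | set(table_2))
}
for pid, (address, source) in trace_table.items():
    print(pid, address, source)
```

For person ID 1000001 the sketch picks A3 from Table 2, and for 1000002 it falls back to B2 from Table 1, matching items (3) and (4). Storing the source name next to each value is what makes every field of the resulting table traceable.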
As can be seen from the above embodiments, for any specified theme attribute, a specified data source corresponding to the highest priority may be selected according to the priority of the specified theme attribute in each specified data source; in the designated data source corresponding to the highest priority, determining the effective source data corresponding to the designated theme attribute as the theme data corresponding to the designated theme attribute; and determining the designated data source corresponding to the highest priority as the source tracing of the theme data to obtain a theme source tracing table, so that the generation efficiency of the theme source tracing table is improved, and the practicability of the generated theme source tracing table is also improved.
FIG. 3 is a flowchart illustrating another method for building a data warehouse according to an exemplary embodiment of the present disclosure. The method builds on the method shown in FIG. 1; as shown in FIG. 3, performing step 130 may include the following steps 310 to 330:
in step 310, the traceability information of the subject data included in the subject traceability table is deleted to obtain a subject temporary table.
In step 320, a mapping table from the temporary table of topics to the table of topics is set, and the mapping table includes a mapping relationship between fields of the temporary table of topics and fields of the table of topics.
In an embodiment, the first field data included in the field of the temporary table of topics in the step 320 is each specified topic attribute included in the temporary table of topics; second field data included in the fields of the theme table are all appointed theme attributes included in the theme table, and the mapping relationship comprises a first mapping relationship between all appointed theme attributes included in the temporary theme table and all appointed theme attributes included in the theme table;
the third field data included in the field of the temporary theme table is each reserved theme attribute included in the temporary theme table; fourth field data included in the fields of the theme table are each reserved theme attribute included in the theme table; the mapping relationship comprises a second mapping relationship between each reserved subject attribute included in the subject temporary table and each reserved subject attribute included in the subject table.
In step 330, a topic table corresponding to the temporary topic table is determined according to the mapping table.
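Steps 310 to 330 can be sketched as two small transformations. This is an illustration under stated assumptions, not the implementation of this disclosure: the column names, the "_source" suffix used to mark traceability columns, the reserved column reserved_1, and the target name phone_number are all hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]

SOURCE_SUFFIX = "_source"  # assumed naming convention for traceability columns

# Two rows of a traceability table, with one reserved attribute.
trace_table: List[Row] = [
    {"person_id": 1000001, "permanent_address": "A3",
     "permanent_address_source": "table 2",
     "reserved_1": None, "reserved_1_source": None},
    {"person_id": 1000002, "permanent_address": "B2",
     "permanent_address_source": "table 1",
     "reserved_1": None, "reserved_1_source": None},
]


def drop_sources(row: Row) -> Row:
    """Step 310: delete traceability columns to get a temporary-table row."""
    return {k: v for k, v in row.items() if not k.endswith(SOURCE_SUFFIX)}


# Step 320: mapping from temporary-table fields to subject-table fields.
mapping: Dict[str, str] = {
    "person_id": "person_id",
    "permanent_address": "permanent_address",
    "reserved_1": "reserved_1",
}


def apply_mapping(row: Row, field_map: Dict[str, str]) -> Row:
    """Step 330: rename each mapped field; unmapped fields are dropped."""
    return {field_map[k]: v for k, v in row.items() if k in field_map}


temporary_table = [drop_sources(r) for r in trace_table]
subject_table = [apply_mapping(r, mapping) for r in temporary_table]

# Enabling a reserved field later only changes the mapping, not the code.
extended = dict(mapping, reserved_1="phone_number")
extended_row = apply_mapping(temporary_table[0], extended)
print(subject_table[0], extended_row)
```

Routing the reserved field through the mapping is one way to read the role of the second mapping relationship: a new attribute can be exposed in the subject table by editing the mapping alone.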
For example, taking the person topic, the specified data sources are the case table shown in Table 1 and the permanent population table shown in Table 2. The specified subject attributes are permanent address, native place, height, and weight. The theme priority configuration table is set as shown in Table 3 above, and the resulting topic traceability table is shown in Table 4 above. The topic temporary table is shown in Table 5 below, the mapping table from the topic temporary table to the topic table is shown in Table 6 below, and the person topic table is shown in Table 7 below.
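TABLE 5
[Table 5 appears as an image in the original publication (Figure BDA0002108991400000101).]
TABLE 6
[Table 6 appears as an image in the original publication (Figure BDA0002108991400000102).]
TABLE 7
[Table 7 appears as an image in the original publication (Figure BDA0002108991400000103).]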
Both Table 4 and Table 5 contain reserved fields, and Table 6 shows the mapping between the reserved fields and the real fields; Table 7 is obtained by applying the mapping in Table 6 to Table 5. To extend horizontally, the mapping table is modified and the corresponding entry in the configuration table is enabled; that is, the corresponding reserved field is reserved in Table 7 and enabled in Table 5.
As can be seen from the above embodiments, the temporary topic table is obtained by deleting the traceability information of the topic data included in the topic traceability table; a mapping table from the temporary theme table to the theme table is set, the mapping table comprising a mapping relation between fields of the temporary theme table and fields of the theme table; and the theme table corresponding to the temporary theme table is determined according to the mapping table, thereby improving the accuracy of the generated theme table.
Corresponding to the embodiment of the method for building the data warehouse, the disclosure also provides an embodiment of a device for building the data warehouse.
FIG. 4 is a block diagram of an apparatus for building a data warehouse according to an exemplary embodiment. The apparatus performs the method of building a data warehouse shown in FIG. 1, and the data warehouse may include one or more subject libraries. Each subject library may be oriented to a different subject, such as people, places, things, or cases. As shown in FIG. 4, the apparatus for building a data warehouse may include:
a setting module 41 configured to set a theme priority configuration table, where the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source;
a determining module 42, configured to determine, according to the priority of each specified subject attribute in each specified data source, subject data corresponding to each specified subject attribute and the source tracing of the subject data, so as to obtain a subject source tracing table;
a generating module 43 configured to generate a topic table for characterizing the topic library according to the topic traceability table.
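How the three modules hand data to one another can be sketched as three small classes. This is an illustration only: the class and method names, the dictionary layouts, and the truthiness-based validity check are hypothetical, and the sample data covers just the permanent address of person IDs 1000001 and 1000002 from Tables 1 and 2.

```python
from typing import Dict, Optional, Tuple

Config = Dict[str, Dict[str, int]]                        # attribute -> {source: priority}
Sources = Dict[str, Dict[int, Dict[str, Optional[str]]]]  # source -> id -> {attribute: value}
Trace = Dict[int, Dict[str, Tuple[Optional[str], Optional[str]]]]


class SettingModule:
    """Counterpart of setting module 41: supplies the priority configuration."""
    def set_config(self) -> Config:
        return {"permanent_address": {"table 1": 90, "table 2": 95}}


class DeterminingModule:
    """Counterpart of determining module 42: builds the traceability table."""
    def build_trace(self, config: Config, sources: Sources) -> Trace:
        ids = sorted({pid for rows in sources.values() for pid in rows})
        trace: Trace = {}
        for pid in ids:
            trace[pid] = {}
            for attr, prios in config.items():
                trace[pid][attr] = (None, None)
                # Try sources from highest priority down; keep the first valid value.
                for name in sorted(prios, key=prios.get, reverse=True):
                    value = sources.get(name, {}).get(pid, {}).get(attr)
                    if value:  # assumed validity rule: non-null and non-empty
                        trace[pid][attr] = (value, name)
                        break
        return trace


class GeneratingModule:
    """Counterpart of generating module 43: strips the source information."""
    def build_subject_table(self, trace: Trace) -> Dict[int, Dict[str, Optional[str]]]:
        return {pid: {attr: value for attr, (value, _source) in attrs.items()}
                for pid, attrs in trace.items()}


sources: Sources = {
    "table 1": {1000001: {"permanent_address": "A2"},
                1000002: {"permanent_address": "B2"}},
    "table 2": {1000001: {"permanent_address": "A3"},
                1000002: {"permanent_address": None}},
}
config = SettingModule().set_config()
trace = DeterminingModule().build_trace(config, sources)
subject_table = GeneratingModule().build_subject_table(trace)
print(subject_table)
```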
In an embodiment, based on the apparatus shown in fig. 4, the specified topic attribute is a topic attribute extracted from each of the specified data sources and used for describing the topic library; the specified data source is a data source specified for constructing the topic library.
In an embodiment, based on the above-mentioned apparatus, the theme priority configuration table includes a first field for describing the specified data source, a second field for describing the specified theme property, and a third field for describing the priority of the specified theme property in each specified data source.
In an embodiment, based on the above-mentioned apparatus, the theme priority configuration table further includes a reserved field, where the reserved field is a field used for a reserved data source and/or reserved theme attribute of a subsequent extension.
As can be seen from the above embodiments, by setting a theme priority configuration table, the theme priority configuration table is used to configure the priority of each specified theme attribute in each specified data source; according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table; and generating a theme table for representing the theme base according to the theme tracing table, so that longitudinal expansion, transverse expansion and tracing of the data warehouse are facilitated, and the reliability of constructing the data warehouse is improved.
In an embodiment, based on the apparatus shown in fig. 4, as shown in fig. 5, the determining module 42 may include:
the selecting submodule 51 is configured to select, for any one of the specified theme attributes, the specified data source corresponding to the highest priority according to the priority of the specified theme attribute in each specified data source;
a first determining submodule 52, configured to determine, when the source data corresponding to the specified subject attribute is valid data in the specified data source corresponding to the highest priority, the source data corresponding to the specified subject attribute as the subject data corresponding to the specified subject attribute, and determine the specified data source corresponding to the highest priority as the source tracing of the subject data, so as to obtain the subject tracing table;
the second determining submodule 53 is configured to, when the source data corresponding to the specified subject attribute in the specified data source corresponding to the highest priority is invalid data, select the specified data source corresponding to the next highest priority from the priorities of the specified data sources according to the specified subject attribute until the source data corresponding to the specified subject attribute is found to be valid data, and determine the corresponding subject traceability table.
In an embodiment, based on the apparatus shown in fig. 4 or fig. 5, the topic tracing table includes a fourth field for describing topic data corresponding to each specified topic attribute, and a fifth field for describing the tracing of the topic data.
As can be seen from the above embodiments, for any specified theme attribute, a specified data source corresponding to the highest priority may be selected according to the priority of the specified theme attribute in each specified data source; in the designated data source corresponding to the highest priority, determining the effective source data corresponding to the designated theme attribute as the theme data corresponding to the designated theme attribute; and determining the designated data source corresponding to the highest priority as the source tracing of the theme data to obtain a theme source tracing table, so that the generation efficiency of the theme source tracing table is improved, and the practicability of the generated theme source tracing table is also improved.
In an embodiment, based on the apparatus shown in fig. 4, as shown in fig. 6, the generating module 43 may include:
a deleting submodule 61 configured to delete the source information of the theme data included in the theme traceability table to obtain a theme temporary table;
a setting submodule 62 configured to set a mapping table from the theme temporary table to the theme table, the mapping table including a mapping relationship between fields of the theme temporary table and fields of the theme table;
a third determining submodule 63 configured to determine the theme table corresponding to the theme temporary table according to the mapping table.
In an embodiment, based on the apparatus shown in fig. 6, first field data included in the fields of the theme temporary table is each specified theme attribute included in the theme temporary table; second field data included in the fields of the theme table is each specified theme attribute included in the theme table; and the mapping relationship comprises a first mapping relationship between the specified theme attributes included in the theme temporary table and the specified theme attributes included in the theme table;
third field data included in the fields of the theme temporary table is each reserved theme attribute included in the theme temporary table; fourth field data included in the fields of the theme table is each reserved theme attribute included in the theme table; and the mapping relationship comprises a second mapping relationship between the reserved theme attributes included in the theme temporary table and the reserved theme attributes included in the theme table.
As can be seen from the above embodiments, the theme temporary table is obtained by deleting the source information of the theme data included in the theme traceability table; a mapping table from the theme temporary table to the theme table is set, the mapping table including a mapping relationship between fields of the theme temporary table and fields of the theme table; and the theme table corresponding to the theme temporary table is determined according to the mapping table, thereby improving the accuracy of the generated theme table.
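The three steps summarized above (removing the source information, setting a field mapping, and deriving the final table) can be illustrated with a short sketch. The function names, the field names such as `person_name` and `ext_1`, and the dictionary representation of the tables are hypothetical and chosen only for this example; the disclosure does not prescribe them.

```python
from typing import Any, Dict


def drop_trace(trace_table: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Remove the source information, keeping only the theme data.

    trace_table has the assumed shape
    {attribute: {"value": theme data, "trace": source name}}.
    """
    return {attr: cell["value"] for attr, cell in trace_table.items()}


def apply_mapping(temp_table: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename temporary-table fields to final-table fields.

    mapping is {temporary field: final field}. Fields that have no
    entry in the mapping are left out of the result (assumption).
    """
    return {mapping[field]: value for field, value in temp_table.items() if field in mapping}


# Example: one specified attribute and one reserved attribute.
trace_table = {
    "name": {"value": "Alice", "trace": "hr"},
    "reserved_1": {"value": None, "trace": None},
}
mapping = {
    "name": "person_name",   # first mapping relationship (specified attribute)
    "reserved_1": "ext_1",   # second mapping relationship (reserved attribute)
}
temp_table = drop_trace(trace_table)
theme_table = apply_mapping(temp_table, mapping)
```

Keeping the mapping as data rather than code means that a reserved attribute can later be given a real meaning by editing the mapping alone, which is one plausible reading of why the mapping table is kept separate.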
Since the apparatus embodiments substantially correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the disclosed solution. One of ordinary skill in the art can understand and implement it without inventive effort.
The present disclosure also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of building a data warehouse shown in any of fig. 2-4.
The present disclosure also provides an apparatus for constructing a data warehouse comprising one or more theme libraries, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source;
determining, according to the priority of each specified theme attribute in each specified data source, the theme data corresponding to each specified theme attribute and the source of the theme data, to obtain a theme traceability table;
and generating, according to the theme traceability table, a theme table for representing the theme library.
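As an illustration of the first step, the theme priority configuration table can be pictured as rows of (data source, theme attribute, priority). The sketch below is an assumption-laden example rather than the configured format of the disclosure: the row layout, the sample names, the spacing of priorities in steps of 10, and the convention that a smaller number ranks higher are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each row: (data source, theme attribute, priority).
# Priorities are spaced by 10 so that a new data source could be
# slotted in between existing ones (assumed convention).
CONFIG_ROWS: List[Tuple[str, str, int]] = [
    ("hr", "name", 10),
    ("crm", "name", 20),
    ("hr", "phone", 20),
    ("crm", "phone", 10),
]


def priorities_by_attribute(rows: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Group rows into {attribute: {data source: priority}}."""
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for source, attribute, priority in rows:
        grouped[attribute][source] = priority
    return dict(grouped)


def ranked_sources(rows: List[Tuple[str, str, int]], attribute: str) -> List[str]:
    """List the data sources for one attribute, highest priority first."""
    per_source = priorities_by_attribute(rows).get(attribute, {})
    return sorted(per_source, key=per_source.get)
```

With a table of this shape, the same data source can rank first for one theme attribute and second for another, which is the point of configuring the priority per attribute rather than per data source.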
FIG. 7 is a block diagram of an apparatus 700 for building a data warehouse, according to an exemplary embodiment. Referring to fig. 7, the apparatus 700 includes a processing component 722, which further includes one or more processors, and memory resources, represented by memory 716, for storing instructions, such as applications, that are executable by the processing component 722. The applications stored in the memory 716 may include one or more modules that each correspond to a set of instructions. Further, the processing component 722 is configured to execute the instructions to perform a method of building a data warehouse as described in any of fig. 2-4.
The apparatus 700 may also include a power component 726 configured to perform power management of the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in the memory 716, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method of building a data warehouse, the data warehouse comprising one or more subject libraries, the method comprising:
setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source, the theme priority configuration table comprises reserved fields, and the reserved fields are fields used for subsequently expanded reserved data sources and/or reserved theme attributes;
according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table;
generating a theme table for representing the theme base according to the theme tracing table;
the generating a topic table for characterizing the topic library according to the topic traceability table includes:
deleting the tracing of the theme data included in the theme tracing table to obtain a theme temporary table;
setting a mapping table from a temporary theme table to a theme table, wherein the mapping table comprises a mapping relation between fields of the temporary theme table and fields of the theme table;
and determining the theme table corresponding to the temporary theme table according to the mapping table.
2. The method according to claim 1, wherein the specified topic attribute is a topic attribute extracted from each of the specified data sources to describe the topic library; the specified data source is a data source specified for constructing the topic library.
3. The method according to claim 1 or 2, wherein the theme priority configuration table comprises a first type field for describing the specified data source, a second type field for describing the specified theme attribute, and a third type field for describing the priority of the specified theme attribute in each specified data source, wherein the priority is expressed in numerical form and has a preset span.
4. The method according to claim 1, wherein the determining, according to the priority of each specified subject attribute in each specified data source, the subject data corresponding to each specified subject attribute and the tracing of the subject data to obtain a subject tracing table includes:
aiming at any one of the specified theme attributes, selecting the specified data source corresponding to the highest priority according to the priority of the specified theme attribute in each specified data source;
when the source data corresponding to the specified subject attribute is valid data in the specified data source corresponding to the highest priority, determining the source data corresponding to the specified subject attribute as the subject data corresponding to the specified subject attribute, and determining the specified data source corresponding to the highest priority as the tracing source of the subject data to obtain the subject tracing table;
and when the source data corresponding to the specified theme attribute in the specified data source with the highest priority is invalid data, selecting, for the specified theme attribute, the specified data source with the next-highest priority from the priorities of the specified data sources, repeating the selection until valid source data corresponding to the specified theme attribute is found, and determining the corresponding theme traceability table.
5. The method according to claim 1 or 4, wherein the topic traceability table includes a fourth field for describing topic data corresponding to each specified topic attribute and a fifth field for describing traceability of the topic data.
6. The method of claim 1, wherein first field data included in the fields of the theme temporary table is each specified theme attribute included in the theme temporary table; second field data included in the fields of the theme table is each specified theme attribute included in the theme table; and the mapping relationship comprises a first mapping relationship between the specified theme attributes included in the theme temporary table and the specified theme attributes included in the theme table;
the third field data included in the field of the temporary theme table is each reserved theme attribute included in the temporary theme table; fourth field data included in the fields of the theme table are each reserved theme attribute included in the theme table; the mapping relationship comprises a second mapping relationship between each reserved subject attribute included in the subject temporary table and each reserved subject attribute included in the subject table.
7. An apparatus for building a data warehouse, the data warehouse comprising one or more subject libraries, the apparatus comprising:
the setting module is configured to set a theme priority configuration table, the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source, the theme priority configuration table comprises reserved fields, and the reserved fields are fields used for reserved data sources and/or reserved theme attributes of subsequent expansion;
the determining module is configured to determine the subject data corresponding to each specified subject attribute and the source tracing of the subject data according to the priority of each specified subject attribute in each specified data source, so as to obtain a subject source tracing table;
the generating module is configured to generate a theme table for representing the theme base according to the theme tracing table;
the generation module comprises:
the deleting submodule is configured to delete the tracing of the theme data included in the theme tracing table to obtain a theme temporary table;
the setting submodule is configured to set a mapping table from the temporary theme table to the theme table, and the mapping table comprises a mapping relation between fields of the temporary theme table and fields of the theme table;
and the third determining submodule is configured to determine the theme table corresponding to the temporary theme table according to the mapping table.
8. The apparatus according to claim 7, wherein the specified topic attribute is a topic attribute extracted from each of the specified data sources to describe the topic library; the specified data source is a data source specified for constructing the topic library.
9. The apparatus according to claim 7 or 8, wherein the theme priority configuration table includes a first type field for describing the specified data source, a second type field for describing the specified theme attribute, and a third type field for describing the priority of the specified theme attribute in each of the specified data sources, wherein the priority is expressed in numerical form and has a preset span.
10. The apparatus of claim 7, wherein the determining module comprises:
the selection submodule is configured to select the specified data source corresponding to the highest priority according to the priority of the specified theme attribute in each specified data source aiming at any specified theme attribute;
a first determining submodule configured to determine, when the source data corresponding to the specified subject attribute is valid data in the specified data source corresponding to the highest priority, the source data corresponding to the specified subject attribute as subject data corresponding to the specified subject attribute, and determine the specified data source corresponding to the highest priority as a source tracing of the subject data, so as to obtain the subject tracing table;
and the second determining submodule is configured to, when the source data corresponding to the specified theme attribute in the specified data source with the highest priority is invalid data, select, for the specified theme attribute, the specified data source with the next-highest priority from the priorities of the specified data sources, repeating the selection until valid source data corresponding to the specified theme attribute is found, and determine the corresponding theme traceability table.
11. The apparatus according to claim 7 or 10, wherein the topic traceability table includes a fourth field for describing topic data corresponding to each specified topic attribute and a fifth field for describing traceability of the topic data.
12. The apparatus of claim 7, wherein first field data included in the fields of the theme temporary table is each specified theme attribute included in the theme temporary table; second field data included in the fields of the theme table is each specified theme attribute included in the theme table; and the mapping relationship comprises a first mapping relationship between the specified theme attributes included in the theme temporary table and the specified theme attributes included in the theme table;
the third field data included in the field of the temporary theme table is each reserved theme attribute included in the temporary theme table; fourth field data included in the fields of the theme table are each reserved theme attribute included in the theme table; the mapping relationship comprises a second mapping relationship between each reserved subject attribute included in the subject temporary table and each reserved subject attribute included in the subject table.
13. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps of the method of any of claims 1 to 6.
14. An apparatus for building a data warehouse, the data warehouse comprising one or more subject libraries, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
setting a theme priority configuration table, wherein the theme priority configuration table is used for configuring the priority of each specified theme attribute in each specified data source, the theme priority configuration table comprises reserved fields, and the reserved fields are fields used for subsequently expanded reserved data sources and/or reserved theme attributes;
according to the priority of each specified theme attribute in each specified data source, determining theme data corresponding to each specified theme attribute and the source tracing of the theme data to obtain a theme source tracing table;
generating a theme table for representing the theme base according to the theme tracing table;
the generating a topic table for characterizing the topic library according to the topic traceability table includes:
deleting the tracing of the theme data included in the theme tracing table to obtain a theme temporary table;
setting a mapping table from a temporary theme table to a theme table, wherein the mapping table comprises a mapping relation between fields of the temporary theme table and fields of the theme table;
and determining the theme table corresponding to the temporary theme table according to the mapping table.
CN201910563806.4A 2019-06-26 2019-06-26 Method and device for constructing data warehouse Active CN110297818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910563806.4A CN110297818B (en) 2019-06-26 2019-06-26 Method and device for constructing data warehouse

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910563806.4A CN110297818B (en) 2019-06-26 2019-06-26 Method and device for constructing data warehouse

Publications (2)

Publication Number Publication Date
CN110297818A CN110297818A (en) 2019-10-01
CN110297818B true CN110297818B (en) 2022-03-01

Family

ID=68029128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910563806.4A Active CN110297818B (en) 2019-06-26 2019-06-26 Method and device for constructing data warehouse

Country Status (1)

Country Link
CN (1) CN110297818B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143463B (en) * 2020-01-06 2023-07-04 中国工商银行股份有限公司 Construction method and device of bank data warehouse based on topic model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975772A (en) * 2006-12-22 2007-06-06 中国建设银行股份有限公司 Method and device for integrating information in multi-system
CN103853820A (en) * 2014-02-20 2014-06-11 北京用友政务软件有限公司 Data processing method and data processing system
CN106294521A (en) * 2015-06-12 2017-01-04 交通银行股份有限公司 Date storage method and data warehouse
CN107657049A (en) * 2017-09-30 2018-02-02 深圳市华傲数据技术有限公司 A kind of data processing method based on data warehouse
CN107704590A (en) * 2017-09-30 2018-02-16 深圳市华傲数据技术有限公司 A kind of data processing method and system based on data warehouse
CN109145164A (en) * 2018-08-28 2019-01-04 百度在线网络技术(北京)有限公司 Data processing method, device, equipment and medium
CN109522312A (en) * 2018-11-27 2019-03-26 北京锐安科技有限公司 A kind of data processing method, device, server and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7047230B2 (en) * 2002-09-09 2006-05-16 Lucent Technologies Inc. Distinct sampling system and a method of distinct sampling for optimizing distinct value query estimates
ITUD20050209A1 (en) * 2005-12-09 2007-06-10 Eurotech Spa METHOD FOR THE FINDING OF AFFINITY BETWEEN SUBJECTS AND ITS APPARATUS
US20110004622A1 (en) * 2007-10-17 2011-01-06 Blazent, Inc. Method and apparatus for gathering and organizing information pertaining to an entity
CN105830053A (en) * 2014-01-16 2016-08-03 英特尔公司 An apparatus, method, and system for a fast configuration mechanism
US10095766B2 (en) * 2015-10-23 2018-10-09 Numerify, Inc. Automated refinement and validation of data warehouse star schemas
US10360239B2 (en) * 2015-10-23 2019-07-23 Numerify, Inc. Automated definition of data warehouse star schemas
CN106933907B (en) * 2015-12-31 2020-09-15 北京国双科技有限公司 Processing method and device for data table expansion indexes
CN108520008A (en) * 2018-03-15 2018-09-11 链家网(北京)科技有限公司 The construction method and construction device of data warehouse model
CN109033173B (en) * 2018-06-21 2022-09-13 土巴兔集团股份有限公司 Data processing method and device for generating multidimensional index data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
什么是数据仓库主题;ChavinKing;《https://www.cnblogs.com/wcwen1990/p/7600251.html》;20170927;1-4 *
基于供应链的数据仓库系统研究;张洪波;《中国优秀硕博士学位论文全文数据库(硕士) 信息科技辑》;20050815(第4期);I138-221 *

Also Published As

Publication number Publication date
CN110297818A (en) 2019-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant