CN103559272A - Method and device for importing data into dimension table - Google Patents

Method and device for importing data into dimension table

Info

Publication number
CN103559272A
Authority
CN
China
Prior art keywords
data
dimension table
data source
source data
importing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310541634.3A
Other languages
Chinese (zh)
Inventor
洪超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201310541634.3A
Publication of CN103559272A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/2228 Indexing structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for importing data into a dimension table. The method for importing data into the dimension table comprises the following steps: establishing a unique index of a target dimension table, wherein the target dimension table is a dimension table for receiving data source data in a database; setting the property of the unique index as a preset property, wherein the preset property indicates that the data source data are not inserted and the database does not report errors under the condition that the data source data already exist in the target dimension table; importing the data source data into the target dimension table. According to the method and the device, the problem of low importing efficiency of data into a large dimension table is solved, and the effect of increasing the data importing efficiency is further achieved.

Description

Method and device for importing data into a dimension table
Technical field
The present invention relates to the field of databases, and in particular to a method and device for importing data into a dimension table.
Background
As data volumes grow, many companies build their analysis systems on databases, which contain dimensions and metrics. A dimension table stores the values of a dimension. For example, DimUrl stores the Url dimension, so that related metrics in the database (such as visits and page views) can be analyzed from the perspective of the Url. Logically, each row of a dimension table represents one unique record of that dimension; for example, each row of the DimUrl dimension table represents one unique Url record. Once a data warehouse grows beyond a certain scale, large dimension tables are unavoidable. Such tables often need many rows imported every day, and the table must remain unique after each import. Importing rows into a large dimension table therefore has to satisfy two conditions at the same time: 1. the import is fast; 2. every record of the large dimension table remains unique.
When ETL (extract, transform, load) is performed with the SSIS tool, imports into large dimension tables currently generally use the Lookup control. For each row to be inserted, the Lookup control determines whether the row already exists in the large dimension table; if it exists, the row is not inserted, and if it does not exist, the row is inserted. This scheme imports data row by row and is very inefficient.
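The row-by-row scheme described above can be sketched roughly as follows. This is only an illustration, not SSIS itself: it uses Python's built-in sqlite3 module as a stand-in database, and the function name import_row_by_row and the single-column layout of DimUrl are assumptions made for the example.

```python
import sqlite3


def import_row_by_row(conn, urls):
    """Insert each URL only after a lookup shows it is not already present."""
    inserted = 0
    for url in urls:
        # One existence query per incoming row: this is the costly part.
        exists = conn.execute(
            "SELECT 1 FROM DimUrl WHERE Url = ?", (url,)
        ).fetchone()
        if exists is None:
            conn.execute("INSERT INTO DimUrl (Url) VALUES (?)", (url,))
            inserted += 1
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimUrl (Url TEXT)")  # hypothetical one-column layout
conn.execute("INSERT INTO DimUrl (Url) VALUES ('http://example.com/a')")

count = import_row_by_row(
    conn, ["http://example.com/a", "http://example.com/b"]
)
total = conn.execute("SELECT COUNT(*) FROM DimUrl").fetchone()[0]
```

Each incoming row costs one lookup plus, possibly, one insert, so the number of statements grows with the number of source rows.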
No effective solution has yet been proposed for the low efficiency of importing data into large dimension tables in the related art.
Summary of the invention
The main object of the present invention is to provide a method and device for importing data into a dimension table, so as to solve the prior-art problem of low efficiency when importing data into a large dimension table.
To achieve the above object, according to one aspect of the present invention, a method for importing data into a dimension table is provided, comprising: establishing a unique index of a target dimension table, wherein the target dimension table is the dimension table in a database that receives data source data; setting the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error; and importing the data source data into the target dimension table.
Further, before the data source data is imported into the target dimension table, the method for importing data into a dimension table further comprises: checking whether the data source data contains duplicates; and, if duplicates are found, deleting the duplicated part of the data source data, or selecting any one of the duplicated data source data as the data to be imported.
Further, importing the data source data into the target dimension table comprises: importing the data source data into a temporary table of the database; establishing a unique index of the temporary table; and importing the data in the temporary table into the target dimension table.
Further, before the data source data is imported into the target dimension table, the method for importing data into a dimension table further comprises: calculating a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data.
Further, the mapping value is a hash value.
Further, the unique index of the target dimension table is established according to the key value of the target dimension table.
To achieve the above object, according to another aspect of the present invention, a device for importing data into a dimension table is provided, the device being configured to perform any of the methods for importing data into a dimension table provided above.
To achieve the above object, according to another aspect of the present invention, a device for importing data into a dimension table is provided, comprising: an establishing unit, configured to establish a unique index of a target dimension table, wherein the target dimension table is the dimension table in a database that receives data source data; a setting unit, configured to set the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error; and an importing unit, configured to import the data source data into the target dimension table.
Further, the device for importing data into a dimension table further comprises: a checking unit, configured to check, before the data source data is imported into the target dimension table, whether the data source data contains duplicates; and a processing unit, configured to, when duplicates are found in the data source data, delete the duplicated part of the data source data or select any one of the duplicated data source data as the data to be imported.
Further, the importing unit comprises: a first importing subunit, configured to import the data source data into a temporary table of the database; an establishing subunit, configured to establish a unique index of the temporary table; and a second importing subunit, configured to import the data in the temporary table into the target dimension table.
Further, the device for importing data into a dimension table further comprises: a computing unit, configured to calculate a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data.
Further, the computing unit uses a hash algorithm to calculate the mapping value.
Further, the establishing unit establishes the unique index of the target dimension table according to the key value of the target dimension table.
The present invention establishes a unique index of the target dimension table, wherein the target dimension table is the dimension table in a database that receives data source data; sets the attribute of the unique index to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error; and imports the data source data into the target dimension table. Because a unique index has been established on the target dimension table, when data is imported into the target dimension table the data source data can be matched against the data already in the target dimension table in a many-to-many manner, which is far more efficient than the prior-art approach of using the Lookup control to determine row by row whether the data source data exists in the database. In addition, because the attribute of the unique index is set so that, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts that data, the import is not interrupted and the uniqueness of the data is guaranteed. This solves the problem of low efficiency when importing data into a large dimension table and thereby improves data import efficiency.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The exemplary embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for importing data into a dimension table according to a first embodiment of the present invention;
Fig. 2 is a flowchart of the method for importing data into a dimension table according to a second embodiment of the present invention;
Fig. 3 is a structural diagram of the device for importing data into a dimension table according to the first embodiment of the present invention; and
Fig. 4 is a structural diagram of the device for importing data into a dimension table according to the second embodiment of the present invention.
Detailed description
It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
The present invention provides a method for importing data into a dimension table. The method provided by the embodiments of the present invention is described in detail below:
Fig. 1 is a flowchart of the method for importing data into a dimension table according to the first embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps S102 to S106:
Step S102: establish a unique index of the target dimension table, wherein the target dimension table is the dimension table in the database that receives data source data. Specifically, the unique index of the target dimension table may be established according to the key value of the target dimension table.
Step S104: set the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error. Specifically, the embodiment of the present invention uses the database program Microsoft SQL Server, and the "ignore duplicate values" attribute of the unique index is set to true. With this setting, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts the data.
Step S106: import the data source data into the target dimension table. The data is imported directly, without using the Lookup control to determine row by row whether the data source data exists in the database.
Because a unique index has been established on the target dimension table, when data is imported into the target dimension table the data source data can be matched against the data already in the target dimension table in a many-to-many manner, which is far more efficient than the prior-art approach of using the Lookup control to determine row by row whether the data source data exists in the database. In addition, because the attribute of the unique index is set so that, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts that data, the import is not interrupted and the uniqueness of the data is guaranteed. This solves the problem of low efficiency when importing data into a large dimension table and thereby improves data import efficiency.
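Steps S102 to S106 can be illustrated with a minimal sketch. It is not the SQL Server implementation of the embodiment: it uses Python's built-in sqlite3 module, where the "ignore duplicates without error" behaviour is requested per statement with INSERT OR IGNORE rather than as a property of the index. In SQL Server the corresponding index option is IGNORE_DUP_KEY = ON on CREATE UNIQUE INDEX. The index name UX_DimUrl_Url and the table layout are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimUrl (Url TEXT NOT NULL)")  # hypothetical layout

# Step S102: unique index on the key column of the target dimension table.
conn.execute("CREATE UNIQUE INDEX UX_DimUrl_Url ON DimUrl (Url)")

# A row that already exists in the dimension table before the import.
conn.execute("INSERT INTO DimUrl (Url) VALUES ('http://example.com/a')")

source_rows = [
    ("http://example.com/a",),  # already present: silently skipped
    ("http://example.com/b",),
    ("http://example.com/c",),
]

# Steps S104 and S106 (SQLite stand-in): OR IGNORE makes a unique-index
# violation skip the row instead of raising an error, so the whole batch
# is submitted without a per-row existence check.
conn.executemany("INSERT OR IGNORE INTO DimUrl (Url) VALUES (?)", source_rows)

urls = [row[0] for row in conn.execute("SELECT Url FROM DimUrl ORDER BY Url")]
```

Without OR IGNORE the first source row would raise sqlite3.IntegrityError and interrupt the import, which is the interruption the preset attribute is meant to avoid.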
Further, before the data source data is imported into the target dimension table, the method for importing data into a dimension table of the embodiment of the present invention further comprises: checking whether the data source data contains duplicates and, if duplicates are found, deleting the duplicated part of the data source data, or selecting any one of the duplicated data source data as the data to be imported. Because the data source data may contain duplicates, deduplicating it in SSIS before the import reduces the volume of data from the data source and further improves the efficiency of the import.
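As a rough illustration of the deduplication step, the sketch below removes repeated rows in memory before anything is sent to the database. In the embodiment this is done inside SSIS; the function name deduplicate and the sample URLs are assumptions made for the example.

```python
def deduplicate(rows):
    """Return rows with repeats removed, keeping the first occurrence of each."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique_rows.append(row)
    return unique_rows


source_rows = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",  # duplicate of the first row
]
unique_rows = deduplicate(source_rows)
```

Keeping the first occurrence corresponds to selecting one of the duplicated rows as the data to be imported; the source itself is left unmodified.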
When the data source data takes the form of a file (for example, a CSV file), the duplicated data may simply be deleted. If the data source is a database, however, deleting duplicated data is a time-consuming write operation; in that case, one piece of data is selected from the duplicated data source data as the data that will subsequently be imported into the target dimension table.
Further, before the data source data is imported into the target dimension table, the method for importing data into a dimension table further comprises: calculating a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data. Specifically, in the embodiment of the present invention, a hash algorithm may be used to calculate the mapping value of each piece of data source data. A hash algorithm maps longer binary data to shorter binary data, with different data corresponding to different hash values. Because the data source data may be very long, matching on hash values improves the efficiency of data matching. It should be noted that the embodiment of the present invention uses a hash algorithm to map the data source data to shorter data, but the invention is not limited to this; other mappings that can map the data may also be used.
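The mapping-value idea can be sketched as follows. The patent does not name a specific hash algorithm, so SHA-256 from Python's hashlib is an assumption here, as are the UrlHash column, the index name, and the use of sqlite3 with INSERT OR IGNORE as a stand-in for the SQL Server setup.

```python
import hashlib
import sqlite3


def url_hash(url: str) -> bytes:
    """Map a URL of arbitrary length to a fixed 32-byte value (SHA-256)."""
    return hashlib.sha256(url.encode("utf-8")).digest()


# A deliberately long URL, far longer than its 32-byte hash.
long_url = "http://example.com/search?" + "&".join(
    f"p{i}=value{i}" for i in range(50)
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimUrl (UrlHash BLOB NOT NULL, Url TEXT NOT NULL)")
# The unique index is built on the short hash column, not on the long URL.
conn.execute("CREATE UNIQUE INDEX UX_DimUrl_UrlHash ON DimUrl (UrlHash)")

rows = [(url_hash(u), u) for u in (long_url, long_url, "http://example.com/a")]
conn.executemany("INSERT OR IGNORE INTO DimUrl (UrlHash, Url) VALUES (?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM DimUrl").fetchone()[0]
```

Matching then compares 32-byte values instead of full URLs, which is the efficiency gain the paragraph above describes.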
Fig. 2 is a flowchart of the method for importing data into a dimension table according to the second embodiment of the present invention. The method provided by the second embodiment may serve as a preferred implementation of the method of the first embodiment. As shown in Fig. 2, the method comprises the following steps S202 to S212:
Step S202: establish a unique index of the target dimension table, wherein the target dimension table is the dimension table in the database that receives data source data. Specifically, the unique index may be established according to the key value of the target dimension table.
Step S204: set the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error. Specifically, the embodiment of the present invention uses the database program SQL Server, and the "ignore duplicate values" attribute of the unique index is set to true. With this setting, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts the data.
Step S206: check whether the data source data contains duplicates and, if so, delete the duplicated part of the data source data or select any one of the duplicated data source data as the data to be imported. Because the data source data may contain duplicates, deduplicating it in SSIS before the import reduces the volume of data from the data source and further improves the efficiency of the import.
Step S208: import the data source data into a temporary table of the database.
Step S210: establish a unique index of the temporary table.
Step S212: import the data in the temporary table into the target dimension table.
When the volume of data source data is large, SQL Server does not know that the data source data has already been deduplicated in SSIS, so it assumes that the data may contain duplicates. Duplicate data causes the same record to be looked up repeatedly in the dimension table or its index during insertion, that is, a "rewind". To reduce the cost of rewinds, the execution plan that SQL Server uses for the batch query may load all the data in the relevant matching column of the dimension table into memory. In this embodiment, however, the data source data has already been deduplicated and contains no duplicates, so loading all the data in the relevant matching column of the dimension table into memory incurs unnecessary memory overhead.
In the other case, when SQL Server judges that the cost of loading all the data in the relevant matching column of the dimension table into memory is higher than the cost of rewinds, its execution plan uses Nested Loops to query row by row instead of performing a batch operation, which likewise reduces the efficiency of the import.
The method of the second embodiment first deduplicates the data source data, then places it in a temporary table of the database, and establishes a unique index on the data source data in the temporary table. The data in the temporary table is therefore unique, which indicates that the data source data is unique and contains no duplicates. When SQL Server performs the batch query, it no longer checks the data source data for duplicates and performs batch matching directly, which avoids the overhead of loading all the data in the relevant matching column of the dimension table into memory and also avoids row-by-row queries with Nested Loops. When a large amount of data is imported into the dimension table at one time, the method provided by the second embodiment is more efficient than the method provided by the first embodiment.
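The flow of steps S202 to S212 can be sketched as below. Again this uses Python's sqlite3 as a stand-in, so it shows only the shape of the flow and says nothing about SQL Server's execution plans; the table name StageUrl, the index names, and the sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DimUrl (Url TEXT NOT NULL)")
# Steps S202 and S204 (stand-in): unique index on the target dimension
# table; duplicates are ignored at insert time via OR IGNORE below.
conn.execute("CREATE UNIQUE INDEX UX_DimUrl_Url ON DimUrl (Url)")
conn.execute("INSERT INTO DimUrl (Url) VALUES ('http://example.com/a')")

source = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/b",  # duplicate inside the source itself
    "http://example.com/c",
]

# Step S206: deduplicate the source, keeping the first occurrence of each row.
unique_source = list(dict.fromkeys(source))

# Step S208: load the deduplicated rows into a temporary staging table.
conn.execute("CREATE TEMP TABLE StageUrl (Url TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO StageUrl (Url) VALUES (?)", [(u,) for u in unique_source]
)

# Step S210: unique index on the temporary table. Creating it succeeds only
# because the staged rows are already unique.
conn.execute("CREATE UNIQUE INDEX UX_StageUrl_Url ON StageUrl (Url)")

# Step S212: one set-based statement moves the staged rows into the target.
conn.execute("INSERT OR IGNORE INTO DimUrl (Url) SELECT Url FROM StageUrl")

final_urls = [r[0] for r in conn.execute("SELECT Url FROM DimUrl ORDER BY Url")]
```

The final import is a single INSERT ... SELECT over the staging table rather than one statement per source row.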
Further, before the data source data is imported into the target dimension table, the method for importing data into a dimension table further comprises: calculating a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data. Specifically, in the embodiment of the present invention, a hash algorithm may be used to calculate the mapping value of each piece of data source data. A hash algorithm maps longer binary data to shorter binary data, with different data corresponding to different hash values. Because the data source data may be very long, matching on hash values improves the efficiency of data matching. It should be noted that the embodiment of the present invention uses a hash algorithm to map the data source data to shorter data, but the invention is not limited to this; other mappings that can map the data may also be used.
The embodiment of the present invention also provides a device for importing data into a dimension table. The device is mainly used to perform the method for importing data into a dimension table provided above. The device provided by the embodiment of the present invention is described in detail below:
Fig. 3 is a structural diagram of the device for importing data into a dimension table according to the first embodiment of the present invention. As shown in Fig. 3, the device comprises an establishing unit 10, a setting unit 20, and an importing unit 30.
The establishing unit 10 is configured to establish a unique index of the target dimension table, wherein the target dimension table is the dimension table in the database that receives data source data. In addition, the unique index of the target dimension table may be established according to the key value of the target dimension table.
The setting unit 20 is configured to set the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error. Specifically, the embodiment of the present invention uses the database program Microsoft SQL Server, and the "ignore duplicate values" attribute of the unique index is set to true. With this setting, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts the data.
The importing unit 30 is configured to import the data source data into the target dimension table. The data is imported directly, without using the Lookup control to determine row by row whether the data source data exists in the database.
Because a unique index has been established on the target dimension table, when data is imported into the target dimension table the data source data can be matched against the data already in the target dimension table in a many-to-many manner, which is far more efficient than the prior-art approach of using the Lookup control to determine row by row whether the data source data exists in the database. In addition, because the attribute of the unique index is set so that, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts that data, the import is not interrupted and the uniqueness of the data is guaranteed. This solves the problem of low efficiency when importing data into a large dimension table and thereby improves data import efficiency.
Further, the device for importing data into a dimension table of the embodiment of the present invention also comprises a checking unit 40 and a processing unit 50. The checking unit 40 is configured to check, before the data source data is imported into the target dimension table, whether the data source data contains duplicates. The processing unit 50 is configured to, when duplicates are found in the data source data, delete the duplicated part of the data source data, or select any one of the duplicated data source data as the data to be imported. Because the data source data may contain duplicates, deduplicating it in SSIS before the import reduces the volume of data from the data source and further improves the efficiency of the import.
When the data source data takes the form of a file (for example, a CSV file), the duplicated data may simply be deleted. If the data source is a database, however, deleting duplicated data is a time-consuming write operation; in that case, one piece of data is selected from the duplicated data source data as the data that will subsequently be imported into the target dimension table.
Further, the device for importing data into a dimension table of the embodiment of the present invention also comprises a computing unit, configured to calculate, before the data source data is imported into the target dimension table, a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data. A hash algorithm maps longer binary data to shorter binary data, with different data corresponding to different hash values. Because the data source data may be very long, matching on hash values improves the efficiency of data matching. The embodiment of the present invention uses a hash algorithm to map the data source data to shorter data, but the invention is not limited to this.
Fig. 4 is a structural diagram of the device for importing data into a dimension table according to the second embodiment of the present invention. As shown in Fig. 4, the device comprises an establishing unit 10, a setting unit 20, an importing unit 30, a checking unit 40, and a processing unit 50. The importing unit 30 comprises a first importing subunit 301, an establishing subunit 302, and a second importing subunit 303.
The establishing unit 10 is configured to establish a unique index of the target dimension table, wherein the target dimension table is the dimension table in the database that receives data source data. In addition, the unique index of the target dimension table may be established according to the key value of the target dimension table.
The setting unit 20 is configured to set the attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exists in the target dimension table, the data source data is not inserted and the database does not report an error. Specifically, the embodiment of the present invention uses the database program Microsoft SQL Server, and the "ignore duplicate values" attribute of the unique index is set to true. With this setting, when data source data is found to already exist in the target dimension table, the database neither reports an error nor inserts the data.
The checking unit 40 is configured to check, before the data source data is imported into the target dimension table, whether the data source data contains duplicates.
The processing unit 50 is configured to, when duplicates are found in the data source data, delete the duplicated part of the data source data or select any one of the duplicated data source data as the data to be imported. Because the data source data may contain duplicates, deduplicating it in SSIS before the import reduces the volume of data from the data source and further improves the efficiency of the import.
The importing unit 30 is configured to import the data source data into the target dimension table, and mainly comprises the first importing subunit 301, the establishing subunit 302, and the second importing subunit 303. The first importing subunit 301 is configured to import the data source data into a temporary table of the database. The establishing subunit 302 is configured to establish a unique index of the temporary table. The second importing subunit 303 is configured to import the data in the temporary table into the target dimension table.
When the volume of data source data is large, SQL Server does not know that the data source data has already been deduplicated in SSIS, so it assumes that the data may contain duplicates. Duplicate data causes the same record to be looked up repeatedly in the dimension table or its index during insertion, that is, a "rewind". To reduce the cost of rewinds, the execution plan that SQL Server uses for the batch query may load all the data in the relevant matching column of the dimension table into memory. In this embodiment, however, the data source data has already been deduplicated and contains no duplicates, so loading all the data in the relevant matching column of the dimension table into memory incurs unnecessary memory overhead.
In the other case, when SQL Server judges that the cost of loading all the data in the relevant matching column of the dimension table into memory is higher than the cost of rewinds, its execution plan uses Nested Loops to query row by row instead of performing a batch operation, which likewise reduces the efficiency of the import.
The device of the second embodiment first deduplicates the data source data, then places it in a temporary table of the database, and establishes a unique index on the data source data in the temporary table. The data in the temporary table is therefore unique, which indicates that the data source data is unique and contains no duplicates. When SQL Server performs the batch query, it no longer checks the data source data for duplicates and performs batch matching directly, which avoids the overhead of loading all the data in the relevant matching column of the dimension table into memory and also avoids row-by-row queries with Nested Loops. When a large amount of data is imported into the dimension table at one time, the device provided by the second embodiment is more efficient than the device provided by the first embodiment.
Further, the device for importing data into a dimension table of the second embodiment of the present invention also comprises a computing unit, configured to calculate, before the data source data is imported into the target dimension table, a mapping value of each piece of data source data, wherein the length of the mapping value is less than the length of the corresponding data source data. A hash algorithm maps longer binary data to shorter binary data, with different data corresponding to different hash values. Because the data source data may be very long, matching on hash values improves the efficiency of data matching. The embodiment of the present invention uses a hash algorithm to map the data source data to shorter data, but the invention is not limited to this.
As can be seen from the above description, the present invention makes it possible to import data into a dimension table in batches, which improves the efficiency of data import.
It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. Alternatively, they can each be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not restricted to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for importing data into a dimension table, characterized by comprising:
establishing a unique index on a target dimension table, wherein the target dimension table is the dimension table in a database that receives data source data;
setting an attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exist in the target dimension table, the data source data are not inserted and the database does not report an error; and
importing the data source data into the target dimension table.
2. The method for importing data into a dimension table according to claim 1, characterized in that, before the data source data are imported into the target dimension table, the method further comprises:
checking whether the data source data contain duplicates; and
if the data source data are found to contain duplicates, deleting the duplicated part of the data source data, or selecting any one item from the duplicated data source data as the data to be imported.
3. The method for importing data into a dimension table according to claim 2, characterized in that importing the data source data into the target dimension table comprises:
importing the data source data into a temporary table of the database;
establishing a unique index on the temporary table; and
importing the data in the temporary table into the target dimension table.
4. The method for importing data into a dimension table according to claim 1, characterized in that, before the data source data are imported into the target dimension table, the method further comprises:
computing a mapping value for each item of the data source data, wherein the length of the mapping value is less than the length of the corresponding data source data.
5. The method for importing data into a dimension table according to claim 4, characterized in that the mapping value is a hash value.
6. The method for importing data into a dimension table according to claim 1, characterized in that the unique index of the target dimension table is established according to the key values of the target dimension table.
7. A device for importing data into a dimension table, characterized by comprising:
an establishing unit, configured to establish a unique index on a target dimension table, wherein the target dimension table is the dimension table in a database that receives data source data;
a setting unit, configured to set an attribute of the unique index of the target dimension table to a preset attribute, wherein the preset attribute indicates that, when the data source data already exist in the target dimension table, the data source data are not inserted and the database does not report an error; and
an importing unit, configured to import the data source data into the target dimension table.
8. The device for importing data into a dimension table according to claim 7, characterized in that the device further comprises:
a checking unit, configured to check, before the data source data are imported into the target dimension table, whether the data source data contain duplicates; and
a processing unit, configured to, when the data source data are found to contain duplicates, delete the duplicated part of the data source data, or select any one item from the duplicated data source data as the data to be imported.
9. The device for importing data into a dimension table according to claim 8, characterized in that the importing unit comprises:
a first importing subunit, configured to import the data source data into a temporary table of the database;
an establishing subunit, configured to establish a unique index on the temporary table; and
a second importing subunit, configured to import the data in the temporary table into the target dimension table.
10. The device for importing data into a dimension table according to claim 7, characterized in that the device further comprises:
a computing unit, configured to compute a mapping value for each item of the data source data, wherein the length of the mapping value is less than the length of the corresponding data source data.
11. The device for importing data into a dimension table according to claim 10, characterized in that the computing unit uses a hash algorithm to compute the mapping value.
12. The device for importing data into a dimension table according to claim 7, characterized in that the establishing unit establishes the unique index of the target dimension table according to the key values of the target dimension table.
CN201310541634.3A 2013-11-05 2013-11-05 Method and device for importing data into dimension table Pending CN103559272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310541634.3A CN103559272A (en) 2013-11-05 2013-11-05 Method and device for importing data into dimension table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310541634.3A CN103559272A (en) 2013-11-05 2013-11-05 Method and device for importing data into dimension table

Publications (1)

Publication Number Publication Date
CN103559272A true CN103559272A (en) 2014-02-05

Family

ID=50013518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310541634.3A Pending CN103559272A (en) 2013-11-05 2013-11-05 Method and device for importing data into dimension table

Country Status (1)

Country Link
CN (1) CN103559272A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239567A (en) * 2014-09-28 2014-12-24 北京国双科技有限公司 Method and device for processing dimension in data warehouse
CN104408183A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Data import method and device of data system
CN104615750A (en) * 2015-02-12 2015-05-13 中国农业银行股份有限公司 Realization method of main memory database under host system
CN105389404A (en) * 2015-12-29 2016-03-09 北京斗牛科技有限公司 Method and device for importing data into database association table
CN107256252A (en) * 2017-06-09 2017-10-17 浪潮软件集团有限公司 Third-party multidimensional data migration method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101266606A (en) * 2007-03-15 2008-09-17 阿里巴巴公司 On-line data migration method based on Oracle database
CN101286160A (en) * 2008-05-30 2008-10-15 同济大学 Data base indexing process
CN101382949A (en) * 2008-10-28 2009-03-11 阿里巴巴集团控股有限公司 Management method for database table and apparatus
CN103200293A (en) * 2013-03-05 2013-07-10 上海斐讯数据通信技术有限公司 Method of automatically combining tautonomy contacts in process of guiding contacts into contact list

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
何玉洁: "《数据库基础与实践技术(SQL Server 2008)》", 31 March 2013 *
怡然: "sqlserver通过ignore_dup_key索引去除重复数据", 《HTTP://BLOG.163.COM/ZANGYUNLING%40126/BLOG/STATIC/1646245052010726112424695/》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239567A (en) * 2014-09-28 2014-12-24 北京国双科技有限公司 Method and device for processing dimension in data warehouse
CN104239567B (en) * 2014-09-28 2018-04-06 北京国双科技有限公司 Dimension treating method and apparatus in data warehouse
CN104408183A (en) * 2014-12-15 2015-03-11 北京国双科技有限公司 Data import method and device of data system
CN104408183B (en) * 2014-12-15 2018-05-15 北京国双科技有限公司 The data lead-in method and device of data system
CN104615750A (en) * 2015-02-12 2015-05-13 中国农业银行股份有限公司 Realization method of main memory database under host system
CN104615750B (en) * 2015-02-12 2017-11-03 中国农业银行股份有限公司 A kind of implementation method of memory database under host computer system
CN105389404A (en) * 2015-12-29 2016-03-09 北京斗牛科技有限公司 Method and device for importing data into database association table
CN105389404B (en) * 2015-12-29 2019-04-16 北京斗牛科技有限公司 A kind of method and apparatus importing data to database association table
CN107256252A (en) * 2017-06-09 2017-10-17 浪潮软件集团有限公司 Third-party multidimensional data migration method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20140205)