Detailed Description of the Embodiments
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprise" and "have", and any variations thereof, are intended to cover non-exclusive inclusion.
According to an embodiment of the present invention, a data integration processing method for multiple data sources is provided. The method is used to integrate data from different sources in a unified manner, and may run on a computer processing device.
Fig. 1 is a flowchart of the data integration processing method for multiple data sources according to a first embodiment of the present invention.
As shown in Fig. 1, the data integration processing method for multiple data sources includes the following steps S101 to S103:
Step S101: obtain data from multiple different data sources.
In the field of Internet advertising, the data from multiple different data sources includes conversion data and material data. Conversion data is the data generated, after an advertiser places an advertisement, by user behaviors such as registering, logging in, browsing, clicking, and placing orders. Conversion data corresponds to an advertisement entity, which here is the material; one advertisement entity may correspond to multiple pieces of conversion data, and the conversion data may be collected by a material collection system. The conversion data may take two forms: splittable and unsplittable. Splittable data consists of data of a first type and data of a second type. First-type data is stored as a character string; it may be the unique identifier of the advertisement entity to which the conversion data corresponds, and the string may serve as the key of that advertisement entity. Second-type data is stored as key-value pairs, each of which may pair a data type with a data value, for example floating-point data with its value or integer data with its value. Unsplittable data contains no first-type data, only second-type data; that is, unsplittable data has no character string serving as the unique identifier of the conversion data.
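As an illustration only, the two forms of conversion data described above might be modeled as follows. The class name ConversionRecord and the field names entity_key and attributes are hypothetical and do not appear in the embodiment; this is a sketch under those assumptions rather than the claimed data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

# Assumption: second-type values are integers or floating-point numbers.
Value = Union[int, float]


@dataclass
class ConversionRecord:
    # First-type data: a string identifying the advertisement entity.
    # None models the unsplittable form, which has no such string.
    entity_key: Optional[str] = None
    # Second-type data: key-value pairs.
    attributes: Dict[str, Value] = field(default_factory=dict)

    def is_splittable(self) -> bool:
        """A record is splittable only if it carries first-type data."""
        return self.entity_key is not None


splittable = ConversionRecord("ad-001", {"order_total": 59.9, "clicks": 3})
unsplittable = ConversionRecord(attributes={"clicks": 1})
```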
A data source (Data Source) is the device or original medium that provides certain required data. As the name implies, a data source is where data comes from. A data source stores all the information needed to establish a database connection. Just as a file can be found in a file system by specifying its file name, the corresponding database connection can be found by providing the correct data source name.
In the embodiment of the present invention, obtaining data from multiple different data sources means obtaining multiple pieces of data, and the data may be obtained through multiple database tables. It should be noted that the data from the multiple different data sources may describe the same Internet entity. For example, for an Internet advertisement for a certain piece of clothing, the data may include data provided by the advertiser, data collected by the user, data from website monitoring, and so on, and all of this data describes information related to that piece of clothing (such as its name, order quantity, unit price, and total price).
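Obtaining data through multiple database tables might look like the following minimal sketch, which uses an in-memory SQLite database. The table names (advertiser_data, user_data, site_monitoring) and the column names are assumptions chosen to match the clothing example; the embodiment does not specify a database or schema.

```python
import sqlite3


def fetch_from_sources(conn, table_names):
    """Read every row from each source table and tag it with its origin."""
    records = []
    for table in table_names:
        # Table names come from a fixed, trusted list; SQL identifiers
        # cannot be bound as query parameters.
        cursor = conn.execute(
            f'SELECT name, quantity, unit_price FROM "{table}"'
        )
        for name, quantity, unit_price in cursor:
            records.append({
                "source": table,
                "name": name,
                "quantity": quantity,
                "unit_price": unit_price,
            })
    return records


# Hypothetical sources that all describe the same piece of clothing.
sources = ["advertiser_data", "user_data", "site_monitoring"]
conn = sqlite3.connect(":memory:")
for table in sources:
    conn.execute(
        f'CREATE TABLE "{table}" (name TEXT, quantity INTEGER, unit_price REAL)'
    )
    conn.execute(f'INSERT INTO "{table}" VALUES (?, ?, ?)', ("coat", 2, 199.0))

records = fetch_from_sources(conn, sources)
```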
Step S102: split the data from the multiple different data sources into data of a first type and data of a second type.
In the embodiment of the present invention, the components of the data from the multiple different data sources may be examined. When a character-string part is detected in the data, that string part is split off as first-type data, and the remaining part of the data is taken as second-type data.
Specifically, the data from the multiple different data sources may be split into material data and conversion data, where the material data is a keyword string describing the material and the conversion data is the interaction data generated when a user operates on the material.
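The splitting in step S102 can be sketched as follows, assuming each raw record arrives as a flat mapping in which string values form the first-type (material) part and all other values form the second-type (conversion) part. The function name split_record and the sample field names are hypothetical.

```python
from typing import Any, Dict, Tuple


def split_record(record: Dict[str, Any]) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Split a raw record into its string part and its remaining part."""
    # String-valued fields are treated as first-type (material) data.
    first_type = {k: v for k, v in record.items() if isinstance(v, str)}
    # Everything else is treated as second-type (conversion) data.
    second_type = {k: v for k, v in record.items() if not isinstance(v, str)}
    return first_type, second_type


material, conversion = split_record(
    {"keyword": "winter coat", "clicks": 3, "order_total": 59.9}
)
```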
In the embodiment of the present invention, the first-type data obtained by splitting may be stored in a first data table, and the second-type data obtained by splitting the conversion data may be stored in a second data table. The first data table and the second data table may be two tables of the same application system, used respectively to store data in string format and data in key-value-pair format. Specifically, after step S102 splits the conversion data from the multiple different data sources into first-type data and second-type data, the data integration processing method for multiple data sources further includes:
Step 1: look up the preset first data table and second data table.
For example, in the embodiment of the present invention, looking up the preset first data table and second data table may consist of looking up, respectively, the material (entrance) table and the conversion (conversion) table preset in a material management system, where the material table may be used to store material data and the conversion table may be used to store conversion data. Material data may include data of types such as advertisement promotion plan, unit, keyword, advertisement source, advertising medium, search engine, access time, and identifier, and some of these types may be empty. Conversion data may be stored as key-value pairs and may include some extended attributes that are allowed to be empty; for example, conversion data may include an extended identifier attribute.
Step 2: store the first-type data in the first data table.
In the embodiment of the present invention, the first-type data may be stored in the first data table either in a preset order or randomly.
Step 3: store the second-type data in the second data table.
In the embodiment of the present invention, the second-type data may be stored in the second data table either in a preset order or randomly.
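Steps 1 to 3 above can be sketched with an in-memory SQLite database as follows. The table names entrance and conversion follow the text; the column names (record_id, keyword, attr_key, attr_value) and the helper names lookup_tables and store_split are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Preset tables: one for string-format data, one for key-value pairs.
conn.execute("CREATE TABLE entrance (record_id TEXT, keyword TEXT)")
conn.execute(
    "CREATE TABLE conversion (record_id TEXT, attr_key TEXT, attr_value REAL)"
)


def lookup_tables(conn):
    """Step 1: confirm that the preset tables exist."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name IN ('entrance', 'conversion') "
        "ORDER BY name"
    )
    return [row[0] for row in rows]


def store_split(conn, record_id, first_type, second_type):
    """Steps 2 and 3: store each part in its own table."""
    conn.execute("INSERT INTO entrance VALUES (?, ?)", (record_id, first_type))
    conn.executemany(
        "INSERT INTO conversion VALUES (?, ?, ?)",
        [(record_id, key, value) for key, value in second_type.items()],
    )


tables = lookup_tables(conn)
store_split(conn, "r1", "winter coat", {"clicks": 3, "order_total": 59.9})
```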
Step S103: perform data integration processing on the first-type data and the second-type data.
In the embodiment of the present invention, in step S103, the conversion data and the material data may be matched. According to the specific configuration of the material management system, the conversion data that the user is not interested in and the invalid conversion data are first deleted (for example, when the user is interested only in conversion data from orders placed on a certain advertisement, conversion data generated solely by browsing may be deleted). Attribution processing is then performed on the remaining conversion data and the material data, and the result is finally imported into the target table of the material management system and displayed to the user.
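The matching, filtering, and attribution described for the integration processing might be sketched as follows. The embodiment does not define the attribution rule, so this sketch assumes the simplest one: each conversion is credited to the material that shares its identifier. The function name integrate and all field names are hypothetical.

```python
def integrate(materials, conversions, wanted_actions):
    """Match conversions to materials, drop unwanted ones, build target rows.

    materials: mapping from identifier to keyword string.
    conversions: list of dicts with "record_id", "action", and "value".
    wanted_actions: the actions the user is interested in.
    """
    target = []
    for conv in conversions:
        keyword = materials.get(conv["record_id"])
        if keyword is None:
            continue  # invalid: no material matches this conversion
        if conv["action"] not in wanted_actions:
            continue  # not of interest to the user, e.g. browse-only data
        # Assumed attribution: credit the conversion to the matched material.
        target.append(
            {"keyword": keyword, "action": conv["action"], "value": conv["value"]}
        )
    return target


materials = {"s1": "winter coat", "s2": "running shoes"}
conversions = [
    {"record_id": "s1", "action": "order", "value": 59.9},
    {"record_id": "s1", "action": "browse", "value": 0.0},
    {"record_id": "s3", "action": "order", "value": 10.0},
]
target = integrate(materials, conversions, {"order"})
```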
By obtaining conversion data from multiple different data sources, splitting that conversion data into first-type data and second-type data, and performing data integration processing on the first-type data and the second-type data, the present invention solves the problem in the related art that conversion data from different sources cannot be integrated in a unified manner, and thereby achieves unified integration of conversion data from different sources.
Fig. 2 is a flowchart of the data integration processing method for multiple data sources according to a second embodiment of the present invention.
As shown in Fig. 2, the data integration processing method for multiple data sources includes the following steps S201 to S205. This embodiment may serve as a preferred implementation of the embodiment shown in Fig. 1.
Step S201: judge whether the data from the multiple different data sources has an identifier.
In the embodiment of the present invention, before the conversion data from the multiple different data sources is split into first-type data and second-type data, that is, before step S101 shown in Fig. 1, it may be judged whether the data has an identifier. Specifically, it may be detected whether the data contains string data. If string data is detected, the data is judged to have an identifier, and the identifier is the detected string; if no string data is detected, the data is judged to have no identifier. It should be noted that, in the embodiment of the present invention, only data that has an identifier can be split directly, that is, split into material data and conversion data, after which the material data and the conversion data can both be identified by the identifier of the original data. Data without an identifier cannot be split directly. Instead, a character string may first be randomly generated for the data that lacks an identifier and used as its identifier; this identifier is the material data. The data is then split into material data and conversion data, and the material data and conversion data obtained by splitting are uniquely marked with this identifier. It should be noted that different pieces of data from the multiple different data sources have different identifiers, and the identifier of each piece of data is unique.
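The judgment in step S201, together with the random generation of an identifier for data that lacks one, can be sketched as follows. The text only says that a character string is randomly generated; using a UUID for that purpose is an assumption of this sketch, as are the function name ensure_identifier and the flat-mapping record layout.

```python
import uuid


def ensure_identifier(record):
    """Return (identifier, already_present) for a raw record.

    A record is judged to have an identifier if it contains string data;
    the first string value found is taken as that identifier.
    """
    existing = next(
        (value for value in record.values() if isinstance(value, str)), None
    )
    if existing is not None:
        return existing, True
    # No string data: generate a random, practically unique string instead.
    return uuid.uuid4().hex, False


ident_a, had_a = ensure_identifier({"sessionkey": "abc", "clicks": 2})
ident_b, had_b = ensure_identifier({"clicks": 2})
```

A UUID is used here because it makes collisions between identifiers of different records extremely unlikely, which matches the requirement that each identifier be unique.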
Step S202A: if it is judged that the data from the multiple different data sources has an identifier, add the existing identifier to each piece of data.
In the embodiment of the present invention, if it is judged that the data from the multiple different data sources has an identifier, adding the existing identifier to each piece of data means adding the existing identifier to both the first-type data and the second-type data obtained by splitting that piece of data. For example, the existing identifier may be added, for the material data and the conversion data obtained from each piece of data, at the positions of the corresponding extended attributes in the material table and the conversion table, respectively.
In the embodiment of the present invention, an identifier may be added to each piece of data in the following manner:
Step 1: split the data to which an identifier is to be added into a keyword part and a click part.
For example, take the click data of a session, where the session has the identifier sessionkey. The session data may be split into the keyword part and the click part of the session, and the keyword part and the click part may be stored in the material table and the conversion table as material data and conversion data, respectively.
Step 2: add the identifier to the keyword part and to the click part.
For example, continuing the example in step 1 above, the identifier sessionkey may be added at the positions in the material table and the conversion table that store the corresponding extended attributes of the keyword part and the click part, respectively.
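The session example in steps 1 and 2 can be sketched as follows. Only the identifier name sessionkey comes from the text; the fields keyword and clicks and the function name split_and_tag are assumptions made for illustration.

```python
def split_and_tag(session):
    """Split session click data into two parts and tag both with sessionkey."""
    key = session["sessionkey"]
    # Step 1: split into the keyword part and the click part.
    # Step 2: add the existing identifier to each part.
    keyword_part = {"keyword": session["keyword"], "sessionkey": key}
    click_part = {"clicks": session["clicks"], "sessionkey": key}
    return keyword_part, click_part


keyword_part, click_part = split_and_tag(
    {"sessionkey": "sk-42", "keyword": "winter coat", "clicks": 3}
)
```

Because both parts carry the same sessionkey, rows stored separately in the material table and the conversion table can later be matched back to each other.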
Step S202B: if it is judged that the data from the multiple different data sources has no identifier, add a unique identifier to each piece of data.
In the embodiment of the present invention, if it is judged that the data from the multiple different data sources has no identifier, adding a unique identifier to each piece of data means adding the unique identifier to both the first-type data and the second-type data obtained by splitting that piece of data. For example, the unique identifier may be added, for the material data and the conversion data obtained from each piece of data, at the positions of the corresponding extended attributes in the material table and the conversion table, respectively.
Steps S203 to S205 are respectively the same as steps S101 to S103 of the embodiment shown in Fig. 1 and are not repeated here.
According to an embodiment of the present invention, a data integration processing apparatus for multiple data sources is provided. The apparatus is used to integrate data from different sources in a unified manner. It should be noted that the data integration processing apparatus for multiple data sources of the embodiment of the present invention may be used to execute the data integration processing method for multiple data sources of the embodiment of the present invention, and that the method provided by the embodiment of the present invention may likewise be executed by the apparatus of the embodiment of the present invention.
Fig. 3 is a schematic diagram of the data integration processing apparatus for multiple data sources according to the first embodiment of the present invention.
As shown in Fig. 3, the apparatus includes an acquiring unit 10, a splitting unit 20, and an integration unit 30.
The acquiring unit 10 is configured to obtain data from multiple different data sources. In the field of Internet advertising, the data from multiple different data sources includes conversion data and material data. Conversion data is the data generated, after an advertiser places an advertisement, by user behaviors such as registering, logging in, browsing, clicking, and placing orders. Conversion data corresponds to an advertisement entity, which here is the material; one advertisement entity may correspond to multiple pieces of conversion data, and the conversion data may be collected by a material collection system. The conversion data may take two forms: splittable and unsplittable. Splittable data consists of data of a first type and data of a second type. First-type data is stored as a character string; it may be the unique identifier of the advertisement entity to which the conversion data corresponds, and the string may serve as the key of that advertisement entity. Second-type data is stored as key-value pairs, each of which may pair a data type with a data value, for example floating-point data with its value or integer data with its value. Unsplittable data contains no first-type data, only second-type data; that is, unsplittable data has no character string serving as the unique identifier of the conversion data.
A data source (Data Source) is the device or original medium that provides certain required data. As the name implies, a data source is where data comes from. A data source stores all the information needed to establish a database connection. Just as a file can be found in a file system by specifying its file name, the corresponding database connection can be found by providing the correct data source name.
In the embodiment of the present invention, the acquiring unit 10 obtaining data from multiple different data sources means obtaining multiple pieces of data, and the acquiring unit 10 may obtain the data through multiple database tables. It should be noted that the data from the multiple different data sources may describe the same Internet entity. For example, for an Internet advertisement for a certain piece of clothing, the acquiring unit 10 may obtain data provided by the advertiser, data collected by the user, data from website monitoring, and so on, and all of this data describes information related to that piece of clothing (such as its name, order quantity, unit price, and total price).
The splitting unit 20 is configured to split the data from the multiple different data sources into data of a first type and data of a second type.
In the embodiment of the present invention, the components of the data from the multiple different data sources may be examined. When a character-string part is detected in the data, the splitting unit 20 splits that string part off as first-type data and takes the remaining part of the data as second-type data.
Specifically, the splitting unit 20 may split the data from the multiple different data sources into material data and conversion data, where the material data is a keyword string describing the material and the conversion data is the interaction data generated when a user operates on the material.
In the embodiment of the present invention, the first-type data obtained by the splitting unit 20 may be stored in a first data table, and the second-type data obtained by splitting the conversion data may be stored in a second data table. The first data table and the second data table may be two tables of the same application system, used respectively to store data in string format and data in key-value-pair format. Specifically, for use after the conversion data from the multiple different data sources has been split into first-type data and second-type data, the data integration processing apparatus for multiple data sources further includes a lookup unit, a first storage unit, and a second storage unit.
The lookup unit is configured to look up the preset first data table and second data table.
For example, in the embodiment of the present invention, looking up the preset first data table and second data table may consist of looking up, respectively, the material (entrance) table and the conversion (conversion) table preset in a material management system, where the material table may be used to store material data and the conversion table may be used to store conversion data. Material data may include data of types such as advertisement promotion plan, unit, keyword, advertisement source, advertising medium, search engine, access time, and identifier, and some of these types may be empty. Conversion data may be stored as key-value pairs and may include some extended attributes that are allowed to be empty; for example, conversion data may include an extended identifier attribute.
The first storage unit is configured to store the first-type data in the first data table.
In the embodiment of the present invention, the first storage unit may store the first-type data in the first data table either in a preset order or randomly.
The second storage unit is configured to store the second-type data in the second data table.
In the embodiment of the present invention, the second storage unit may store the second-type data in the second data table either in a preset order or randomly.
The integration unit 30 is configured to perform data integration processing on the first-type data and the second-type data.
In the embodiment of the present invention, the integration unit 30 may match the conversion data with the material data. According to the specific configuration of the material management system, it first deletes the conversion data that the user is not interested in and the invalid conversion data (for example, when the user is interested only in conversion data from orders placed on a certain advertisement, conversion data generated solely by browsing may be deleted). It then performs attribution processing on the remaining conversion data and the material data, and finally imports the result into the target table of the material management system, where it is displayed to the user.
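How the acquiring unit, the splitting unit, and the integration unit might cooperate can be sketched as follows. The class IntegrationApparatus and the three sample functions are hypothetical stand-ins; in particular, the sample integrate function simply merges the two parts of each record and is not the attribution processing of the embodiment.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
SplitResult = Tuple[Dict[str, str], Dict[str, Any]]


class IntegrationApparatus:
    """Chains an acquiring unit, a splitting unit, and an integration unit."""

    def __init__(
        self,
        acquire: Callable[[], List[Record]],
        split: Callable[[Record], SplitResult],
        integrate: Callable[[List[SplitResult]], List[Record]],
    ) -> None:
        self.acquire = acquire
        self.split = split
        self.integrate = integrate

    def run(self) -> List[Record]:
        records = self.acquire()                  # acquiring unit
        parts = [self.split(r) for r in records]  # splitting unit
        return self.integrate(parts)              # integration unit


def acquire() -> List[Record]:
    return [{"keyword": "winter coat", "clicks": 3}]


def split(record: Record) -> SplitResult:
    strings = {k: v for k, v in record.items() if isinstance(v, str)}
    others = {k: v for k, v in record.items() if not isinstance(v, str)}
    return strings, others


def integrate(parts: List[SplitResult]) -> List[Record]:
    # Placeholder integration: merge both parts back into one row.
    return [{**first, **second} for first, second in parts]


apparatus = IntegrationApparatus(acquire, split, integrate)
result = apparatus.run()
```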
The present invention thus solves the problem in the related art that conversion data from different sources cannot be integrated in a unified manner, and thereby achieves unified integration of conversion data from different sources.
Fig. 4 is a schematic diagram of the data integration processing apparatus for multiple data sources according to the second embodiment of the present invention.
As shown in Fig. 4, this embodiment may serve as a preferred implementation of the embodiment shown in Fig. 3. In addition to the acquiring unit 10, the splitting unit 20, and the integration unit 30, the data integration processing apparatus for multiple data sources further includes a judging unit 40, a first adding unit 50, and a second adding unit 60.
The functions of the acquiring unit 10, the splitting unit 20, and the integration unit 30 are the same as in the embodiment shown in Fig. 3 and are not repeated here.
The judging unit 40 is configured to judge whether the data from the multiple different data sources has an identifier.
In the embodiment of the present invention, before the conversion data from the multiple different data sources is split into first-type data and second-type data, the judging unit 40 may judge whether the data has an identifier. Specifically, the judging unit 40 may detect whether the data contains string data. If string data is detected, the data is judged to have an identifier, and the identifier is the detected string; if no string data is detected, the data is judged to have no identifier. It should be noted that, in the embodiment of the present invention, only data that has an identifier can be split directly, that is, split into material data and conversion data, after which the material data and the conversion data can both be identified by the identifier of the original data. Data without an identifier cannot be split directly. Instead, a character string may first be randomly generated for the data that lacks an identifier and used as its identifier; this identifier is the material data. The data is then split into material data and conversion data, and the material data and conversion data obtained by splitting are uniquely marked with this identifier. It should be noted that different pieces of data from the multiple different data sources have different identifiers, and the identifier of each piece of data is unique.
The first adding unit 50 is configured to add the existing identifier to each piece of data if it is judged that the data from the multiple different data sources has an identifier.
In the embodiment of the present invention, if it is judged that the data from the multiple different data sources has an identifier, the first adding unit 50 adding the existing identifier to each piece of data means adding the existing identifier to both the first-type data and the second-type data obtained by splitting that piece of data. For example, the first adding unit 50 may add the existing identifier, for the material data and the conversion data obtained from each piece of data, at the positions of the corresponding extended attributes in the material table and the conversion table, respectively.
In the embodiment of the present invention, the first adding unit 50 may include a splitting module and an adding module.
The splitting module is configured to split the data to which an identifier is to be added into a keyword part and a click part.
For example, take the click data of a session, where the session has the identifier sessionkey. The splitting module may split the session data into the keyword part and the click part of the session, and the keyword part and the click part may be stored in the material table and the conversion table as material data and conversion data, respectively.
The adding module is configured to add the identifier to the keyword part and to the click part.
For example, continuing the above example, the adding module may add the identifier sessionkey at the positions in the material table and the conversion table that store the corresponding extended attributes of the keyword part and the click part, respectively.
The second adding unit 60 is configured to add a unique identifier to each piece of data if it is judged that the data from the multiple different data sources has no identifier.
In the embodiment of the present invention, if it is judged that the data from the multiple different data sources has no identifier, the second adding unit 60 adding a unique identifier to each piece of data means adding the unique identifier to both the first-type data and the second-type data obtained by splitting that piece of data. For example, the second adding unit 60 may add the unique identifier, for the material data and the conversion data obtained from each piece of data, at the positions of the corresponding extended attributes in the material table and the conversion table, respectively.
As can be seen from the above description, the present invention splits data from different data sources into identifiable conversion data and material data, and thereby achieves unified integration of data from different sources.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one herein.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be made into a separate integrated circuit module, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.