CN103605715A - Method and device used for data integration processing of multiple data sources - Google Patents

Method and device used for data integration processing of multiple data sources

Info

Publication number
CN103605715A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310566735.6A
Other languages
Chinese (zh)
Other versions
CN103605715B (en)
Inventor
陈改静
杨基彬
蔡波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201310566735.6A
Publication of CN103605715A
Application granted
Publication of CN103605715B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for data integration processing of multiple data sources. The method includes: obtaining data from multiple different data sources; splitting the data from the multiple data sources into data of a first type and data of a second type; and performing data integration processing on the data of the first type and the data of the second type. The method and the device solve the problem in the related art that data from different sources cannot be integrated uniformly, and thereby achieve the effect of uniformly integrating data from different sources.

Description

Data integration processing method and device for multiple data sources
Technical field
The present invention relates to the field of data processing, and in particular to a data integration processing method and device for multiple data sources.
Background
In the field of Internet advertising, conversion data refers to the data generated, after an advertiser places an advertisement, by user behaviors such as registering, logging in, browsing, clicking and placing orders. Conversion data may come from multiple platforms, for example from different data sources such as third-party agents, advertising platforms and the website's own monitoring. These different data sources provide conversion data in different ways, formats and channels to importing systems, such as data management systems, for integration.
To integrate conversion data from different data sources, the related art adopts the following scheme: custom development is first carried out, according to a predetermined format, for the conversion data from each data source, and the custom-developed data is then integrated. Although this scheme can integrate conversion data, the conversion data comes from different data sources whose data formats and data types vary widely, and even the same conversion data may differ from one period to another, so separate custom development is needed for each kind of conversion data. The scheme therefore has the following shortcomings:
a) Separate custom development is required for every type of conversion data, so the development cost is high.
b) Both the clients and the sources of conversion data are highly varied, so there will be many custom-developed versions; the number of conversion data types to be maintained keeps growing, and the cost of maintaining the various versions rises accordingly.
No effective solution has yet been proposed for the problem in the related art that data from different sources cannot be integrated uniformly.
Summary of the invention
The main purpose of the present invention is to provide a data integration processing method and device for multiple data sources, so as to solve the problem in the related art that data from different sources cannot be integrated uniformly.
To achieve this purpose, according to one aspect of the present invention, a data integration processing method for multiple data sources is provided. The method comprises: obtaining data from multiple different data sources; splitting the data from the multiple different data sources into data of a first type and data of a second type; and performing data integration processing on the data of the first type and the data of the second type.
Further, before the data from the multiple different data sources is split into data of the first type and data of the second type, the method also comprises: judging whether the data from the multiple different data sources has an identifier; if the data is judged to have an identifier, adding the existing identifier to each piece of data; and if the data is judged to have no identifier, adding a unique identifier to each piece of data.
Further, an identifier is added to each piece of data as follows: the data to which an identifier needs to be added is split into a keyword part and a click part; and an identifier is added to the keyword part and to the click part respectively.
Further, after the data from the multiple different data sources is split into data of the first type and data of the second type, the method also comprises: looking up a preset first data table and a preset second data table; storing the data of the first type in the first data table; and storing the data of the second type in the second data table.
Further, splitting the data from the multiple different data sources into data of the first type and data of the second type comprises: splitting the data from the multiple different data sources into material data and conversion data.
To achieve this purpose, according to another aspect of the present invention, a data integration processing device for multiple data sources is provided. The device comprises: an acquiring unit, for obtaining data from multiple different data sources; a splitting unit, for splitting the data from the multiple different data sources into data of a first type and data of a second type; and an integration unit, for performing data integration processing on the data of the first type and the data of the second type.
Further, the device also comprises: a judging unit, for judging, before the data from the multiple different data sources is split into data of the first type and data of the second type, whether the data from the multiple different data sources has an identifier; a first adding unit, for adding the existing identifier to each piece of data if the data is judged to have an identifier; and a second adding unit, for adding a unique identifier to each piece of data if the data is judged to have no identifier.
Further, the first adding unit comprises: a splitting module, for splitting the data to which an identifier needs to be added into a keyword part and a click part; and an adding module, for adding an identifier to the keyword part and to the click part respectively.
Further, the device also comprises the following units, which operate after the data from the multiple different data sources is split into data of the first type and data of the second type: a lookup unit, for looking up a preset first data table and a preset second data table; a first storage unit, for storing the data of the first type in the first data table; and a second storage unit, for storing the data of the second type in the second data table.
Further, the splitting unit is also used to split the data from the multiple different data sources into material data and conversion data.
By obtaining data from multiple different data sources, splitting the data from the multiple different data sources into data of a first type and data of a second type, and performing data integration processing on the data of the first type and the data of the second type, the present invention solves the problem in the related art that data from different sources cannot be integrated uniformly, and thereby achieves the effect of uniformly integrating data from different sources.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for a further understanding of the present invention. The illustrative embodiments of the present invention and their description serve to explain the present invention and do not constitute an undue limitation of it. In the drawings:
Fig. 1 is a flowchart of the data integration processing method for multiple data sources according to the first embodiment of the present invention;
Fig. 2 is a flowchart of the data integration processing method for multiple data sources according to the second embodiment of the present invention;
Fig. 3 is a schematic diagram of the data integration processing device for multiple data sources according to the first embodiment of the present invention; and
Fig. 4 is a schematic diagram of the data integration processing device for multiple data sources according to the second embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not necessarily to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have", and any variants of them, are intended to cover non-exclusive inclusion.
According to an embodiment of the present invention, a data integration processing method for multiple data sources is provided. The method is used to integrate data from different sources uniformly, and may run on a computer processing device.
Fig. 1 is a flowchart of the data integration processing method for multiple data sources according to the first embodiment of the present invention.
As shown in Fig. 1, the data integration processing method for multiple data sources comprises the following steps S101 to S103:
Step S101: obtain data from multiple different data sources.
In the field of Internet advertising, the data from multiple different data sources comprises conversion data and material data. Conversion data refers to the data generated, after an advertiser places an advertisement, by user behaviors such as registering, logging in, browsing, clicking and placing orders. Conversion data corresponds to an advertising entity, which here is the material; one advertising entity may correspond to multiple pieces of conversion data, and the conversion data may be collected by a material collection system. Conversion data may take two formats: a splittable format and an unsplittable format. Data in the splittable format consists of data of the first type and data of the second type. Data of the first type is stored as a character string; it may be the unique identifier of the advertising entity corresponding to the conversion data, and the character string may serve as the key of that advertising entity. Data of the second type is stored as key-value pairs; a key-value pair may pair a data type with a data value, for example a floating-point type with a floating-point value, or an integer type with an integer value. Data in the unsplittable format contains no data of the first type and only data of the second type; that is, it has no character string serving as the unique identifier of the conversion data.
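The two formats described above can be pictured with a small data structure. The following is only a sketch under stated assumptions: the class name `ConversionRecord`, its field names and the sample values are hypothetical and do not come from the patent, and the Python type of each value stands in for the "data type" half of a key-value pair.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

Value = Union[int, float]


@dataclass
class ConversionRecord:
    # First-type data: a string that uniquely identifies the advertising
    # entity (material). None models the unsplittable format.
    material_key: Optional[str]
    # Second-type data: key-value pairs. The Python type of each value
    # (int or float) plays the role of the data type in the pair.
    attributes: Dict[str, Value] = field(default_factory=dict)

    @property
    def is_splittable(self) -> bool:
        """A record is splittable only if it carries a first-type string."""
        return self.material_key is not None


# Hypothetical sample records, one per format.
splittable = ConversionRecord("red-dress-ad", {"order_count": 2, "total_price": 199.0})
unsplittable = ConversionRecord(None, {"page_views": 5})
```

Here `splittable.is_splittable` is true and `unsplittable.is_splittable` is false, which mirrors the distinction between the two formats.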
A data source is a device or original medium that provides the required data. As the name implies, a data source is where data comes from. A data source stores all the information needed to establish a database connection: just as a file can be found in a file system by specifying its file name, the corresponding database connection can be found by providing the correct data source name (DSN).
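The name-based lookup described above can be illustrated with a small registry. This is a hedged sketch, not the patent's mechanism: the registry `DATA_SOURCES`, the data source names in it and the function `connect` are hypothetical, and in-memory SQLite databases stand in for real sources.

```python
import sqlite3

# Hypothetical registry: data source name -> connection information.
DATA_SOURCES = {
    "advertiser_feed": ":memory:",
    "site_monitoring": ":memory:",
}


def connect(dsn: str) -> sqlite3.Connection:
    """Resolve a data source name to a database connection."""
    try:
        database = DATA_SOURCES[dsn]
    except KeyError:
        raise LookupError(f"unknown data source name: {dsn}") from None
    return sqlite3.connect(database)


conn = connect("advertiser_feed")
probe = conn.execute("SELECT 1").fetchone()
```

An unknown name raises `LookupError`, much as opening a file with a wrong file name fails.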
In this embodiment, obtaining data from multiple different data sources means obtaining multiple pieces of data, and the data may be obtained through multiple database tables. It should be noted that the data from the multiple different data sources may describe the same Internet entity. For example, for an Internet advertisement for a piece of clothing, the data obtained may include data provided by the advertiser, data collected by the user and website monitoring data, all of which describe information about that piece of clothing (such as its name, order quantity, unit price and total price).
Step S102: split the data from the multiple different data sources into data of the first type and data of the second type.
In this embodiment, the components of the data from the multiple different data sources may be examined. When a character string part is detected in the data, that character string part is split off as data of the first type, and the remaining part of the data serves as data of the second type.
Specifically, the data from the multiple different data sources may be split into material data and conversion data. Material data is the keyword string that describes the material, and conversion data is the interaction data generated when a user operates on the material.
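The split in step S102 can be sketched as follows. This is a minimal illustration under the assumption that a raw record arrives as a flat dictionary; the function name `split_record` and the sample field names are hypothetical. String-valued fields are treated as the first-type (material) part and everything else as the second-type (conversion) part.

```python
from typing import Dict, Tuple


def split_record(raw: Dict[str, object]) -> Tuple[Dict[str, str], Dict[str, object]]:
    """Split one raw record into (material part, conversion part)."""
    # Character-string fields form the first-type (material) data.
    material = {k: v for k, v in raw.items() if isinstance(v, str)}
    # All remaining fields form the second-type (conversion) data.
    conversion = {k: v for k, v in raw.items() if not isinstance(v, str)}
    return material, conversion


raw_record = {
    "keyword": "red dress",
    "source": "search",
    "order_count": 2,
    "total_price": 199.0,
}
material_part, conversion_part = split_record(raw_record)
```

A record with no string fields yields an empty material part, which corresponds to the unsplittable format.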
In this embodiment, the data of the first type obtained by the split may be stored in a first data table, and the data of the second type obtained by the split may be stored in a second data table. The first data table and the second data table may be two tables of the same application system, used respectively to store data in character string format and data in key-value pair format. Specifically, after step S102 splits the conversion data from the multiple different data sources into data of the first type and data of the second type, the method also comprises:
Step 1: look up the preset first data table and second data table.
For example, in this embodiment, looking up the preset first data table and second data table may mean looking up the material (entrance) table and the conversion table preset in a material management system, where the material table is used to store material data and the conversion table is used to store conversion data. Material data may include data of types such as advertisement promotion plan, unit, keyword, advertisement source, advertising medium, search engine, access time and identifier, and some of these types may be empty. Conversion data may be stored as key-value pairs and may include extended attributes that can be empty; for example, conversion data may include an extended identifier attribute.
Step 2: store the data of the first type in the first data table.
In this embodiment, the data of the first type may be stored in the first data table in a preset order or in an arbitrary order.
Step 3: store the data of the second type in the second data table.
In this embodiment, the data of the second type may be stored in the second data table in a preset order or in an arbitrary order.
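Steps 1 to 3 above can be sketched with an in-memory SQLite database. The table names `entrance` and `conversion` follow the names mentioned in the text, but the column names, the `record_id` column and the helper functions are assumptions made for illustration only.

```python
import sqlite3

REQUIRED_TABLES = ("entrance", "conversion")


def find_tables(conn: sqlite3.Connection) -> None:
    """Step 1: confirm that both preset tables exist."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {name for (name,) in rows}
    missing = [t for t in REQUIRED_TABLES if t not in existing]
    if missing:
        raise LookupError(f"preset tables not found: {missing}")


def store(conn: sqlite3.Connection, record_id: str, keyword: str, pairs: dict) -> None:
    find_tables(conn)
    # Step 2: the string (first-type) data goes into the material table.
    conn.execute(
        "INSERT INTO entrance (record_id, keyword) VALUES (?, ?)",
        (record_id, keyword),
    )
    # Step 3: the key-value (second-type) data goes into the conversion table.
    conn.executemany(
        "INSERT INTO conversion (record_id, attr_name, attr_value) VALUES (?, ?, ?)",
        [(record_id, name, value) for name, value in pairs.items()],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entrance (record_id TEXT, keyword TEXT)")
# attr_value has no declared type, so SQLite keeps each value's own type.
conn.execute("CREATE TABLE conversion (record_id TEXT, attr_name TEXT, attr_value)")
store(conn, "r1", "red dress", {"order_count": 2, "total_price": 199.0})
```

Keeping a shared `record_id` in both tables is one way to let the later integration step match the two parts again.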
Step S103: perform data integration processing on the data of the first type and the data of the second type.
In this embodiment, after step S103, the conversion data and the material data may be matched. According to the specific configuration of the material management system, the conversion data that the user does not care about and the invalid conversion data are deleted first (for example, when the user only cares about the conversion data of orders placed through a certain advertisement, the conversion data generated merely by browsing can be deleted). Attribution processing is then performed on the remaining conversion data and material data, and the result is finally imported into the target table of the material management system and displayed to the user.
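The matching, filtering and attribution sequence described above can be sketched as follows. The patent does not specify the attribution rule, so summing a value per material is used here purely as a stand-in; the function name `integrate` and the field names `id`, `event` and `value` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def integrate(
    materials: Dict[str, str],
    conversions: Iterable[Dict[str, object]],
    wanted_events: Set[str],
) -> Dict[str, float]:
    """Match conversions to materials, filter them, then total per material."""
    totals: Dict[str, float] = defaultdict(float)
    for conv in conversions:
        if conv["id"] not in materials:
            continue  # invalid: no material matches this identifier
        if conv["event"] not in wanted_events:
            continue  # the user does not care about this kind of event
        # Stand-in attribution rule: credit the value to the matched material.
        totals[materials[conv["id"]]] += float(conv["value"])
    return dict(totals)


materials = {"m1": "red dress", "m2": "blue coat"}
conversions: List[Dict[str, object]] = [
    {"id": "m1", "event": "order", "value": 199.0},
    {"id": "m1", "event": "browse", "value": 0.0},
    {"id": "m2", "event": "order", "value": 50.0},
    {"id": "m9", "event": "order", "value": 10.0},
    {"id": "m1", "event": "order", "value": 1.0},
]
report = integrate(materials, conversions, wanted_events={"order"})
```

In this sample the browse event and the record with the unknown identifier `m9` are dropped before the per-material totals are computed.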
By obtaining conversion data from multiple different data sources, splitting the conversion data from the multiple different data sources into data of the first type and data of the second type, and performing data integration processing on the data of the first type and the data of the second type, the present invention solves the problem in the related art that conversion data from different sources cannot be integrated uniformly, and thereby achieves the effect of uniformly integrating conversion data from different sources.
Fig. 2 is a flowchart of the data integration processing method for multiple data sources according to the second embodiment of the present invention.
As shown in Fig. 2, the data integration processing method for multiple data sources comprises the following steps S201 to S205. This embodiment may serve as a preferred implementation of the embodiment shown in Fig. 1.
Step S201: judge whether the data from the multiple different data sources has an identifier.
In this embodiment, before the conversion data from the multiple different data sources is split into data of the first type and data of the second type, that is, before step S101 shown in Fig. 1, it may be judged whether the data from the multiple different data sources has an identifier. Specifically, the data may be checked for string data: when string data is detected, the data is judged to have an identifier, and the identifier is the detected character string; when no string data is detected, the data is judged to have no identifier. It should be noted that, in this embodiment, only data that has an identifier can be split directly, that is, split into material data and conversion data, and after the split the material data and the conversion data can be identified by the identifier of the original data. Data without an identifier cannot be split directly; instead, a character string is first generated at random for the data without an identifier to serve as its identifier, this identifier being the material data, and the data is then split into material data and conversion data, both of which are uniquely marked with this identifier. It should be noted that different pieces of data from the multiple different data sources have different identifiers, and the identifier of each piece of data is unique.
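The judgment in step S201 can be sketched as follows, assuming a raw record is a flat dictionary. The function names are hypothetical, and `uuid.uuid4` is one possible way to produce the random string; the patent does not name a generator.

```python
import uuid
from typing import Dict, Optional, Tuple


def find_identifier(raw: Dict[str, object]) -> Optional[str]:
    """Return the first non-empty string value, or None if there is none."""
    for value in raw.values():
        if isinstance(value, str) and value:
            return value
    return None


def ensure_identifier(raw: Dict[str, object]) -> Tuple[str, bool]:
    """Return (identifier, was_generated) for one raw record."""
    existing = find_identifier(raw)
    if existing is not None:
        return existing, False  # the detected string is the identifier
    # No string data: generate a random string to serve as the identifier.
    return uuid.uuid4().hex, True


with_id = ensure_identifier({"sessionkey": "abc123", "clicks": 3})
generated_id, was_generated = ensure_identifier({"clicks": 3})
```

The boolean in the result tells the caller which branch applies: attaching the existing identifier (step S202A) or attaching a newly generated one (step S202B).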
Step S202A: if the data from the multiple different data sources is judged to have an identifier, add the existing identifier to each piece of data.
In this embodiment, if the data from the multiple different data sources is judged to have an identifier, adding the existing identifier to each piece of data means adding the existing identifier to the data of the first type and to the data of the second type obtained by splitting each piece of data. For example, the existing identifier may be added, for the material data and the conversion data obtained by splitting each piece of data, at the corresponding extended-attribute positions in the material table and the conversion table respectively.
In this embodiment, an identifier may be added to each piece of data as follows:
Step 1: split the data to which an identifier needs to be added into a keyword part and a click part.
Take the click data of a session as an example, where the session has the identifier sessionkey. The session data may be split into the keyword part and the click part of the session, and the keyword part and the click part may be stored in the material table and the conversion table as material data and conversion data respectively.
Step 2: add an identifier to the keyword part and to the click part respectively.
For example, following the example in step 1 above, the identifier sessionkey may be added at the extended-attribute positions in the material table and the conversion table where the keyword part and the click part are stored.
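The session example in steps 1 and 2 can be sketched as follows. Only the name `sessionkey` comes from the text; the function name `tag_session` and the field names `keyword` and `clicks` are hypothetical.

```python
from typing import Dict, Tuple


def tag_session(session: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split a session into keyword and click parts, each carrying sessionkey."""
    key = session["sessionkey"]
    # Step 1: split the session data into a keyword part and a click part.
    # Step 2: attach the same identifier to both parts.
    keyword_part = {"keyword": session["keyword"], "sessionkey": key}
    click_part = {"clicks": session["clicks"], "sessionkey": key}
    return keyword_part, click_part


keyword_part, click_part = tag_session(
    {"sessionkey": "s-001", "keyword": "red dress", "clicks": 3}
)
```

Because both parts carry the same `sessionkey`, rows stored separately in the material table and the conversion table can be matched again later.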
Step S202B: if the data from the multiple different data sources is judged to have no identifier, add a unique identifier to each piece of data.
In this embodiment, if the data from the multiple different data sources is judged to have no identifier, adding a unique identifier to each piece of data means adding the unique identifier to the data of the first type and to the data of the second type obtained by splitting each piece of data that needs a unique identifier. For example, the unique identifier may be added, for the material data and the conversion data obtained by splitting each such piece of data, at the corresponding extended-attribute positions in the material table and the conversion table respectively.
Steps S203 to S205 are the same as steps S101 to S103 of the embodiment shown in Fig. 1 respectively and are not repeated here.
According to an embodiment of the present invention, a data integration processing device for multiple data sources is provided. The device is used to integrate data from different sources uniformly. It should be noted that the device of the embodiments of the present invention may be used to carry out the data integration processing method for multiple data sources of the embodiments of the present invention, and that method may likewise be carried out by this device.
Fig. 3 is a schematic diagram of the data integration processing device for multiple data sources according to the first embodiment of the present invention.
As shown in Fig. 3, the device comprises: an acquiring unit 10, a splitting unit 20 and an integration unit 30.
The acquiring unit 10 is used to obtain data from multiple different data sources. In the field of Internet advertising, the data from multiple different data sources comprises conversion data and material data. Conversion data refers to the data generated, after an advertiser places an advertisement, by user behaviors such as registering, logging in, browsing, clicking and placing orders. Conversion data corresponds to an advertising entity, which here is the material; one advertising entity may correspond to multiple pieces of conversion data, and the conversion data may be collected by a material collection system. Conversion data may take two formats: a splittable format and an unsplittable format. Data in the splittable format consists of data of the first type and data of the second type. Data of the first type is stored as a character string; it may be the unique identifier of the advertising entity corresponding to the conversion data, and the character string may serve as the key of that advertising entity. Data of the second type is stored as key-value pairs; a key-value pair may pair a data type with a data value, for example a floating-point type with a floating-point value, or an integer type with an integer value. Data in the unsplittable format contains no data of the first type and only data of the second type; that is, it has no character string serving as the unique identifier of the conversion data.
A data source is a device or original medium that provides the required data. As the name implies, a data source is where data comes from. A data source stores all the information needed to establish a database connection: just as a file can be found in a file system by specifying its file name, the corresponding database connection can be found by providing the correct data source name (DSN).
In this embodiment, obtaining data from multiple different data sources by the acquiring unit 10 means obtaining multiple pieces of data, and the acquiring unit 10 may obtain the data through multiple database tables. It should be noted that the data from the multiple different data sources may describe the same Internet entity. For example, for an Internet advertisement for a piece of clothing, the acquiring unit 10 may obtain data provided by the advertiser, data collected by the user and website monitoring data, all of which describe information about that piece of clothing (such as its name, order quantity, unit price and total price).
The splitting unit 20 is used to split the data from the multiple different data sources into data of the first type and data of the second type.
In this embodiment, the components of the data from the multiple different data sources may be examined. When a character string part is detected in the data, the splitting unit 20 splits off that character string part as data of the first type, and the remaining part of the data serves as data of the second type.
Specifically, the splitting unit 20 may split the data from the multiple different data sources into material data and conversion data. Material data is the keyword string that describes the material, and conversion data is the interaction data generated when a user operates on the material.
In this embodiment, the data of the first type obtained by the splitting unit 20 may be stored in a first data table, and the data of the second type obtained by the split may be stored in a second data table. The first data table and the second data table may be two tables of the same application system, used respectively to store data in character string format and data in key-value pair format. Specifically, for use after the conversion data from the multiple different data sources is split into data of the first type and data of the second type, the device also comprises: a lookup unit, a first storage unit and a second storage unit.
The lookup unit is used to look up the preset first data table and second data table.
For example, in this embodiment, looking up the preset first data table and second data table may mean looking up the material (entrance) table and the conversion table preset in a material management system, where the material table is used to store material data and the conversion table is used to store conversion data. Material data may include data of types such as advertisement promotion plan, unit, keyword, advertisement source, advertising medium, search engine, access time and identifier, and some of these types may be empty. Conversion data may be stored as key-value pairs and may include extended attributes that can be empty; for example, conversion data may include an extended identifier attribute.
The first storage unit is configured to store the data of the first type in the first data table.
In the embodiments of the present invention, the first storage unit may store the data of the first type in the first data table in a preset order, or it may store the data of the first type in the first data table in random order.
The second storage unit is configured to store the data of the second type in the second data table.
In the embodiments of the present invention, the second storage unit may store the data of the second type in the second data table in a preset order, or it may store the data of the second type in the second data table in random order.
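One possible way to picture the lookup and storage steps is with two SQLite tables, as in the sketch below. The table names, column names, and the choice of "sort by access time" as the preset order are assumptions made for illustration; the source does not specify a database or a schema.

```python
import sqlite3

MATERIAL_TABLE = "material"      # hypothetical name for the first data table
CONVERSION_TABLE = "conversion"  # hypothetical name for the second data table


def create_tables(conn):
    """Create the two preset tables (in practice they would already exist)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS material ("
        "identifier TEXT, keyword TEXT, search_engine TEXT, access_time TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversion ("
        "identifier TEXT, attr_key TEXT, attr_value TEXT)"
    )


def find_tables(conn):
    """Lookup step: return which of the two preset tables are present."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name IN (?, ?)",
        (MATERIAL_TABLE, CONVERSION_TABLE),
    ).fetchall()
    return {name for (name,) in rows}


def store(conn, material_rows, conversion_rows):
    """Store string-format rows and key-value rows in their respective tables.

    material_rows: dicts with the keys identifier, keyword, search_engine, access_time.
    conversion_rows: (identifier, dict of key-value pairs) tuples.
    """
    # Assumed "preset order": ascending access time, empty values first.
    ordered = sorted(material_rows, key=lambda row: row.get("access_time") or "")
    conn.executemany(
        "INSERT INTO material VALUES (:identifier, :keyword, :search_engine, :access_time)",
        ordered,
    )
    for identifier, pairs in conversion_rows:
        conn.executemany(
            "INSERT INTO conversion VALUES (?, ?, ?)",
            [(identifier, key, str(value)) for key, value in pairs.items()],
        )
    conn.commit()
```

Storing each key-value pair as its own row keeps the conversion table open to extended attributes without schema changes, which is one reasonable reading of the key-value format described above.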
The integration unit 30 is configured to perform data integration processing on the data of the first type and the data of the second type.
In the embodiments of the present invention, the integration unit 30 may match the conversion data with the material data and, according to the specific configuration of the material management system, first delete the conversion data that the user is not interested in and the invalid conversion data (for example, when the user is interested only in conversion data for orders placed through a certain advertisement, conversion data produced merely by browsing may be deleted). It then performs attribution processing on the remaining conversion data and the material data, and finally imports the result into the target table of the material management system for display to the user.
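The match, filter, and attribution steps can be illustrated with the minimal sketch below. It rests on several assumptions that the source does not state: conversion records carry a hypothetical `event` field, "invalid" is taken to mean "has no matching material record", and attribution is reduced to a join on a shared `identifier` field.

```python
def integrate(material_rows, conversion_rows, wanted_events):
    """Return target-table rows built from matched, filtered conversion data.

    material_rows: dicts that each carry an "identifier" key.
    conversion_rows: dicts with "identifier" and "event" keys plus any extras.
    wanted_events: the set of event names the user is interested in.
    """
    material_by_id = {m["identifier"]: m for m in material_rows}
    target = []
    for conv in conversion_rows:
        # Drop conversion data the user is not interested in (e.g. plain browsing).
        if conv.get("event") not in wanted_events:
            continue
        # Drop conversion data treated as invalid here: no matching material record.
        material = material_by_id.get(conv.get("identifier"))
        if material is None:
            continue
        # Simplified attribution: credit the conversion to the matched material.
        target.append({**material, **conv})
    return target
```

With `wanted_events={"order"}`, only order conversions that match a known material record reach the target table; browse-only records and orphaned records are discarded.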
The present invention solves the problem in the related art that conversion data from different sources cannot be integrated in a unified way, and thereby achieves the effect of uniformly integrating conversion data from different sources.
Fig. 4 is a schematic diagram of a data integration processing device for multiple data sources according to a second embodiment of the present invention.
As shown in Fig. 4, this embodiment may serve as a preferred implementation of the embodiment shown in Fig. 3. In addition to the acquisition unit 10, the splitting unit 20, and the integration unit 30, the data integration processing device for multiple data sources further comprises: a judging unit 40, a first adding unit 50, and a second adding unit 60.
The functions of the acquisition unit 10, the splitting unit 20, and the integration unit 30 are the same as in the embodiment shown in Fig. 3 and are not repeated here.
The judging unit 40 is configured to judge whether the data from the plurality of different data sources has an identifier.
In the embodiments of the present invention, before the conversion data from the plurality of different data sources is split into the data of the first type and the data of the second type, the judging unit 40 may judge whether the data from the plurality of different data sources has an identifier. Specifically, the judging unit 40 may detect whether the data contains character-string data. When character-string data is detected, the data is judged to have an identifier, and that identifier is the detected character string; otherwise, when no character-string data is detected, the data is judged to have no identifier. It should be noted that, in the embodiments of the present invention, only data that has an identifier can be split, that is, split into material data and conversion data, and after the split the material data and the conversion data can be identified by the identifier of the source data. Data that has no identifier cannot be split directly. Instead, a character string may first be generated at random for the data that has no identifier and used as its identifier, this identifier being material data; the data is then split into material data and conversion data, and the material data and conversion data obtained by the split are uniquely marked with this identifier. It should also be noted that different pieces of data from the plurality of different data sources have different identifiers, and the identifier of each piece of data is unique.
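The identifier check described above can be sketched as follows, assuming a record is a dictionary and that the identifier, when present, is a non-empty string stored under a hypothetical `identifier` key. Using `uuid.uuid4()` for the randomly generated character string is also an assumption; the source only requires that the string be random and unique per piece of data.

```python
import uuid


def ensure_identifier(record, id_field="identifier"):
    """Return (record_with_identifier, had_identifier).

    If the record already carries a non-empty string identifier, that string
    is kept. Otherwise a random string is generated and used as the identifier,
    so the record can be split and its parts marked afterwards.
    """
    existing = record.get(id_field)
    if isinstance(existing, str) and existing:
        return dict(record), True
    tagged = dict(record)
    tagged[id_field] = uuid.uuid4().hex  # random 32-character hex string
    return tagged, False
```

Returning a copy rather than mutating the input keeps the original source record untouched, which makes it easier to re-run the step on the same data.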
The first adding unit 50 is configured to add the existing identifier to each piece of data if it is judged that the data from the plurality of different data sources has an identifier.
In the embodiments of the present invention, if it is judged that the data from the plurality of different data sources has an identifier, the first adding unit 50 adding the existing identifier to each piece of data means adding the existing identifier to the data of the first type and to the data of the second type obtained by splitting each piece of data. For example, for the material data and the conversion data obtained by splitting each piece of data, the first adding unit 50 may add the existing identifier at the positions of the corresponding extended attribute in the material table and the conversion table respectively.
In the embodiments of the present invention, the first adding unit 50 may comprise a splitting module and an adding module.
The splitting module is configured to split the data to which an identifier is to be added into a keyword part and a click part.
For example, take the click data of a session, where the session has the identifier sessionkey. The splitting module may split the session data into the keyword part and the click part of the session, and the keyword part and the click part may be stored in the material table and the conversion table as material data and conversion data respectively.
The adding module is configured to add the identifier to the keyword part and to the click part respectively.
For example, continuing the example above, the adding module may add the identifier sessionkey at the positions in the material table and the conversion table where the corresponding extended attributes of the keyword part and the click part are stored.
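The session example can be pictured with the short sketch below. The dictionary layout of the session, the `ext_identifier` attribute name, and the reduction of the click part to a click count are illustrative assumptions only; the source names just the `sessionkey` identifier and the two parts.

```python
def split_session(session):
    """Split one session record into a keyword part and a click part.

    Both parts receive the session's existing identifier (sessionkey) as an
    extended attribute, so they can be matched again after being stored in
    separate tables.
    """
    key = session["sessionkey"]
    # Keyword part: destined for the material table.
    keyword_part = {"keyword": session.get("keyword"), "ext_identifier": key}
    # Click part: destined for the conversion table, kept as key-value pairs.
    click_part = {"clicks": str(len(session.get("clicks", []))), "ext_identifier": key}
    return keyword_part, click_part
```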
The second adding unit 60 is configured to add a unique identifier to each piece of data if it is judged that the data from the plurality of different data sources has no identifier.
In the embodiments of the present invention, if it is judged that the data from the plurality of different data sources has no identifier, the second adding unit 60 adding a unique identifier to each piece of data means adding the unique identifier to the data of the first type and to the data of the second type obtained by splitting each piece of data that has been given the unique identifier. For example, for the material data and the conversion data obtained by splitting each such piece of data, the second adding unit 60 may add the unique identifier at the positions of the corresponding extended attribute in the material table and the conversion table respectively.
As can be seen from the above description, the present invention splits data from different data sources into identifiable conversion data and material data, and thereby achieves the effect of uniformly integrating data from different sources.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is only the preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data integration processing method for multiple data sources, characterized by comprising:
acquiring data from a plurality of different data sources;
splitting the data from the plurality of different data sources into data of a first type and data of a second type; and
performing data integration processing on the data of the first type and the data of the second type.
2. The data integration processing method according to claim 1, characterized in that, before the data from the plurality of different data sources is split into the data of the first type and the data of the second type, the data integration processing method further comprises:
judging whether the data from the plurality of different data sources has an identifier;
if it is judged that the data from the plurality of different data sources has an identifier, adding the existing identifier to each piece of data; and
if it is judged that the data from the plurality of different data sources has no identifier, adding a unique identifier to each piece of data.
3. The data integration processing method according to claim 2, characterized in that the identifier is added to each piece of data in the following manner:
splitting the data to which the identifier is to be added into a keyword part and a click part; and
adding the identifier to the keyword part and to the click part respectively.
4. The data integration processing method according to claim 1, characterized in that, after the data from the plurality of different data sources is split into the data of the first type and the data of the second type, the data integration processing method further comprises:
looking up a preset first data table and a preset second data table;
storing the data of the first type in the first data table; and
storing the data of the second type in the second data table.
5. The data integration processing method according to claim 1, characterized in that splitting the data from the plurality of different data sources into the data of the first type and the data of the second type comprises:
splitting the data from the plurality of different data sources into material data and conversion data.
6. A data integration processing device for multiple data sources, characterized by comprising:
an acquisition unit, configured to acquire data from a plurality of different data sources;
a splitting unit, configured to split the data from the plurality of different data sources into data of a first type and data of a second type; and
an integration unit, configured to perform data integration processing on the data of the first type and the data of the second type.
7. The data integration processing device according to claim 6, characterized by further comprising:
a judging unit, configured to judge, before the data from the plurality of different data sources is split into the data of the first type and the data of the second type, whether the data from the plurality of different data sources has an identifier;
a first adding unit, configured to add the existing identifier to each piece of data if it is judged that the data from the plurality of different data sources has an identifier; and
a second adding unit, configured to add a unique identifier to each piece of data if it is judged that the data from the plurality of different data sources has no identifier.
8. The data integration processing device according to claim 7, characterized in that the first adding unit comprises:
a splitting module, configured to split the data to which the identifier is to be added into a keyword part and a click part; and
an adding module, configured to add the identifier to the keyword part and to the click part respectively.
9. The data integration processing device according to claim 6, characterized by further comprising, for use after the data from the plurality of different data sources is split into the data of the first type and the data of the second type:
a lookup unit, configured to look up a preset first data table and a preset second data table;
a first storage unit, configured to store the data of the first type in the first data table; and
a second storage unit, configured to store the data of the second type in the second data table.
10. The data integration processing device according to claim 6, characterized in that the splitting unit is further configured to split the data from the plurality of different data sources into material data and conversion data.
CN201310566735.6A 2013-11-14 2013-11-14 Data integration processing method and apparatus for multiple data sources Active CN103605715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310566735.6A CN103605715B (en) 2013-11-14 2013-11-14 Data integration processing method and apparatus for multiple data sources

Publications (2)

Publication Number Publication Date
CN103605715A (en) 2014-02-26
CN103605715B (en) 2017-09-08

Family

ID=50123938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310566735.6A Active CN103605715B (en) 2013-11-14 2013-11-14 Data integration processing method and apparatus for multiple data sources

Country Status (1)

Country Link
CN (1) CN103605715B (en)

Also Published As

Publication number Publication date
CN103605715B (en) 2017-09-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Method and device used for data integration processing of multiple data sources

Effective date of registration: 20190531

Granted publication date: 20170908

Pledgee: Shenzhen Black Horse World Investment Consulting Co.,Ltd.

Pledgor: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Patentee before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd.

PP01 Preservation of patent right

Effective date of registration: 20240604

Granted publication date: 20170908