CN108415965A

CN108415965A - Data processing method and device based on multiple data sources

Info

Publication number: CN108415965A
Application number: CN201810122574.4A
Authority: CN
Inventors: 龙凯; 赵相龙; 刘屹州; 高瑞鑫
Original assignee: Five Dimensional Gravity (shanghai) Marketing Data Services Ltd
Current assignee: Five Dimensional Gravity (shanghai) Marketing Data Services Ltd
Priority date: 2018-02-07
Filing date: 2018-02-07
Publication date: 2018-08-17

Abstract

The present invention provides a data processing method and device based on multiple data sources. The method includes: obtaining a first data list from a first data source and a second data list from a second data source, wherein each group of data in the first data list includes first identifier data together with first geographic position data and first time data corresponding to the first identifier data, and each group of data in the second data list includes second identifier data together with second geographic position data and second time data corresponding to the second identifier data; screening the first data list and the second data list according to a spatial screening condition and a time screening condition, wherein the spatial screening condition is that the first geographic position data and the second geographic position data are within a preset geographic range, and the time screening condition is that the first time data and the second time data are within a first preset time range; and building a third data list from the screened first data list and second data list.

Description

Data processing method and device based on multiple data sources

Technical field

The present invention relates to the field of computer technology, and in particular to a data processing method and device based on multiple data sources.

Background art

Big data (also called mega data or massive data) refers to massive, fast-growing, and diversified information assets that require new processing modes in order to deliver stronger decision-making power, insight, and process optimization capability. Analysis based on big data can provide user portrait services that analyze user attributes, needs, and the like.

In the prior art, however, the data generated by or associated with an information subject (such as the same device, the same user, or the same enterprise) is scattered across many different data sources. These data sources are not associated with one another and form a number of data silos. Analysis based on a single data source yields only a fragmented portrait, and it is difficult to provide a complete information view for the information subject.

Summary of the invention

The object of the present invention is to provide a data processing method and device based on multiple data sources, so as to solve the problem of associating data between multiple data sources that have no business logic relationship with one another.

The data processing method based on multiple data sources provided by the present invention includes:

Obtaining a first data list from a first data source and a second data list from a second data source; wherein each group of data in the first data list includes first identifier data, and first geographic position data and first time data corresponding to the first identifier data; each group of data in the second data list includes second identifier data, and second geographic position data and second time data corresponding to the second identifier data;

Screening the first data list and the second data list according to a spatial screening condition and a time screening condition; wherein the spatial screening condition is that the first geographic position data and the second geographic position data are within a preset geographic range, and the time screening condition is that the first time data and the second time data are within a first preset time range;

Building a third data list from the screened first data list and second data list.

Further, the data processing method of the present invention also includes:

Building a fourth data list from the third data list; wherein the fourth data list contains all combinations of the first identifier data and the second identifier data in the third data list;

Calculating the iteration count of each combination of first identifier data and second identifier data;

Screening the combinations of first identifier data and second identifier data according to an iteration screening condition; wherein the iteration screening condition is that the iteration count is greater than a preset count threshold;

Building a fifth data list from the screened combinations of first identifier data and second identifier data; wherein the fifth data list includes first identifier data and second identifier data that correspond to each other.
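The iteration screening described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the iteration count is taken to mean the number of fourth data lists in which a given pair appears, and the function name `build_fifth_list`, the tuple layout, and the sample identifiers are hypothetical rather than taken from the patent.

```python
from collections import Counter

def build_fifth_list(fourth_lists, threshold):
    """Count how many fourth data lists contain each (first_id, second_id) pair
    and keep the pairs whose count is greater than the threshold.

    Assumption: a pair is counted at most once per fourth data list.
    """
    counts = Counter(pair for pairs in fourth_lists for pair in set(pairs))
    return {pair: n for pair, n in counts.items() if n > threshold}

# Hypothetical sample data: three fourth data lists of (card, phone) pairs.
fourth_lists = [
    [("card-1", "phone-1"), ("card-2", "phone-2"), ("card-3", "phone-3")],
    [("card-2", "phone-2"), ("card-4", "phone-4"), ("card-5", "phone-5")],
    [("card-6", "phone-6"), ("card-2", "phone-2"), ("card-7", "phone-7")],
]
fifth = build_fifth_list(fourth_lists, threshold=2)
```

With a threshold of 2, only the pair that appears in all three lists survives, which matches the idea that a card and a phone repeatedly seen together are likely to belong to the same user.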

Further, in the data processing method of the present invention, the step of building the fourth data list from the third data list specifically includes:

Building n third data lists according to n preset geographic ranges and, within each third data list, pairing the first identifier data with the second identifier data to build n fourth data lists;

Or, building m third data lists according to m first preset time ranges and, within each third data list, pairing the first identifier data with the second identifier data to build m fourth data lists;

Or, building n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, within each third data list, pairing the first identifier data with the second identifier data to build n+m fourth data lists;

Wherein the n preset geographic ranges do not overlap one another, the m first preset time ranges do not overlap one another, n is a natural number with n >= 2, and m is a natural number with m >= 2.

Further, the data processing method of the present invention also includes, after building the fourth data list and before calculating the iteration count:

Screening each group of data in the fourth data list according to a data screening condition;

Wherein the data screening condition includes:

The difference between the first time data and the second time data is within a first preset time difference;

And/or, the first time data and the second time data are within a second preset time range; wherein the second preset time range < the first preset time range;

And/or, where at least two items of second identifier data are identical and the difference between their second time data is within a second preset time difference, only one of those items of second identifier data is retained;

And/or, duplicate data is discarded.

Further, in the data processing method of the present invention, the third data list includes:

First identifier data, second identifier data, and a geographic identifier;

Or, first identifier data, second identifier data, a geographic identifier, and a time identifier;

Wherein the geographic identifier corresponds to the preset geographic range, and the time identifier corresponds to the first preset time range.

Further, in the data processing method of the present invention, the first identifier data includes: financial account information or payment software account information;

The first data source includes: transaction data containing the financial account information or the payment software account information;

The second identifier data includes: a terminal device number, an application user account, telephone number information, biometric information, or identity information;

The second data source includes: motion trajectory data containing the terminal device number, the application user account, the telephone number information, the biometric information, or the identity information.

The data processing device based on multiple data sources provided by the present invention includes:

A first data acquisition module, configured to obtain a first data list from a first data source; wherein each group of data in the first data list includes first identifier data, and first geographic position data and first time data corresponding to the first identifier data;

A second data acquisition module, configured to obtain a second data list from a second data source; wherein each group of data in the second data list includes second identifier data, and second geographic position data and second time data corresponding to the second identifier data;

A spatial processing module, configured to screen the first data list and the second data list according to a spatial screening condition; wherein the spatial screening condition is that the first geographic position data and the second geographic position data are within a preset geographic range;

A time processing module, configured to screen the first data list and the second data list according to a time screening condition; wherein the time screening condition is that the first time data and the second time data are within a first preset time range;

A data building module, configured to build a third data list from the screened first data list and second data list.

Further, the data processing device of the present invention also includes:

A data combination module, configured to build a fourth data list from the third data list; wherein the fourth data list contains all combinations of the first identifier data and the second identifier data in the third data list;

An iterative operation module, configured to calculate the iteration count of each combination of first identifier data and second identifier data;

An iteration screening module, configured to screen the combinations of first identifier data and second identifier data according to an iteration screening condition; wherein the iteration screening condition is that the iteration count is greater than a preset count threshold;

A data matching module, configured to build a fifth data list from the screened combinations of first identifier data and second identifier data; wherein the fifth data list includes first identifier data and second identifier data that correspond to each other.

Further, in the data processing device of the present invention, the data combination module includes:

A spatial iteration module, configured to build n third data lists according to n preset geographic ranges and, within each third data list, pair the first identifier data with the second identifier data to build n fourth data lists;

Or, a time iteration module, configured to build m third data lists according to m first preset time ranges and, within each third data list, pair the first identifier data with the second identifier data to build m fourth data lists;

Or, a space-time iteration module, configured to build n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, within each third data list, pair the first identifier data with the second identifier data to build n+m fourth data lists.

Further, the data processing device of the present invention also includes:

A data screening module, configured to screen each group of data in the fourth data list according to a data screening condition;

Wherein the data screening condition includes:

And/or, duplicate data is discarded.

Further, in the data processing device of the present invention, the third data list includes:

First identifier data, second identifier data, and a geographic identifier;

Further, in the data processing device of the present invention, the first identifier data includes: financial account information or payment software account information;

With the data processing method and device based on multiple data sources provided by the present invention, an association can be established between a first data source and a second data source that have no business logic relationship with each other, based on their respective time data and spatial data. This breaks down data silos between two mutually independent data sources and provides a basis for connecting data across multiple data sources, which helps provide a complete information view for the information subject.

Description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 is a flow diagram of the data processing method based on multiple data sources according to Embodiment 1 of the present invention;

Fig. 2 is a flow diagram of the data processing method based on multiple data sources according to Embodiment 2 of the present invention;

Fig. 3 is a structural diagram of the data processing device based on multiple data sources according to Embodiment 3 of the present invention;

Fig. 4 is a structural diagram of the data processing device based on multiple data sources according to Embodiment 4 of the present invention.

The same or similar reference numerals in the drawings denote the same or similar components.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings.

Embodiment 1

Fig. 1 is a flow diagram of the data processing method based on multiple data sources according to Embodiment 1 of the present invention. As shown in Fig. 1, the data processing method based on multiple data sources provided by the present invention includes:

Step S101: obtain a first data list from a first data source and a second data list from a second data source.

Each group of data in the first data list includes first identifier data, and first geographic position data and first time data corresponding to the first identifier data. The first identifier data may include financial account information or payment software account information, such as a bank card number, credit card number, Alipay account, or WeChat Pay account. The first data source includes transaction data containing the financial account information or the payment software account information. For example, transaction data provided by financial institutions such as banks includes the bank card number, credit card number, automatic teller machine (ATM) location or point-of-sale (POS) terminal location, transaction number, transaction amount, and transaction time. Transaction data provided by internet platforms such as Alipay includes the Alipay account, transaction amount, transaction address, and transaction time.

Each group of data in the second data list includes second identifier data, and second geographic position data and second time data corresponding to the second identifier data. The second identifier data includes a terminal device number, an application user account, telephone number information, biometric information, or identity information, for example an International Mobile Equipment Identity (IMEI), an APP (application, third-party application program) user account, a mobile phone number, fingerprint information, facial feature recognition information, an ID card number, or a driving license number. The second data source includes motion trajectory data containing the terminal device number, application user account, telephone number information, biometric information, or identity information. For example, user motion trajectory data is recorded by APPs installed on intelligent terminals such as mobile phones and PADs (Personal Digital Assistant, palmtop computer); the motion trajectory data includes the APP user account, terminal device number, geographic position, time, and other data. In addition, as intelligent terminal functions expand (for example face recognition and fingerprint unlocking), biometric information such as fingerprints, facial feature information, and DNA information, telephone number information such as mobile phone numbers and landline numbers, or identity information such as ID card numbers and driving license numbers can also be associated with motion trajectory data to form the second data source.

The first data source and the second data source are two independent, unrelated data sources; in particular, they have no business logic relationship under a given business scenario. As an example of a business logic relationship: in an online banking payment scenario, when a user pays online, the bank platform sends a transaction notification SMS to the user's mobile phone, containing the bank card number, transaction amount, transaction time, and other information. A business logic relationship based on the SMS notification service therefore exists between the transaction data source built by the bank platform and the SMS data source built by the mobile operator. For different data sources that have such a business logic relationship, data can be connected (ID-Mapping) by associating data through that relationship. Many independent data sources, however, have no business logic relationship between them, so their data can only be connected through the intrinsic space-time mapping relationship of the data, which is what the data processing method based on multiple data sources of the embodiments of the present invention does. Of course, the data processing method of the embodiments can also be used between data sources that do have a business logic relationship, as long as they contain geographic position data and time data.

First, identifiers, positions, and times are extracted from the transaction data of the first data source to build the first data list, and identifiers, positions, and times are extracted from the motion trajectory data of the second data source to build the second data list. Every record in the two data lists has an identifier together with the position and time information corresponding to that identifier. The records are then associated according to their position and time information, based on the intrinsic space-time mapping relationship, so that a link that contributes to user portraits is established between the mutually independent first and second data sources.
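As a rough illustration of this extraction step, the following Python sketch reduces raw source rows to (identifier, position, time) records. The `Record` type, the `build_data_list` helper, the column names, and the sample row are all hypothetical; the patent does not prescribe a record layout.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Record:
    identifier: str   # e.g. a bank card number or a mobile phone number
    lon: float        # longitude of the geographic position
    lat: float        # latitude of the geographic position
    time: datetime    # transaction time or trajectory sampling time

def build_data_list(raw_rows, id_key, lon_key, lat_key, time_key):
    """Keep only identifier, position, and time from each raw row (a dict)."""
    return [
        Record(row[id_key], float(row[lon_key]), float(row[lat_key]),
               datetime.fromisoformat(row[time_key]))
        for row in raw_rows
    ]

# Hypothetical transaction row; other fields such as the amount are ignored.
transactions = [{"card": "card-1", "pos_lon": "121.48", "pos_lat": "31.24",
                 "trade_time": "2017-12-24T12:00:00", "amount": "99.0"}]
first_list = build_data_list(transactions, "card", "pos_lon", "pos_lat", "trade_time")
```

The same helper could be applied to motion trajectory rows with different column names to obtain the second data list.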

Step S102: screen the first data list and the second data list according to a spatial screening condition and a time screening condition.

The spatial screening condition is:

The first geographic position data and the second geographic position data are within a preset geographic range.

During screening, first identifier data whose first geographic position data lies outside the preset geographic range is discarded, and second identifier data whose second geographic position data lies outside the preset geographic range is discarded.

The time screening condition is:

The first time data and the second time data are within a first preset time range.

During screening, first identifier data whose first time data lies outside the first preset time range is discarded, and second identifier data whose second time data lies outside the first preset time range is discarded.

The scope of protection of the present invention does not limit the order of spatial screening and time screening; time screening may also be performed first, followed by spatial screening.

For example, each group of data extracted from the first data source (transaction data) includes a bank card number (first identifier data) and its corresponding geographic position and time, such as the time, place, and card number of a card payment. Each group of data extracted from the second data source (motion trajectory data) includes a mobile phone number (second identifier data) and its corresponding geographic position and time. The preset geographic range is set to the Nanjing Road business district, whose extent can be defined by a longitude and latitude range. According to the geographic positions in the first and second data lists, data whose geographic position is not in the Nanjing Road business district is discarded from each list; and according to the time information in the two lists, data whose time does not fall within a given day is discarded from each list. What remains is the data set of the bank card numbers and mobile phone numbers of users who made transactions within the Nanjing Road business district on that day.
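The spatial and time screening in this example can be sketched as follows. This is a minimal illustration, assuming the preset geographic range is a longitude/latitude bounding box and the preset time range is a half-open interval; the coordinates, the tuple layout, and the names `in_box` and `screen` are made up for the sketch.

```python
from datetime import datetime

def in_box(lon, lat, box):
    """True if the point lies inside the (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def screen(records, box, start, end):
    """Keep (identifier, lon, lat, time) tuples inside the box and the time window."""
    return [r for r in records if in_box(r[1], r[2], box) and start <= r[3] < end]

# Hypothetical bounding box and one-day window.
BOX = (121.46, 31.23, 121.49, 31.25)
START, END = datetime(2017, 12, 24), datetime(2017, 12, 25)

first_list = [
    ("card-1", 121.47, 31.24, datetime(2017, 12, 24, 12, 0)),
    ("card-2", 121.60, 31.20, datetime(2017, 12, 24, 13, 0)),  # outside the box
    ("card-3", 121.47, 31.24, datetime(2017, 12, 23, 9, 0)),   # outside the day
]
second_list = [
    ("phone-1", 121.48, 31.24, datetime(2017, 12, 24, 12, 5)),
    ("phone-2", 121.47, 31.235, datetime(2017, 12, 26, 8, 0)),  # outside the day
]

first_kept = screen(first_list, BOX, START, END)
second_kept = screen(second_list, BOX, START, END)
```

Because both conditions are applied per record, the result is the same whichever of the two screenings is evaluated first.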

Step S103: build a third data list from the screened first data list and second data list.

The third data list includes: first identifier data, second identifier data, and a geographic identifier; or, first identifier data, second identifier data, a geographic identifier, and a time identifier. The geographic identifier corresponds to the preset geographic range, and the time identifier corresponds to the first preset time range.

For example, "Nanjing Road" is used as the geographic identifier of the Nanjing Road business district (the preset geographic range), and "Christmas Day" is used as the time identifier of the period from 0:00 to 24:00 on December 24, 2017 (the first preset time range). Step S102 then yields the bank card numbers and mobile phone numbers of all users who had consumption records on Nanjing Road on December 24, 2017, and the third data list is built from this data set. Using the third data list obtained by collating the bank transaction data source and the APP motion trajectory data source, a third-party service provider can push information about Nanjing Road Christmas promotions to each user according to the user's mobile phone number. The third data list may include the geographic identifier alone, or both the geographic identifier and the time identifier.
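One possible in-memory shape for such a third data list is sketched below. The dictionary layout and the `build_third_list` helper are assumptions made for illustration; the text only states which kinds of data the list contains.

```python
def build_third_list(first_ids, second_ids, geo_label, time_label=None):
    """Bundle the screened identifiers with a geographic identifier and,
    optionally, a time identifier (hypothetical layout)."""
    third = {
        "first_ids": sorted(set(first_ids)),
        "second_ids": sorted(set(second_ids)),
        "geo": geo_label,
    }
    if time_label is not None:
        third["time"] = time_label
    return third

# Hypothetical identifiers that survived the screening step.
third = build_third_list(["card-2", "card-1"], ["phone-1", "phone-2"],
                         "Nanjing Road", "Christmas Day")
geo_only = build_third_list(["card-1"], ["phone-1"], "Nanjing Road")
```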

With the data processing method based on multiple data sources of Embodiment 1 of the present invention, an association based on the space-time mapping relationship can be established between multiple independent data sources that have no business logic relationship. This facilitates the processing and collation of big data and lays a foundation for further providing complete user portrait services.

Building on the data processing method of Embodiment 1, Embodiment 2 of the present invention establishes a correspondence between the first identifier data and the second identifier data, so as to connect the data (ID-Mapping) between the transaction data source and the motion trajectory data source. For example, a bank card number in the transaction data source is associated with a terminal device number in the motion trajectory data source, or the mobile phone number corresponding to a given bank card number is found. Embodiment 2 is explained in detail below.

Embodiment 2

Fig. 2 is a flow diagram of the data processing method based on multiple data sources according to Embodiment 2 of the present invention. As shown in Fig. 2, the data processing method based on multiple data sources provided by the present invention includes:

Step S201: obtain a first data list from a first data source and a second data list from a second data source.

Each group of data in the first data list includes first identifier data, and first geographic position data and first time data corresponding to the first identifier data. The first identifier data may include financial account information or payment software account information. The first data source includes transaction data containing the financial account information or the payment software account information. Each group of data in the second data list includes second identifier data, and second geographic position data and second time data corresponding to the second identifier data. The second identifier data includes a terminal device number, an application user account, telephone number information, biometric information, or identity information. The second data source includes motion trajectory data containing the terminal device number, application user account, telephone number information, biometric information, or identity information. The first data source and the second data source are two independent, unrelated data sources; in particular, they have no business logic relationship under a given business scenario. Step S201 is the same as step S101 of Embodiment 1; refer to step S101 of Embodiment 1, which is not repeated here.

Step S202: screen the first data list and the second data list according to a spatial screening condition and a time screening condition.

The spatial screening condition is that the first geographic position data and the second geographic position data are within a preset geographic range. The time screening condition is that the first time data and the second time data are within a first preset time range. Step S202 is the same as step S102 of Embodiment 1; refer to step S102 of Embodiment 1, which is not repeated here.

Step S203: build a third data list from the screened first data list and second data list. Step S203 is similar to step S103 of Embodiment 1, except that no geographic identifier or time identifier needs to be set for the third data list; for the rest, refer to step S103 of Embodiment 1, which is not repeated here.

Step S204: build a fourth data list from the third data list.

The fourth data list contains all combinations of the first identifier data and the second identifier data in the third data list. For example, the transaction data of the first data source includes several items of first identifier data, such as bank card number 1 to bank card number P, along with the first geographic position data and first time data corresponding to each bank card number. The motion trajectory data of the second data source includes several items of second identifier data, such as mobile phone number 1 to mobile phone number R, along with the second geographic position data and second time data corresponding to each mobile phone number. P and R are natural numbers with P >= 2 and R >= 2. Pairing every item of first identifier data with every item of second identifier data produces P × R combinations in total. Each combination of first identifier data and second identifier data serves as one group of data in the fourth data list, and the fourth data list is built in this way.
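The pairwise combination amounts to a Cartesian product of the two identifier sets, so P first identifiers and R second identifiers give P × R pairs. A minimal Python sketch, with a hypothetical helper name `pair_all` and made-up identifiers:

```python
from itertools import product

def pair_all(first_ids, second_ids):
    """Pair every first identifier with every second identifier."""
    return list(product(first_ids, second_ids))

cards = ["card-1", "card-2", "card-3"]   # P = 3
phones = ["phone-1", "phone-2"]          # R = 2
pairs = pair_all(cards, phones)          # 3 x 2 = 6 pairs
```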

Specifically, step S204 may include step S2041, step S2042, or step S2043.

Step S2041: build n third data lists according to n preset geographic ranges and, within each third data list, pair the first identifier data with the second identifier data to build n fourth data lists.

The n preset geographic ranges do not overlap one another, and n is a natural number with n >= 2. For example, taking Nanjing Road, Wujiaochang, Lujiazui, and Jinke Road as the preset geographic ranges yields the following four fourth data lists:

East Nanjing Road {bank card number 1 - mobile phone number 1, bank card number 2 - mobile phone number 2, bank card number 3 - mobile phone number 3};

Wujiaochang {bank card number 2 - mobile phone number 2, bank card number 4 - mobile phone number 4, bank card number 5 - mobile phone number 5};

Lujiazui {bank card number 6 - mobile phone number 6, bank card number 2 - mobile phone number 2, bank card number 7 - mobile phone number 7};

Jinke Road {bank card number 8 - mobile phone number 8, bank card number 9 - mobile phone number 9, bank card number 10 - mobile phone number 10}.

Or, step S2042: build m third data lists according to m first preset time ranges and, within each third data list, pair the first identifier data with the second identifier data to build m fourth data lists.

The m first preset time ranges do not overlap one another, and m is a natural number with m >= 2. For example, taking 20171201, 20171202, and 20171203 as the preset time ranges yields the following three fourth data lists:

December 1, 2017 {bank card number 1 - mobile phone number 1, bank card number 2 - mobile phone number 2, bank card number 3 - mobile phone number 3};

December 2, 2017 {bank card number 2 - mobile phone number 2, bank card number 4 - mobile phone number 4, bank card number 5 - mobile phone number 5};

December 3, 2017 {bank card number 6 - mobile phone number 6, bank card number 2 - mobile phone number 2, bank card number 7 - mobile phone number 7}.

Or, step S2043: build n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, within each third data list, pair the first identifier data with the second identifier data to build n+m fourth data lists.

Step S2043 is the combination of step S2041 and step S2042. For example, taking Nanjing Road, Wujiaochang, Lujiazui, and Jinke Road as the preset geographic ranges and 20171201, 20171202, and 20171203 as the preset time ranges yields the following seven fourth data lists:

Jinke Road {bank card number 8 - mobile phone number 8, bank card number 9 - mobile phone number 9, bank card number 10 - mobile phone number 10};

20171201 {bank card number 1 - mobile phone number 1, bank card number 2 - mobile phone number 2, bank card number 3 - mobile phone number 3};

20171202 {bank card number 2 - mobile phone number 2, bank card number 4 - mobile phone number 4, bank card number 5 - mobile phone number 5};

20171203 {bank card number 6 - mobile phone number 6, bank card number 2 - mobile phone number 2, bank card number 7 - mobile phone number 7}.
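Steps S2041 to S2043 can be sketched as grouping the screened records by range label and pairing identifiers within each group. The sketch assumes each record has already been tagged with the label of the preset geographic range (`region`) and the preset time range (`day`) it falls into; the function name `fourth_lists`, the field names, and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import product

def fourth_lists(first, second, field):
    """Group both data lists by `field` ('region' or 'day') and pair the
    first and second identifiers found within each group."""
    groups = defaultdict(lambda: (set(), set()))
    for rec in first:
        groups[rec[field]][0].add(rec["id"])
    for rec in second:
        groups[rec[field]][1].add(rec["id"])
    return {label: sorted(product(sorted(a), sorted(b)))
            for label, (a, b) in groups.items()}

# Hypothetical records, already tagged with a region label and a day label.
first = [
    {"id": "card-1", "region": "Nanjing Road", "day": "20171201"},
    {"id": "card-2", "region": "Lujiazui", "day": "20171202"},
]
second = [
    {"id": "phone-1", "region": "Nanjing Road", "day": "20171201"},
    {"id": "phone-2", "region": "Lujiazui", "day": "20171202"},
    {"id": "phone-3", "region": "Nanjing Road", "day": "20171202"},
]

by_region = fourth_lists(first, second, "region")  # n lists (step S2041)
by_day = fourth_lists(first, second, "day")        # m lists (step S2042)
combined = {**by_region, **by_day}                 # n + m lists (step S2043)
```

Here n = 2 and m = 2, so the combined result holds four fourth data lists. A pair belonging to the same user tends to reappear across many of these lists, while coincidental pairs do not.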

Step S205: screen each group of data in the fourth data list according to a data screening condition.

The data screening condition includes at least one of the following: condition C101, condition C102, condition C103, condition C104. Data that does not meet the data screening condition is discarded, leaving the data that meets it.

Specifically, condition C101: the difference between the first time data and the second time data is within a first preset time difference.

For example, the first preset time difference is set to 2 hours, the first time data of bank card number 6 is 12:00, and the second time data of mobile phone number 6, which is combined with bank card number 6, is 15:00. Because the difference between the first time data and the second time data exceeds 2 hours, bank card number 6 and mobile phone number 6 are probably not associated; for instance, mobile phone number 6, which is unrelated to bank card number 6, may have entered the preset geographic range more than 2 hours after bank card number 6 made its transaction. Therefore, according to data screening condition C101, the data of bank card number 6 - mobile phone number 6 is discarded, and the remaining data that meets C101 is retained.
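Condition C101 reduces to a comparison of an absolute time difference against a limit. A minimal sketch, assuming "within" is inclusive and using an arbitrary calendar date for the example times; the name `passes_c101` is hypothetical.

```python
from datetime import datetime, timedelta

def passes_c101(first_time, second_time, max_diff):
    """True if the two timestamps differ by no more than max_diff."""
    return abs(first_time - second_time) <= max_diff

MAX_DIFF = timedelta(hours=2)                # first preset time difference
card6_time = datetime(2017, 12, 3, 12, 0)    # date chosen arbitrarily
phone6_time = datetime(2017, 12, 3, 15, 0)

keep = passes_c101(card6_time, phone6_time, MAX_DIFF)  # 3 hours apart
```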

Condition C102: the first time data and the second time data are within a second preset time range, where the second preset time range < the first preset time range.

For example, suppose the second preset time range is set to 9:00 to 10:00, the first time data of bank card number 3 is 9:30, and the second time data of mobile phone number 3, which is paired with bank card number 3, is 10:30. Because the second time data falls outside the second preset time range, bank card number 3 and mobile phone number 3 are unlikely to be associated. The pair bank card number 3 - mobile phone number 3 is discarded, and the remaining data that satisfy C102 are retained.
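A minimal sketch of condition C102, assuming each pair is stored as (card, first time, phone, second time) and that the range bounds are inclusive; the function name and the card 2 sample are hypothetical.

```python
from datetime import time

def filter_by_time_window(pairs, start, end):
    """Keep pairs whose first and second times both lie in [start, end] (C102)."""
    return [
        (card, t1, phone, t2)
        for card, t1, phone, t2 in pairs
        if start <= t1 <= end and start <= t2 <= end
    ]

pairs = [
    ("card3", time(9, 30), "phone3", time(10, 30)),  # from the example above
    ("card2", time(9, 15), "phone2", time(9, 45)),   # hypothetical
]
kept_c102 = filter_by_time_window(pairs, time(9, 0), time(10, 0))
```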

Condition C103: where at least two items of second identifier data are identical and the difference between their second time data is within a second preset time difference, only one of those items of second identifier data is retained.

Motion trajectory data are generally collected many times within a short period. For example, in the second data list, multiple records all carry mobile phone number 5, that is, their second identifier data are identical, and the second time data corresponding to mobile phone number 5 are 12:03, 12:04, 12:05, ..., 12:09. In other words, the device of mobile phone number 5 collects motion trajectory data once per minute, which produces multiple records for mobile phone number 5 and makes it necessary to remove the redundant ones. The second preset time difference is therefore set to 10 minutes; among the second identifier data that all contain mobile phone number 5 and fall within ten minutes of one another, only one is retained, which avoids errors caused by repeated counting.
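One possible reading of condition C103 is sketched below. It assumes records are (identifier, timestamp) tuples and that a record is redundant when it falls within the preset difference of the last record kept for the same identifier; both the layout and that anchoring rule are assumptions, since the text does not fix them.

```python
from datetime import datetime, timedelta

def dedupe_track_points(records, max_gap):
    """For each identifier, keep one record per burst of samples (C103)."""
    kept = []
    last_kept = {}  # identifier -> timestamp of the last retained record
    for ident, ts in sorted(records, key=lambda r: (r[0], r[1])):
        previous = last_kept.get(ident)
        if previous is None or ts - previous > max_gap:
            kept.append((ident, ts))
            last_kept[ident] = ts
    return kept

# Phone number 5 sampled once per minute from 12:03 to 12:09.
records = [("phone5", datetime(2017, 12, 2, 12, m)) for m in range(3, 10)]
deduped = dedupe_track_points(records, timedelta(minutes=10))
```

With a 10-minute difference, the seven samples collapse to the single record at 12:03.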

Condition C104: duplicate data are discarded.

Because the fourth data list must contain all combinations of the first identifier data and the second identifier data, it contains a large amount of repeated, identical data. For identical data, the redundant copies are discarded and only one is retained, which avoids errors caused by repeated counting.
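As a small sketch of condition C104, assuming each combination is held as a (first identifier, second identifier) tuple, duplicates can be removed while preserving first-seen order; the function name is hypothetical.

```python
def drop_duplicate_pairs(pairs):
    """Remove repeated (first_id, second_id) pairs, keeping first-seen order (C104)."""
    return list(dict.fromkeys(pairs))

pairs = [("card2", "phone2"), ("card4", "phone4"), ("card2", "phone2")]
unique_pairs = drop_duplicate_pairs(pairs)
```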

Step S206: calculate the iteration count of each combination of the first identifier data and the second identifier data.

For example, across the 4 fourth data lists formed in step S2041, only the combination bank card number 2 - mobile phone number 2 appears 3 times, while every other combination appears only once, so bank card number 2 and mobile phone number 2 are very likely to be associated with each other. The iteration count of bank card number 2 - mobile phone number 2 is therefore recorded as 3, and the iteration count of every other bank card number - mobile phone number combination is recorded as 1.

Similarly, across the 3 fourth data lists formed in step S2042, only the combination bank card number 2 - mobile phone number 2 appears 3 times, while every other combination appears only once, so bank card number 2 and mobile phone number 2 are very likely to be associated with each other. The iteration count of bank card number 2 - mobile phone number 2 is therefore recorded as 3, and the iteration count of every other bank card number - mobile phone number combination is recorded as 1.

Similarly, across the 7 fourth data lists formed in step S2043, only the combination bank card number 2 - mobile phone number 2 appears 6 times, while every other combination appears only once, so bank card number 2 and mobile phone number 2 are very likely to be associated with each other. The iteration count of bank card number 2 - mobile phone number 2 is therefore recorded as 6, and the iteration count of every other bank card number - mobile phone number combination is recorded as 1.

Step S206 may also be repeated over multiple rounds of iterative calculation, so that the iteration counts of associated data combinations become higher and those of unassociated data combinations become lower.
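The counting in step S206 can be sketched with the three date-keyed fourth data lists listed above. The dictionary layout, the function name `count_iterations`, and the choice to count a combination at most once per list are assumptions made for illustration.

```python
from collections import Counter

def count_iterations(fourth_lists):
    """Count in how many fourth data lists each (card, phone) pair appears."""
    counts = Counter()
    for pairs in fourth_lists.values():
        counts.update(set(pairs))  # count a pair at most once per list
    return counts

fourth_lists = {
    "20171201": [("card1", "phone1"), ("card2", "phone2"), ("card3", "phone3")],
    "20171202": [("card2", "phone2"), ("card4", "phone4"), ("card5", "phone5")],
    "20171203": [("card6", "phone6"), ("card2", "phone2"), ("card7", "phone7")],
}
iterations = count_iterations(fourth_lists)
```

Here the combination of card 2 and phone 2 is counted 3 times and every other combination once, matching the example for step S2042.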

Step S207: screen the combinations of the first identifier data and the second identifier data according to an iteration screening condition.

The iteration screening condition is: the iteration count is greater than a preset count threshold.

Based on the iteration counts calculated in step S206, a higher iteration count indicates that a combination of first identifier data and second identifier data is more likely to be associated, and a lower iteration count indicates that it is less likely to be associated. The preset count threshold can therefore be used to select the combinations of first identifier data and second identifier data that have a high probability of association, and these constitute the correspondence between the first identifier data and the second identifier data. For example, for the iteration counts of the data combinations calculated in step S206, the preset count threshold is set to 2 and only the combinations of first identifier data and second identifier data whose iteration count is 2 or more are retained. This yields the combination bank card number 2 - mobile phone number 2, from which it is determined that bank card number 2 corresponds to mobile phone number 2. The more data there are and the higher the iteration counts, the more accurate the matching between identifiers.
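The iteration screening of step S207 might look like the sketch below. It follows the worked example, in which a threshold of 2 retains counts of 2 or more, so the comparison is written as `>=`; the function name and the sample counts are hypothetical.

```python
def select_associated_pairs(iterations, threshold):
    """Keep pairs whose count reaches the threshold (>=, as in the worked example)."""
    return [pair for pair, count in iterations.items() if count >= threshold]

iterations = {
    ("card1", "phone1"): 1,
    ("card2", "phone2"): 3,
    ("card4", "phone4"): 1,
}
matched = select_associated_pairs(iterations, threshold=2)
```

If the condition is instead read strictly as "greater than", the comparison would be `>` and the threshold in the example would need to be 1 to give the same result.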

Step S208: build a fifth data list from the combinations of the first identifier data and the second identifier data that remain after screening.

The fifth data list includes the first identifier data and the second identifier data that correspond to each other. The combinations of first identifier data and second identifier data selected through the above steps are assembled into the fifth data list, which reflects the correspondence between the first identifier data and the second identifier data. For example, the above steps identify the association between bank card number 2 and mobile phone number 2, thereby linking the bank card number to the mobile phone number and connecting identifier data across sources, between the transaction data source and the motion trajectory data source.

With the data processing method of Embodiment 2 of the present invention, for multiple data sources that have no business-logic relationship with one another, all possible combinations of the identifiers of two data sources can be constructed from the spatio-temporal mapping relationship among them. The range is then narrowed step by step through time screening, space screening, data screening and iteration screening, and an association between the data of the sources is established. This connects data across sources, eliminates data silos, and makes it possible to provide a more complete user profile.

The data processing method of Embodiment 2 of the present invention is mainly intended for multiple data sources that have no business-logic relationship with one another. For multiple data sources that do have a business-logic relationship, however, associations between identifiers can likewise be established and the data connected, provided that time and geographic information corresponding to the identifiers are available. For brevity, Embodiment 2 explains only the connection of data between two data sources, but the above method remains applicable to connecting data among more than two data sources. For example, for data sources 1 to X, if identifiers and the time and geographic information corresponding to them are selected from each data source, several third data lists are established, each third data list is expanded into a fourth data list according to the combination relationship, and the above series of screening steps is then applied, the data among the X data sources can likewise be connected. The claims of the present invention are not limited by the number of data sources.

Embodiment 3

Fig. 3 is a structural schematic diagram of the data processing device based on multiple data sources of Embodiment 3 of the present invention. As shown in Fig. 3, the device includes: a first data acquisition module 31, a second data acquisition module 32, a spatial processing module 33, a time processing module 34 and a data build module 35.

The first data acquisition module 31 is configured to obtain a first data list from a first data source.

Each group of data in the first data list includes: first identifier data, and first geographic position data and first time data corresponding to the first identifier data. The first identifier data include financial account information or payment software account information. The first data source includes transaction data containing the financial account information or the payment software account information.

The second data acquisition module 32 is configured to obtain a second data list from a second data source.

Each group of data in the second data list includes: second identifier data, and second geographic position data and second time data corresponding to the second identifier data. The second identifier data include a terminal device number, an application user account, telephone number information, biometric information or identity information. The second data source includes motion trajectory data containing the terminal device number, the application user account, the telephone number information, the biometric information or the identity information.

The spatial processing module 33 is configured to screen the first data list and the second data list according to a space screening condition.

The space screening condition is: the first geographic position data and the second geographic position data are within a preset geographic range.

The time processing module 34 is configured to screen the first data list and the second data list according to a time screening condition.

The time screening condition is: the first time data and the second time data are within a first preset time range.

The data build module 35 is configured to build a third data list from the first data list and the second data list after screening.

The third data list includes: the first identifier data, the second identifier data and a geographic identifier; or, alternatively, the first identifier data, the second identifier data, a geographic identifier and a time identifier. The geographic identifier corresponds to the preset geographic range, and the time identifier corresponds to the first preset time range.
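The cooperation of modules 33, 34 and 35 can be sketched as below. The record layout (identifier, latitude, longitude, timestamp), the rectangular representation of the preset geographic range, the coordinates, and all function names are assumptions for illustration; the disclosure does not specify how a geographic range is encoded.

```python
from datetime import datetime

def in_scope(record, box, window):
    """True if an (identifier, lat, lon, timestamp) record is inside both ranges."""
    _, lat, lon, ts = record
    lat_min, lat_max, lon_min, lon_max = box
    start, end = window
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max and start <= ts <= end

def build_third_list(first_list, second_list, geo_tag, box, time_tag, window):
    """Screen both lists by space and time, then tag the surviving identifiers."""
    return {
        "geo": geo_tag,
        "time": time_tag,
        "first_ids": [r[0] for r in first_list if in_scope(r, box, window)],
        "second_ids": [r[0] for r in second_list if in_scope(r, box, window)],
    }

# Hypothetical coordinates and records, for illustration only.
box = (31.230, 31.240, 121.470, 121.490)
window = (datetime(2017, 12, 1, 0, 0), datetime(2017, 12, 1, 23, 59, 59))
transactions = [
    ("card1", 31.235, 121.480, datetime(2017, 12, 1, 12, 0)),
    ("card9", 31.300, 121.480, datetime(2017, 12, 1, 12, 5)),  # outside the box
]
tracks = [
    ("phone1", 31.236, 121.481, datetime(2017, 12, 1, 12, 10)),
    ("phone9", 31.236, 121.481, datetime(2017, 12, 2, 9, 0)),  # outside the window
]
third = build_third_list(transactions, tracks, "Nanjing Road", box, "20171201", window)
```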

The data processing device based on multiple data sources of Embodiment 3 of the present invention is the device that implements the data processing method based on multiple data sources of Embodiment 1 of the present invention. Its principle is the same as that of the data processing method of Embodiment 1; for details, refer to the relevant content of Embodiment 1, which is not repeated here.

Embodiment 4

Fig. 4 is a structural schematic diagram of the data processing device based on multiple data sources of Embodiment 4 of the present invention. The device includes: a first data acquisition module 31, a second data acquisition module 32, a spatial processing module 33, a time processing module 34, a data build module 35, a data combination module 36, a data screening module 37, an iterative computation module 38, an iteration screening module 39 and a data match module 40.

The data combination module 36 is configured to build a fourth data list from the third data list.

The fourth data list includes all combinations of the first identifier data and the second identifier data under the third data list.

Specifically, the data combination module 36 includes: a space iteration module 361, a time iteration module 362, or a space-time iteration module 363.

The space iteration module 361 is configured to build n third data lists according to n preset geographic ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build n fourth data lists.

Alternatively, the time iteration module 362 is configured to build m third data lists according to m first preset time ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build m fourth data lists.

Alternatively, the space-time iteration module 363 is configured to build n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build n+m fourth data lists.
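The pairing performed by the data combination module can be sketched as follows. It reads "all combinations" as a Cartesian product of the first and second identifiers within each third data list; that reading, the dictionary layout, and the sample contents are assumptions for illustration.

```python
from itertools import product

def build_fourth_lists(third_lists):
    """Pair every first identifier with every second identifier, per third data list."""
    return {
        tag: list(product(entry["first_ids"], entry["second_ids"]))
        for tag, entry in third_lists.items()
    }

# Two tagged third data lists, one keyed by place and one by date (hypothetical contents).
third_lists = {
    "Nanjing Road": {"first_ids": ["card1", "card2"], "second_ids": ["phone1", "phone2"]},
    "20171201": {"first_ids": ["card2"], "second_ids": ["phone2", "phone3"]},
}
fourth_lists = build_fourth_lists(third_lists)
```

Each third data list yields exactly one fourth data list, so n place-keyed lists and m date-keyed lists give n+m fourth data lists.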

The data screening module 37 is configured to screen each group of data in the fourth data list according to data screening conditions.

The data screening conditions include: condition C101, condition C102, condition C103 and/or condition C104.

Condition C101: the difference between the first time data and the second time data is within a first preset time difference.

Condition C102: the first time data and the second time data are within a second preset time range, where the second preset time range < the first preset time range.

Condition C104: duplicate data are discarded.

The iterative computation module 38 is configured to calculate the iteration count of each combination of the first identifier data and the second identifier data.

The iteration screening module 39 is configured to screen the combinations of the first identifier data and the second identifier data according to an iteration screening condition. The iteration screening condition is: the iteration count is greater than a preset count threshold.

The data match module 40 is configured to build a fifth data list from the combinations of the first identifier data and the second identifier data that remain after screening. The fifth data list includes the first identifier data and the second identifier data that correspond to each other.

The data processing device based on multiple data sources of Embodiment 4 of the present invention is the device that implements the data processing method based on multiple data sources of Embodiment 2 of the present invention. Its principle is the same as that of the data processing method of Embodiment 2; for details, refer to the relevant content of Embodiment 2, which is not repeated here.

It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example by using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present invention may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.

In addition, part of the present invention may be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment according to the present invention includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the method and/or technical solution based on the foregoing embodiments of the present invention.

It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whichever point of view, the embodiments are to be regarded as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be embraced within the present invention. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims

1. A data processing method based on multiple data sources, characterized by comprising:

obtaining a first data list from a first data source and obtaining a second data list from a second data source; wherein each group of data in the first data list includes: first identifier data, and first geographic position data and first time data corresponding to the first identifier data; and each group of data in the second data list includes: second identifier data, and second geographic position data and second time data corresponding to the second identifier data;

screening the first data list and the second data list according to a space screening condition and a time screening condition; wherein the space screening condition is: the first geographic position data and the second geographic position data are within a preset geographic range; and the time screening condition is: the first time data and the second time data are within a first preset time range;

2. The data processing method according to claim 1, characterized by further comprising:

building a fourth data list from the third data list; wherein the fourth data list includes all combinations of the first identifier data and the second identifier data under the third data list;

screening the combinations of the first identifier data and the second identifier data according to an iteration screening condition; wherein the iteration screening condition is: the iteration count is greater than a preset count threshold;

building a fifth data list from the combinations of the first identifier data and the second identifier data that remain after screening; wherein the fifth data list includes: the first identifier data and the second identifier data that correspond to each other.

3. The data processing method according to claim 2, characterized in that the step of building the fourth data list from the third data list specifically comprises:

building n third data lists according to n preset geographic ranges and, in each third data list, pairing the first identifier data with the second identifier data, so as to build n fourth data lists;

or, building m third data lists according to m first preset time ranges and, in each third data list, pairing the first identifier data with the second identifier data, so as to build m fourth data lists;

or, building n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, in each third data list, pairing the first identifier data with the second identifier data, so as to build n+m fourth data lists;

wherein the n preset geographic ranges do not overlap one another, the m first preset time ranges do not overlap one another, n is a natural number with n ≥ 2, and m is a natural number with m ≥ 2.

4. The data processing method according to claim 2, characterized in that, after building the fourth data list and before calculating the iteration count, the method further comprises:

wherein the data screening conditions include:

and/or, where at least two items of second identifier data are identical and the difference between their second time data is within a second preset time difference, retaining only one of those items of second identifier data;

and/or, discarding duplicate data.

5. The data processing method according to claim 1, characterized in that the third data list includes:

first identifier data, second identifier data and a geographic identifier;

wherein the geographic identifier corresponds to the preset geographic range, and the time identifier corresponds to the first preset time range.

6. The data processing method according to any one of claims 1 to 5, characterized in that:

the first identifier data include: financial account information or payment software account information;

the first data source includes: transaction data containing the financial account information or the payment software account information;

the second identifier data include: a terminal device number, an application user account, telephone number information, biometric information or identity information;

the second data source includes: motion trajectory data containing the terminal device number, the application user account, the telephone number information, the biometric information or the identity information.

7. A data processing device based on multiple data sources, characterized by comprising:

a first data acquisition module, configured to obtain a first data list from a first data source; wherein each group of data in the first data list includes: first identifier data, and first geographic position data and first time data corresponding to the first identifier data;

a second data acquisition module, configured to obtain a second data list from a second data source; wherein each group of data in the second data list includes: second identifier data, and second geographic position data and second time data corresponding to the second identifier data;

a spatial processing module, configured to screen the first data list and the second data list according to a space screening condition; wherein the space screening condition is: the first geographic position data and the second geographic position data are within a preset geographic range;

a time processing module, configured to screen the first data list and the second data list according to a time screening condition; wherein the time screening condition is: the first time data and the second time data are within a first preset time range;

a data build module, configured to build a third data list from the first data list and the second data list after screening.

8. The data processing device according to claim 7, characterized by further comprising:

a data combination module, configured to build a fourth data list from the third data list; wherein the fourth data list includes all combinations of the first identifier data and the second identifier data under the third data list;

an iterative computation module, configured to calculate the iteration count of each combination of the first identifier data and the second identifier data;

an iteration screening module, configured to screen the combinations of the first identifier data and the second identifier data according to an iteration screening condition; wherein the iteration screening condition is: the iteration count is greater than a preset count threshold;

a data match module, configured to build a fifth data list from the combinations of the first identifier data and the second identifier data that remain after screening; wherein the fifth data list includes: the first identifier data and the second identifier data that correspond to each other.

9. The data processing device according to claim 8, characterized in that the data combination module includes:

a space iteration module, configured to build n third data lists according to n preset geographic ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build n fourth data lists;

or, a time iteration module, configured to build m third data lists according to m first preset time ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build m fourth data lists;

or, a space-time iteration module, configured to build n third data lists according to n preset geographic ranges and m third data lists according to m first preset time ranges and, in each third data list, to pair the first identifier data with the second identifier data, so as to build n+m fourth data lists;

10. The data processing device according to claim 8, characterized by further comprising:

a data screening module, configured to screen each group of data in the fourth data list according to data screening conditions;

wherein the data screening conditions include:

and/or, discarding duplicate data.

11. The data processing device according to claim 7, characterized in that the third data list includes:

first identifier data, second identifier data and a geographic identifier;

12. The data processing device according to any one of claims 7 to 11, characterized in that