CN109190035A

CN109190035A - ID data network data analysis method, apparatus and computing device

Info

Publication number: CN109190035A
Application number: CN201810973801.4A
Authority: CN
Inventors: 李晓明; 王斌锋; 马征
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2018-08-24
Filing date: 2018-08-24
Publication date: 2019-01-11

Abstract

The invention discloses an ID data network data analysis method, an apparatus, a computing device and a computer storage medium. The method includes: obtaining an ID data network that contains ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data; constructing ID relation data, comprising a number of ID relationship pairs, from the ID data and the association relationships contained in the ID data network; and comparing and combining the ID relation data to obtain a number of ID data subnets. The technical solution improves the efficiency of ID data network analysis and obtains the ID data subnets accurately and quickly, thereby effectively partitioning the ID data network. Compared with the ID data network as a whole, the ID data contained in an ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, which helps to construct a complete and effective user portrait.

Description

ID data network data analysis method, apparatus and computing device

Technical field

The present invention relates to the field of Internet technology, and in particular to an ID data network data analysis method, an apparatus, a computing device and a computer storage medium.

Background art

To meet users' varied needs, many services have been developed for users to choose from and use, such as web browsing, shopping, meal ordering, train ticket booking and payment. A service can assign ID data to a user, based on the user's account in the service or the device the user uses, in order to identify the user. An ID data network can be constructed from the ID data of multiple services. Based on the ID data network, user characteristics such as gender, age, browsing preferences, click preferences, activity level, purchase preferences, purchase potential and gaming preferences can be analyzed to build a complete and effective user portrait, enabling accurate recommendation of news, games, advertisements and the like. However, the ID data of multiple services are numerous, the association relationships between ID data are complex, and the volume of data to be processed is large. In addition, different services set ID data according to different rules. As a result, the ID data corresponding to the same user cannot be identified accurately and quickly from the large amount of ID data contained in the ID data network.

Summary of the invention

In view of the above problems, the present invention is proposed to provide an ID data network data analysis method, an apparatus, a computing device and a computer storage medium that overcome the above problems or at least partially solve them.

According to one aspect of the invention, an ID data network data analysis method is provided. The method includes: obtaining an ID data network that contains ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data; constructing ID relation data from the ID data and the association relationships contained in the ID data network, where the ID relation data comprise a number of ID relationship pairs; and comparing and combining the ID relation data to obtain a number of ID data subnets.

Further, comparing and combining the ID relation data to obtain a number of ID data subnets further comprises: copying the ID relation data in full into memory; comparing and combining the ID relation data with the full copy of the ID relation data in memory; and performing data integration according to the compare-and-combine results to obtain a number of ID data subnets.

Further, comparing and combining the ID relation data with the full copy of the ID relation data in memory, and performing data integration according to the compare-and-combine results to obtain a number of ID data subnets, further comprises: dividing the ID relation data into multiple fragments; concurrently comparing and combining the multiple fragments with the full copy of the ID relation data in memory to obtain the compare-and-combine results of all fragments; and performing data integration on the compare-and-combine results of all fragments to obtain a number of ID data subnets.

Further, concurrently comparing and combining the multiple fragments with the full copy of the ID relation data in memory to obtain the compare-and-combine results of all fragments further comprises: for any fragment, comparing and combining the fragment with the full copy of the ID relation data in memory to obtain an intermediate compare-and-combine result for that fragment; iteratively performing the following step until a preset iteration condition is met: dividing the intermediate compare-and-combine results of all fragments into multiple intermediate sub-fragments, and concurrently comparing and combining the intermediate sub-fragments with the full copy of the ID relation data in memory to obtain the intermediate compare-and-combine results of all fragments for the next iteration; and, after the iteration ends, obtaining the compare-and-combine results of all fragments.

Further, the preset iteration condition includes: the number of iterations reaching a preset number of iterations.
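The fragment-based, iterative procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: this passage does not define what "compare and combine" does, so the sketch interprets it as min-label propagation over the relationship pairs, which is one common way to merge pairs into connected groups. All function names, `fragment_count` and `max_iterations` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_full_copy(pairs):
    """Full in-memory copy of the ID relation data, stored as an adjacency map."""
    adjacency = defaultdict(set)
    for left, right in pairs:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


def split_into_fragments(items, fragment_count):
    """Divide a list into at most fragment_count non-empty fragments."""
    fragments = [items[i::fragment_count] for i in range(fragment_count)]
    return [fragment for fragment in fragments if fragment]


def compare_combine(fragment, labels, full_copy):
    """Assumed compare-and-combine step: each ID in the fragment takes the
    smallest label found among itself and its direct neighbours."""
    return {
        node: min([labels[node]] + [labels[n] for n in full_copy[node]])
        for node in fragment
    }


def find_subnets(pairs, fragment_count=4, max_iterations=10):
    full_copy = build_full_copy(pairs)
    labels = {node: node for node in full_copy}  # every ID starts in its own group
    with ThreadPoolExecutor(max_workers=fragment_count) as pool:
        # Preset iteration condition: stop after max_iterations rounds.
        for _ in range(max_iterations):
            fragments = split_into_fragments(sorted(labels), fragment_count)
            futures = [
                pool.submit(compare_combine, fragment, labels, full_copy)
                for fragment in fragments
            ]
            merged = {}
            for future in futures:
                merged.update(future.result())
            labels = merged  # intermediate results feed the next iteration
    # Data integration: IDs that ended up with the same label form one subnet.
    subnets = defaultdict(set)
    for node, label in labels.items():
        subnets[label].add(node)
    return sorted(sorted(subnet) for subnet in subnets.values())


subnets = find_subnets([("a", "b"), ("b", "c"), ("d", "e")])
```

With a fixed iteration count, groups whose longest chain of associations exceeds `max_iterations` hops may not be fully merged, so the count would need to be chosen with the data in mind.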

According to another aspect of the invention, an ID data network data analysis apparatus is provided. The apparatus includes: an obtaining module, adapted to obtain an ID data network that contains ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data; a first construction module, adapted to construct ID relation data from the ID data and the association relationships contained in the ID data network, where the ID relation data comprise a number of ID relationship pairs; and a compare-and-combine module, adapted to compare and combine the ID relation data to obtain a number of ID data subnets.

Further, the compare-and-combine module is further adapted to: copy the ID relation data in full into memory; compare and combine the ID relation data with the full copy of the ID relation data in memory; and perform data integration according to the compare-and-combine results to obtain a number of ID data subnets.

Further, the compare-and-combine module is further adapted to: divide the ID relation data into multiple fragments; concurrently compare and combine the multiple fragments with the full copy of the ID relation data in memory to obtain the compare-and-combine results of all fragments; and perform data integration on the compare-and-combine results of all fragments to obtain a number of ID data subnets.

Further, the compare-and-combine module is further adapted to: for any fragment, compare and combine the fragment with the full copy of the ID relation data in memory to obtain an intermediate compare-and-combine result for that fragment; iteratively perform the following step until a preset iteration condition is met: divide the intermediate compare-and-combine results of all fragments into multiple intermediate sub-fragments, and concurrently compare and combine the intermediate sub-fragments with the full copy of the ID relation data in memory to obtain the intermediate compare-and-combine results of all fragments for the next iteration; and, after the iteration ends, obtain the compare-and-combine results of all fragments.

According to another aspect of the invention, a computing device is provided, comprising a processor, a memory, a communication interface and a communication bus, where the processor, the memory and the communication interface communicate with one another through the communication bus;

the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above ID data network data analysis method.

According to a further aspect of the invention, a computer storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the operations corresponding to the above ID data network data analysis method.

According to the technical solution provided by the invention, ID relation data can be constructed from the ID data and the association relationships contained in the ID data network, and the ID relation data can be compared and combined to obtain a number of ID data subnets accurately and quickly. Compared with the ID data network, the ID data contained in an ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user. Moreover, the data volume of an ID data subnet is far smaller than that of the ID data network, so user characteristics can be analyzed accurately and quickly on the basis of the ID data subnets to construct a complete and effective user portrait, enabling accurate recommendation of news, games, advertisements and the like.

The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:

Fig. 1 shows a flowchart of an ID data network processing method according to an embodiment of the invention;

Fig. 2a shows a flowchart of an ID data network processing method according to another embodiment of the invention;

Fig. 2b shows a schematic diagram of an ID data network;

Fig. 3 shows a flowchart of an ID data network pruning preprocessing method according to an embodiment of the invention;

Fig. 4 shows a flowchart of an ID data network data analysis method according to an embodiment of the invention;

Fig. 5a shows a flowchart of an ID data network data analysis method according to another embodiment of the invention;

Fig. 5b shows a schematic diagram of processing ID relationship pairs into directed forward order and directed reverse order;

Fig. 6 shows a flowchart of an ID data subnet processing method according to an embodiment of the invention;

Fig. 7 shows a structural block diagram of an ID data network processing apparatus according to an embodiment of the invention;

Fig. 8 shows a structural block diagram of an ID data network pruning preprocessing apparatus according to an embodiment of the invention;

Fig. 9 shows a structural block diagram of an ID data network data analysis apparatus according to an embodiment of the invention;

Fig. 10 shows a structural block diagram of an ID data network data analysis apparatus according to another embodiment of the invention;

Fig. 11 shows a structural block diagram of an ID data subnet processing apparatus according to an embodiment of the invention;

Fig. 12 shows a structural schematic diagram of a computing device according to an embodiment of the invention.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.

Fig. 1 shows a flowchart of an ID data network processing method according to an embodiment of the invention. As shown in Fig. 1, the method includes the following steps:

Step S100: obtain an ID data network that contains ID data and the association relationships between ID data.

The ID data network may be one constructed in advance and obtained from a data analysis system or the like, and may have been constructed from the log data of multiple services. The ID data network contains ID data and the association relationships between ID data. ID data are data used to identify a user's identity and may include user ID data and/or device ID data. Association relationships exist between ID data and include direct association relationships and indirect association relationships.

Specifically, user ID data refer to a user's account data in a service, such as a mobile phone number, WeChat ID, QQ number or browser ID. For example, if a user has logged in to the WeChat application and the QQ application with mobile phone number "189****2677", and the user's WeChat ID in the WeChat application is "wxid_1" and QQ number in the QQ application is "12345", then mobile phone number "189****2677" has a direct association relationship with WeChat ID "wxid_1" and also with QQ number "12345".

Device ID data refer to data that identify the device a user uses when using a service, such as the MD5 value of a mobile device's device number, the MD5 value of the device number + system program version number + phone serial number of a mobile device, the MD5 value of a mobile device's MAC address, or the 32-digit or 44-digit form of the MD5 value of the MAC address. Different services set device ID data according to different rules. If multiple user ID data use the same service through the same device, the device ID data that the service assigns to that device are associated with all of those user ID data. For example, if WeChat ID "wxid_1" and WeChat ID "wxid_2" have both logged in to the WeChat application on a certain mobile phone, and the WeChat application marks the device ID data of that phone as "m1", then device ID data "m1" has a direct association relationship with WeChat ID "wxid_1" and also with WeChat ID "wxid_2".

Step S101: perform data analysis on the ID data network to obtain a number of ID data subnets.

By analyzing the ID data and the association relationships between ID data contained in the ID data network, the ID data network is divided into a number of ID data subnets. The ID data subnets can then be assigned to n ID data subnet sets according to the number of ID data each subnet contains, where n is a natural number greater than 0. ID data subnets in different subnet sets contain different numbers of ID data. For example, suppose the ID data subnets comprise 200 subnets containing 2 ID data each, 300 subnets containing 3 ID data each, and 100 subnets containing 4 ID data each. These subnets can be assigned to 3 ID data subnet sets according to the number of ID data they contain: the 200 subnets containing 2 ID data go to the first set, the 300 subnets containing 3 ID data go to the second set, and the 100 subnets containing 4 ID data go to the third set.
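The assignment of subnets to subnet sets by size can be illustrated with a short sketch. The function and helper names are hypothetical, and the generated IDs are placeholder data that merely reproduce the 200/300/100 example above.

```python
from collections import defaultdict


def group_subnets_by_size(subnets):
    """Assign each ID data subnet to the subnet set keyed by its ID count."""
    subnet_sets = defaultdict(list)
    for subnet in subnets:
        subnet_sets[len(subnet)].append(subnet)
    return dict(subnet_sets)


def make_subnets(count, size, prefix):
    """Placeholder data: `count` subnets, each holding `size` distinct IDs."""
    return [
        frozenset(f"{prefix}{i}_{j}" for j in range(size))
        for i in range(count)
    ]


subnets = make_subnets(200, 2, "x") + make_subnets(300, 3, "y") + make_subnets(100, 4, "z")
subnet_sets = group_subnets_by_size(subnets)
set_sizes = {size: len(group) for size, group in subnet_sets.items()}
```

Here `set_sizes` maps each subnet size to the number of subnets in the corresponding set, so n equals the number of distinct sizes.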

Compared with the ID data network, the ID data contained in an ID data subnet have stronger and more reliable association relationships, so the ID data contained in an ID data subnet can be identified as the ID data of the same user. Moreover, an ID data subnet contains far fewer ID data than the ID data network, and its data volume is far smaller. Based on the ID data subnets, user characteristics such as gender, age, browsing preferences, click preferences, activity level, purchase preferences, purchase potential and gaming preferences can be analyzed accurately and quickly to construct a complete and effective user portrait.

With the ID data network processing method provided in this embodiment, analyzing the ID data and the association relationships between ID data contained in the ID data network makes it possible to divide the ID data network quickly into a number of ID data subnets. Compared with the ID data network, the ID data contained in an ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user. Moreover, the data volume of an ID data subnet is far smaller than that of the ID data network, so user characteristics can be analyzed accurately and quickly on the basis of the ID data subnets to construct a complete and effective user portrait, enabling accurate recommendation of news, games, advertisements and the like.

Fig. 2a shows a flowchart of an ID data network processing method according to another embodiment of the invention. As shown in Fig. 2a, the method includes the following steps:

Step S200: perform data analysis on the log data of multiple services to determine ID data and the association relationships between ID data.

The log data of multiple services are obtained. The log data may be uploaded actively by the services or obtained by sending requests to the services. The log data of a service may record the ID data that used the service together with other ID data, which indicates that the ID data using the service are associated with those other ID data. By analyzing the log data of multiple services, the ID data and the association relationships between ID data can be determined. Specifically, ID data may include user ID data and/or device ID data.

Step S201: take the ID data as nodes, determine the connections between nodes according to the association relationships between ID data, and construct the ID data network.

After the ID data and the association relationships between ID data have been determined, the ID data network can be constructed from them. Specifically, the ID data are taken as nodes, and the connections between nodes are determined according to the association relationships between ID data, thereby constructing the ID data network. The ID data network contains the ID data and the association relationships between ID data and clearly shows how each ID data relates to the others.

Suppose the determined ID data include "a1", "b1", "a2", "b2", "c2", "a3", "b3", "c3", "d3", "a4", "b4", "c4", "d4", "e4", "f4", "g4" and "h4", and that the following pairs of ID data have direct association relationships: "a1" and "b1"; "a2" and "b2"; "a2" and "c2"; "a3" and "b3"; "a3" and "c3"; "c3" and "d3"; "a4" and "b4"; "a4" and "c4"; "a4" and "f4"; "b4" and "d4"; "b4" and "e4"; "b4" and "h4"; and "e4" and "g4". Indirect association relationships then exist between, for example, "b2" and "c2", "b3" and "c3", and "a3" and "d3". ID data "a1" through "h4" are taken as nodes a1 through h4 of the ID data network and, according to the association relationships between the ID data, node a1 is connected to node b1; node a2 is connected to nodes b2 and c2; node a3 is connected to nodes b3 and c3; node c3 is connected to node d3; node a4 is connected to nodes b4, c4 and f4; node b4 is connected to nodes d4, e4 and h4; and node e4 is connected to node g4. The resulting ID data network 210 can be as shown in Fig. 2b.
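The example network can be written down as an adjacency map, which also makes the difference between direct and indirect association concrete. This is a sketch only: the adjacency-map representation and the names `build_network` and `relation` are assumptions for illustration, and "indirect" is taken here to mean reachable through one or more intermediate nodes.

```python
from collections import defaultdict, deque

# Directly associated pairs from the example above.
DIRECT_PAIRS = [
    ("a1", "b1"),
    ("a2", "b2"), ("a2", "c2"),
    ("a3", "b3"), ("a3", "c3"), ("c3", "d3"),
    ("a4", "b4"), ("a4", "c4"), ("a4", "f4"),
    ("b4", "d4"), ("b4", "e4"), ("b4", "h4"), ("e4", "g4"),
]


def build_network(pairs):
    """ID data become nodes; each direct association becomes an undirected edge."""
    network = defaultdict(set)
    for left, right in pairs:
        network[left].add(right)
        network[right].add(left)
    return dict(network)


def relation(network, source, target):
    """Return 'direct', 'indirect' or 'none' for two distinct ID data."""
    if target in network.get(source, ()):
        return "direct"
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first search through intermediate nodes
        node = queue.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return "indirect"
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return "none"


network = build_network(DIRECT_PAIRS)
```

For instance, `relation(network, "b2", "c2")` reports an indirect association because both nodes are linked only through a2.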

Step S202: obtain the ID data network that contains ID data and the association relationships between ID data.

After the ID data network has been constructed, it is obtained so that pruning preprocessing, data analysis and other processing can be performed on it.

Step S203: perform pruning preprocessing on the ID data network to obtain the pruned ID data network.

Pruning preprocessing can be performed on the ID data network according to, for example, the association frequency between ID data and the number of other ID data directly associated with a given ID data, to obtain the pruned ID data network. Specifically, the association relationships between some ID data and the other ID data directly associated with them can be removed. Pruning the ID data network in this way effectively eliminates unreliable association relationships between ID data, which both helps to improve the accuracy of ID data network processing and reduces the amount of data for subsequent analysis.

Step S204: perform data analysis on the pruned ID data network to obtain a number of ID data subnets.

After the pruned ID data network has been obtained, it can be divided into a number of ID data subnets by analyzing the ID data and the association relationships between ID data that it contains. The ID data subnets can be assigned to n ID data subnet sets according to the number of ID data each subnet contains, where n is a natural number greater than 0. ID data subnets in different subnet sets contain different numbers of ID data. Compared with the ID data network, the ID data contained in an ID data subnet have stronger and more reliable association relationships.

Step S205: for any ID data subnet that contains more ID data than a first preset quantity threshold, cluster and divide the ID data in that subnet to obtain a number of third ID data subnets corresponding to that subnet.

After the data analysis in step S204, the resulting ID data subnets may still include subnets that contain a large number of ID data. Although the ID data in such subnets are strongly associated, they may not all belong to the same user. If they were identified as the ID data of the same user, the user characteristics obtained by analyzing these subnets would not effectively and truthfully reflect the user's actual situation. To further improve the reliability of these ID data subnets, they need to be processed further, for example by clustering and dividing them.

Specifically, a first preset quantity threshold and a second preset quantity threshold can be set in advance. For any one of the ID data subnets that contains more ID data than the first preset quantity threshold, the ID data in that subnet are clustered and divided to obtain a number of third ID data subnets corresponding to that subnet, so that ID data in the subnet with stronger and more reliable association relationships are clustered together and placed in the same third ID data subnet. The number of ID data contained in a third ID data subnet is less than or equal to the second preset quantity threshold. Compared with an ID data subnet that contains more ID data than the first preset quantity threshold, the ID data in a third ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, so user characteristics can be analyzed accurately and efficiently on the basis of the third ID data subnets to construct a complete and effective user portrait. In addition, the data volume of a third ID data subnet is far smaller than that of an ID data subnet that contains more ID data than the first preset quantity threshold, which makes user characteristic analysis more convenient and helps to improve analysis efficiency.

Those skilled in the art can set the first preset quantity threshold and the second preset quantity threshold according to actual needs; no limitation is imposed here. For example, the first preset quantity threshold can be set to 50 and the second preset quantity threshold to 10. Then, for any one of the ID data subnets that contains more than 50 ID data, the ID data in that subnet need to be clustered and divided, so that the subnet is divided into a number of third ID data subnets each containing 10 or fewer ID data.

With the ID data network processing method provided in this embodiment, the ID data network can be constructed quickly by analyzing the log data of multiple services. Pruning preprocessing of the ID data network effectively and quickly eliminates unreliable association relationships between ID data, which both helps to improve the accuracy of ID data network processing and reduces the amount of data to be analyzed. In addition, analyzing the ID data and the association relationships between ID data contained in the ID data network makes it possible to divide the ID data network quickly into a number of ID data subnets. The ID data contained in an ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, so user characteristics can be analyzed accurately and quickly on the basis of the ID data subnets to construct a complete and effective user portrait.

The present invention also provides an ID data network pruning preprocessing method. The method includes: obtaining an ID data network that contains ID data and the association relationships between ID data; and performing pruning preprocessing on the ID data network to obtain the pruned ID data network. The ID data include user ID data and/or device ID data. The ID data network pruning preprocessing method is described below through the specific embodiment shown in Fig. 3.

Fig. 3 shows a flowchart of an ID data network pruning preprocessing method according to an embodiment of the invention. As shown in Fig. 3, the method includes the following steps:

Step S300: obtain an ID data network that contains ID data and the association relationships between ID data.

For this step, refer to the description of step S100 in the embodiment shown in Fig. 1; details are not repeated here.

Step S301: perform data analysis on the log data of multiple services to obtain the association frequency between ID data.

The log data of a service may record the ID data that used the service together with other ID data, which indicates that the ID data using the service are associated with those other ID data. By analyzing the log data of multiple services, it is possible to determine not only the ID data and the association relationships between them, but also the association frequency between ID data.

Specifically, data analysis is performed on the log data of multiple services to calculate the actual association frequency between ID data. In practice, the actual association frequency between ID data can be calculated per preset unit of time. Taking a day as the preset unit of time, if analysis of the log data shows that one ID data and another ID data were associated on 50 days, the actual association frequency between the two ID data is recorded as 50. The actual association frequency between each ID data and the other ID data is calculated in the same way.
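Counting the actual association frequency per day can be sketched as follows. The log record layout `(day, id_a, id_b)` and the function name are assumptions for illustration; real service logs would need to be parsed into this shape first.

```python
from collections import defaultdict
from datetime import date


def actual_association_frequency(log_records):
    """Count, for each pair of ID data, the number of distinct days on which
    the pair appears together in the log (unit of time: one day)."""
    days_seen = defaultdict(set)
    for day, id_a, id_b in log_records:
        pair = tuple(sorted((id_a, id_b)))  # treat (a, b) and (b, a) as the same pair
        days_seen[pair].add(day)
    return {pair: len(days) for pair, days in days_seen.items()}


records = [
    (date(2018, 8, 1), "wxid_1", "m1"),
    (date(2018, 8, 1), "m1", "wxid_1"),  # same day, same pair: counted once
    (date(2018, 8, 2), "wxid_1", "m1"),
    (date(2018, 8, 2), "wxid_2", "m1"),
]
frequencies = actual_association_frequency(records)
```

Repeated records for the same pair on the same day do not raise the count, which matches counting in whole units of time rather than in log lines.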

In practice, it also happens that several users use the same service through the same device at different times. The user ID data of these users are all associated with the device ID data of the device, but the actual association frequency alone cannot truthfully reflect which user actually corresponds to the device in the current period. For example, suppose two users have used the 360 Security Guard application on the same mobile phone, so that the 360 accounts of both users are associated with the device ID data of the phone. Suppose further that the log data of the 360 Security Guard application show the following. The first 360 account logged in to the application frequently through this phone a year ago, giving an actual association frequency of 100 between the first 360 account and the device ID data of the phone, but it stopped logging in through this phone half a year ago. The second 360 account has logged in to the application frequently through this phone since half a year ago, giving an actual association frequency of 50 between the second 360 account and the device ID data of the phone. Although the actual association frequency of the first 360 account is higher than that of the second, the log data of the first 360 account date from a year ago and are far from the current time. The user of the second 360 account is clearly the user who actually corresponds to the phone in the current period, so the actual association frequency alone cannot truthfully reflect this.

To solve this problem, the present invention introduces a time weight for the log data corresponding to ID data, and calculates the association frequency between ID data from the actual association frequency between the ID data, the time information of the corresponding log data, and the time weight. The size of the time weight depends on how far the corresponding log data are from the current time: the closer the time information of the log data is to the current time, the larger the time weight; the farther it is from the current time, the smaller the time weight. The time weight is used to attenuate the actual association frequency between ID data, and the value obtained after attenuation is taken as the association frequency between the ID data. An association frequency obtained in this way accurately reflects the true degree of association between ID data in the current period and has high reference value, which helps to prune the ID data network precisely.
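A minimal sketch of such attenuation is given below. The text does not specify the form of the time weight, so the exponential decay, the `half_life_days` parameter and the function names are all assumptions chosen for illustration; any weight that shrinks with distance from the current time would fit the description.

```python
from datetime import date, timedelta


def time_weight(log_day, today, half_life_days=90.0):
    """Assumed weight: 1.0 for today, halving every half_life_days."""
    age_in_days = (today - log_day).days
    return 0.5 ** (age_in_days / half_life_days)


def association_frequency(active_days, today, half_life_days=90.0):
    """Attenuated frequency: each associated day contributes its time weight
    instead of a full count of 1."""
    return sum(time_weight(day, today, half_life_days) for day in set(active_days))


today = date(2018, 8, 24)
# First account: 100 associated days, all more than a year old.
old_days = [today - timedelta(days=365 + i) for i in range(100)]
# Second account: 50 associated days within the most recent 50 days.
recent_days = [today - timedelta(days=i) for i in range(50)]

old_frequency = association_frequency(old_days, today)
recent_frequency = association_frequency(recent_days, today)
```

Under these assumed parameters the second account ends up with the higher association frequency even though its actual association frequency (50) is lower than that of the first account (100).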

Step S302, for any ID data in the ID data network, perform pruning preprocessing on the association relationships between that ID data and other ID data according to the number of other ID data directly associated with that ID data and/or the association frequencies between that ID data and the other ID data.

Through repeated data analysis, the present invention sets a pruning rule and the thresholds defined in it. The pruning rule is as follows. For any ID data in the ID data network: if the number of other ID data directly associated with the ID data is greater than a first threshold, and the association frequency between the ID data and any one of the other ID data is less than or equal to a second threshold, the association relationship between the ID data and that other ID data is removed; if the number of other ID data directly associated with the ID data is greater than a third threshold, and the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fourth threshold, the association relationships between the ID data and each of the other ID data are removed; if the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fifth threshold, the association relationships between the ID data and each of the other ID data are removed. In all cases other than these three, the association relationships between the ID data and the other ID data are retained and not removed. Under the present invention, meeting any one of the three removal cases is sufficient for the corresponding association relationships to be removed.

To make it easier to judge whether any ID data in the ID data network meets the pruning rule, an intermediate subnet centered on each ID data in the ID data network can first be constructed. Specifically, ID relation data is constructed from the ID data contained in the ID data network and the association relationships between them. The ID relation data includes several ID relationship pairs, and each ID relationship pair contains two IDs and the relationship between them. For example, if ID data "a1" and ID data "b1" have a direct association relationship, the corresponding ID relationship pair is (a1, b1), where a1 and b1 are the two IDs contained in the pair and the parentheses indicate that the two IDs are related. All ID relationship pairs are then grouped by the group-by-primary-key-ID method, and the intermediate subnets are obtained from the grouping result; the group-by-primary-key-ID method groups the pairs according to a designated primary key ID. For example, taking the left-hand ID of every ID relationship pair as the primary key ID, all ID relationship pairs are grouped with the groupByKey method, and the grouping result yields the intermediate subnets centered on each left-hand ID, that is, the intermediate subnets centered on the ID data in the ID data network. Once the intermediate subnets have been obtained, it is easy to judge whether an ID data meets the pruning rule.
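
The grouping step can be sketched in plain Python as follows. This is only an in-memory stand-in for a distributed groupByKey call; the function name `build_intermediate_subnets` and the sample pairs are hypothetical.

```python
from collections import defaultdict

def build_intermediate_subnets(pairs):
    """Group ID relationship pairs by their left-hand ID (the primary key ID).

    Returns a dict mapping each left-hand ID to the IDs directly linked to it,
    i.e. the intermediate subnet centered on that ID.
    """
    subnets = defaultdict(list)
    for left, right in pairs:
        subnets[left].append(right)
    return dict(subnets)

# Hypothetical ID relationship pairs.
pairs = [("a4", "b4"), ("a4", "c4"), ("a4", "f4"), ("b4", "a4")]
subnets = build_intermediate_subnets(pairs)
# subnets["a4"] lists every ID directly associated with "a4".
```

Because only the left-hand ID is used as the key, a pair has to appear in both orientations if subnets centered on its right-hand ID are also wanted.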

In practice, after judging whether the pruning rule is met, a pruning flag bit can be set for each ID relationship pair to mark whether the relationship between its two IDs is an association relationship that needs to be removed. If the relationship between the two IDs of an ID relationship pair needs to be removed, the pruning flag bit of that pair is set to 1; if it does not need to be removed, the pruning flag bit of that pair is set to 0. The pruning flag bit makes it clear whether the relationship between the two IDs of an ID relationship pair is an association relationship that needs to be removed.

Specifically, for any ID data in the ID data network, it can be judged from the intermediate subnet centered on that ID data whether the number of other ID data directly associated with it is greater than the first threshold and the association frequency between it and any one of the other ID data is less than or equal to the second threshold; if so, the association relationship between the ID data and that other ID data is removed. The first threshold may be 2 and the second threshold may be 5. In that case, it is judged whether the number of other ID data directly associated with the ID data is greater than 2 and the association frequency between the ID data and any one of the other ID data is less than or equal to 5; if so, the association relationship between the ID data and that other ID data is unreliable and is removed. Suppose that, for ID data "a4" in the ID data network, the intermediate subnet centered on "a4" shows that the ID data directly associated with "a4" are "b4", "c4" and "f4", and that the association frequency is 20 between "a4" and "b4", 30 between "a4" and "c4", and 3 between "a4" and "f4". The number of other ID data directly associated with "a4" is then 3, which is greater than 2, and the association frequency between "a4" and "f4" is less than 5, so the association relationship between "a4" and "f4" is removed.

For any ID data in the ID data network, it can also be judged from the intermediate subnet centered on that ID data whether the number of other ID data directly associated with it is greater than the third threshold and the sum of the association frequencies between it and each of the other ID data is greater than or equal to the fourth threshold; if so, the association relationships between the ID data and each of the other ID data are removed. The third threshold may be 299 and the fourth threshold may be 100. In that case, it is judged whether the number of other ID data directly associated with the ID data is greater than 299 and the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to 100; if so, the association relationships between the ID data and each of the other ID data are unreliable and are removed. In addition, it can be judged whether the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to the fifth threshold; if so, the association relationships between the ID data and each of the other ID data are removed. The fifth threshold may be 1000. In that case, it is judged whether the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to 1000; if so, the association relationships between the ID data and each of the other ID data are unreliable and are removed.
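
The three removal cases can be sketched together in Python as follows. The default thresholds are the example values given in the text (2, 5, 299, 100 and 1000); the function name `edges_to_prune` and the dictionary representation of an intermediate subnet are assumptions made for illustration.

```python
def edges_to_prune(neighbours, t1=2, t2=5, t3=299, t4=100, t5=1000):
    """Return the neighbour IDs whose association with the center ID is removed.

    `neighbours` maps each directly associated ID to its association frequency
    with the center ID of one intermediate subnet.
    """
    degree = len(neighbours)
    total = sum(neighbours.values())
    # Cases 2 and 3: every association of the center ID is removed.
    if (degree > t3 and total >= t4) or total >= t5:
        return set(neighbours)
    # Case 1: only the low-frequency associations are removed.
    if degree > t1:
        return {other for other, freq in neighbours.items() if freq <= t2}
    # Otherwise all associations are retained.
    return set()

# The "a4" example from the text: frequencies 20, 30 and 3.
pruned = edges_to_prune({"b4": 20, "c4": 30, "f4": 3})
```

For the "a4" example only the association with "f4" is selected for removal; the returned set could then be used to set the pruning flag bit of the matching ID relationship pairs.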

Step S303, obtain the ID data network after pruning preprocessing.

After the judgment of whether the pruning rule is met has been completed for every ID data in the ID data network, and the association relationships between each ID data and other ID data have been pruned according to the judgment result, the ID data network after pruning preprocessing is obtained. Unreliable association relationships between ID data are thereby effectively removed from the ID data network, so that the association relationships that remain between ID data after pruning preprocessing are strong and reliable. This not only helps improve the accuracy of ID data network processing but also reduces the volume of data for subsequent analysis.

According to the ID data network pruning preprocessing method provided in this embodiment, the log data of multiple services is analyzed to quickly obtain the association frequencies between ID data. For any ID data in the ID data network, pruning preprocessing is performed on the association relationships between that ID data and other ID data according to the number of other ID data directly associated with it and/or the association frequencies between it and the other ID data. Unreliable association relationships between ID data are thereby removed from the ID data network effectively and quickly, so that the association relationships between ID data in the ID data network after pruning preprocessing are strong and reliable, which not only helps improve the accuracy of ID data network processing but also reduces the volume of data to be analyzed. Optionally, a time weight is also introduced for the log data, the actual association frequency between ID data is attenuated by the time weight, and the attenuated value is taken as the association frequency between the ID data, so that it accurately reflects the true degree of association between ID data in the current period, has high reference value, and helps prune the ID data network precisely.

The present invention also provides an ID data network data analysis method, comprising: obtaining an ID data network containing ID data and the association relationships between ID data; constructing ID relation data from the ID data contained in the ID data network and the association relationships between them, the ID relation data including several ID relationship pairs; and comparing and combining the ID relation data to obtain several ID data subnets. The ID data includes user ID data and/or device ID data. The method is described below through the specific embodiment shown in Fig. 4.

Fig. 4 shows a flow diagram of an ID data network data analysis method according to an embodiment of the present invention. As shown in Fig. 4, the method comprises the following steps:

Step S400, obtain an ID data network containing ID data and the association relationships between ID data.

Step S401, construct ID relation data from the ID data contained in the ID data network and the association relationships between them.

After the ID data network has been obtained, ID relation data can be constructed from the ID data it contains and the association relationships between them. The constructed ID relation data includes several ID relationship pairs, each containing two IDs and the relationship between them. For example, if ID data "a1" and "b1" have a direct association relationship, "a2" and "b2" have a direct association relationship, and "a2" and "c2" have a direct association relationship, the corresponding ID relationship pairs are (a1, b1), (a2, b2) and (a2, c2). Each of these pairs contains two IDs, and the parentheses indicate that the two IDs are related. Taking (a1, b1) as an example, the two IDs it contains are a1 and b1, and enclosing them together in parentheses indicates that the two IDs are related. ID relationship pairs are constructed in this way for all ID data contained in the ID data network and all association relationships between them, which completes the construction of the ID relation data.

Step S402, copy the ID relation data in full into memory.

Before the compare-and-combine operation, the ID relation data must be copied in full into memory so that memory holds the complete ID relation data, allowing the ID relation data to be compared and combined quickly and conveniently.

Step S403, compare and combine the ID relation data with the ID relation data copied in full into memory, and perform data integration according to the compare-and-combine result to obtain several ID data subnets.

After the ID relation data has been copied in full into memory, each ID relationship pair in the ID relation data can be compared and combined with the ID relation data copied in full into memory, and data integration is then performed according to the compare-and-combine result to obtain several ID data subnets. For each ID relationship pair in the ID relation data, the ID relationship pairs that share at least one ID with it are found by comparison in the in-memory ID relation data; according to the relationship between the two IDs contained in each ID relationship pair, the IDs of this pair and the IDs of the pairs found are combined to obtain the intermediate compare-and-combine result of this pair. For example, for the ID relationship pair (a2, b2), comparison finds that the in-memory ID relationship pairs sharing at least one ID with (a2, b2) are (a2, b2) and (a2, c2); combining the IDs of (a2, b2) with the IDs of the pairs found gives the intermediate compare-and-combine result "c2-a2-b2" for (a2, b2), where the "-" between two IDs indicates that the two IDs are related.

Because the intermediate compare-and-combine results obtained may still be incompletely combined, the intermediate results of all ID relationship pairs are again compared and combined with the ID relation data copied in full into memory, producing the intermediate compare-and-combine results for the next iteration. This step is executed iteratively until a preset iteration condition is met. When the iteration ends, the compare-and-combine result is obtained. The compare-and-combine result records multiple groups of IDs and the relationships between the IDs in each group, and each group contains one or more IDs. Data integration is performed according to the groups of IDs in the compare-and-combine result and the relationships between the IDs in each group, yielding several ID data subnets; specifically, each group of IDs in the compare-and-combine result is integrated, according to the relationships between its IDs, into one ID data subnet.
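
The iterative compare-and-combine procedure can be sketched in Python as follows. This is an illustrative reading rather than the patented implementation: the function name `compare_and_combine`, the default of 3 iterations, and the early stop when nothing changes are assumptions, and the sketch records only which IDs belong together, not the individual relationships between them.

```python
def compare_and_combine(pairs, max_iterations=3):
    """Merge ID relationship pairs that share an ID into groups of IDs."""
    full = [frozenset(p) for p in pairs]   # full copy of the relation data in memory
    groups = list(full)                    # one intermediate result per pair
    for _ in range(max_iterations):
        merged = []
        for group in groups:
            combined = set(group)
            for pair in full:
                if combined & pair:        # the pair shares at least one ID
                    combined |= pair
            merged.append(frozenset(combined))
        if merged == groups:               # assumed extra stop: nothing changed
            break
        groups = merged
    return set(groups)                     # duplicates collapse into one group

pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "c2"),
         ("a3", "b3"), ("a3", "c3"), ("c3", "d3")]
id_groups = compare_and_combine(pairs)
```

With these pairs the result is three groups: {a1, b1}, {a2, b2, c2} and {a3, b3, c3, d3}. The pair (c3, d3) only picks up b3 in the second pass, which is why a single comparison is not always enough. If the iteration limit is reached before the results stabilise, overlapping groups can remain.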

Optionally, the ID relation data can be divided into multiple shards, and the compare-and-combine operation is performed on the shards in parallel to further improve the efficiency of ID data network data analysis. The shards are compared and combined in parallel with the ID relation data copied in full into memory to obtain the compare-and-combine results of all shards, and data integration is then performed on these results to obtain several ID data subnets. The compare-and-combine results of all shards record multiple groups of IDs and the relationships between the IDs in each group, and data integration is performed according to these groups and relationships to obtain several ID data subnets. For any shard, the shard is compared and combined with the ID relation data copied in full into memory to obtain the intermediate compare-and-combine result of the shard. Specifically, for each ID relationship pair in the shard, the ID relationship pairs that share at least one ID with it are found by comparison in the in-memory ID relation data; according to the relationship between the two IDs contained in each ID relationship pair, the IDs of this pair and the IDs of the pairs found are combined to obtain the intermediate compare-and-combine result of this pair. This continues until every ID relationship pair in the shard has been compared and combined with the in-memory ID relation data, which gives the intermediate compare-and-combine result of the shard, consisting of the intermediate compare-and-combine results of all ID relationship pairs in the shard.
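
One round of the sharded comparison can be sketched in Python as follows. A thread pool stands in for whatever parallel runtime is actually used, and the names `combine_shard` and `combine_in_shards`, the round-robin sharding, and the shard count are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def combine_shard(shard, full):
    """Compare every pair of one shard against the full in-memory relation data."""
    results = []
    for pair in shard:
        combined = set(pair)
        for other in full:
            if combined & set(other):      # shares at least one ID
                combined |= set(other)
        results.append(frozenset(combined))
    return results

def combine_in_shards(pairs, shard_count=2):
    """Split the pairs into shards and process the shards in parallel (one round)."""
    full = list(pairs)
    shards = [full[i::shard_count] for i in range(shard_count)]
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        per_shard = list(pool.map(lambda s: combine_shard(s, full), shards))
    # Intermediate results of all shards, flattened into one list.
    return [group for shard_result in per_shard for group in shard_result]

intermediate = combine_in_shards([("a1", "b1"), ("a2", "b2"), ("a2", "c2")])
```

The returned list holds one intermediate result per ID relationship pair; it can contain duplicates, which are collapsed during data integration.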

The case where may possibly still be present non-complete combination in view of the comparison combination intermediate result of obtained all fragments, The present invention is after intermediate result is combined in the comparison for obtaining all fragments, and iteration executes following intermediate comparison step, until symbol Close default iterated conditional, wherein centre compares step are as follows: the comparison combination intermediate result of all fragments is divided into multiple centres Sub- fragment, and the ID relation data that the sub- fragment in multiple centres concurrently copies in memory with full dose is compared and is combined, it obtains Intermediate result is combined in the comparison of all fragments run to next iteration.After iterative process, all fragments are obtained Compare combined result.In such a way that above-mentioned iteration executes, it the comparison of fragment can combine intermediate result and carry out fully group It closes, to carry out Data Integration.Those skilled in the art can according to actual needs be configured default iterated conditional, herein not It limits.For example, default iterated conditional can include: the number of iterations reaches default the number of iterations, wherein those skilled in the art can Default the number of iterations is set according to actual needs, such as sets 3 for default the number of iterations.

According to the ID data network data analysis method provided in this embodiment, ID relation data can be constructed from the ID data contained in the ID data network and the association relationships between them; the ID relation data is then compared and combined with the ID relation data copied in full into memory, and data integration is performed according to the compare-and-combine result, so that several ID data subnets are obtained accurately and quickly and the ID data network is divided effectively. Optionally, the ID relation data can also be divided into multiple shards that are compared and combined in parallel with the ID relation data copied in full into memory, further improving the efficiency of ID data network data analysis. Compared with the ID data network, the ID data contained in an ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, so user characteristics can be analyzed accurately and quickly on the basis of the ID data subnets to build complete and effective user portraits.

The present invention also provides another ID data network data analysis method, comprising: obtaining an ID data network containing ID data and the association relationships between ID data; constructing ID relation data from the ID data contained in the ID data network and the association relationships between them, the ID relation data including several ID relationship pairs, each containing two IDs and the relationship between them; and grouping the ID relation data to obtain several ID data subnets. The ID data includes user ID data and/or device ID data. The method is described below through the specific embodiment shown in Fig. 5a.

Fig. 5a shows a flow diagram of an ID data network data analysis method according to another embodiment of the present invention. As shown in Fig. 5a, the method comprises the following steps:

Step S500, obtain an ID data network containing ID data and the association relationships between ID data.

Step S501, construct ID relation data from the ID data contained in the ID data network and the association relationships between them.

The ID relation data includes several ID relationship pairs, each containing two IDs and the relationship between them. For details of this step, refer to the description of step S401 in the embodiment shown in Fig. 4, which is not repeated here.

Step S502, apply directed forward-order processing and directed reverse-order processing to each ID relationship pair to obtain the two directed ID relationship pairs corresponding to each ID relationship pair.

To facilitate grouping, the present invention provides a directed forward-order processing method and a directed reverse-order processing method. Specifically, within an ID relationship pair, the order from the left-hand ID to the right-hand ID is defined as forward order and the order from the right-hand ID to the left-hand ID as reverse order; arranging the two IDs of an ID relationship pair in forward order is called directed forward-order processing, and arranging them in reverse order is called directed reverse-order processing. After each ID relationship pair has undergone both kinds of processing, two directed ID relationship pairs are obtained for it. To make it easy to tell whether two directed ID relationship pairs correspond to the same ID relationship pair, a relation bit can be set for each directed ID relationship pair: the two directed ID relationship pairs that correspond to the same ID relationship pair have the same relation bit, and directed ID relationship pairs that correspond to different ID relationship pairs have different relation bits.

A schematic of the directed forward-order and reverse-order processing of ID relationship pairs is shown in Fig. 5b. The left part of Fig. 5b shows the ID relationship pairs included in the ID relation data: (a1, b1), (a2, b2), (a2, c2), (a3, b3), (a3, c3) and (c3, d3). For the ID relationship pair (a1, b1), directed forward-order processing yields the directed ID relationship pair (a1-b1-01) and directed reverse-order processing yields the directed ID relationship pair (b1-a1-01); (a1-b1-01) and (b1-a1-01) are the two directed ID relationship pairs corresponding to (a1, b1), and both have the same relation bit, 01. The pairs (a2, b2), (a2, c2), (a3, b3), (a3, c3) and (c3, d3) are processed in the same way, giving the directed ID relationship pairs shown in the right part of Fig. 5b. In every directed ID relationship pair, the primary key ID is determined according to a preset rule. Those skilled in the art can set the preset rule according to actual needs, and it is not limited here. For example, the preset rule may be that the left-hand ID of a directed ID relationship pair is the primary key ID.
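
The forward-order and reverse-order processing can be sketched in Python as follows. Numbering the relation bits 01, 02, ... in input order matches the Fig. 5b example but is otherwise an assumption, as are the function name `to_directed_pairs` and the tuple layout.

```python
def to_directed_pairs(pairs):
    """Turn each ID relationship pair into a forward and a reverse directed pair.

    Each directed pair is (primary_key_id, other_id, relation_bit); both directed
    pairs derived from the same ID relationship pair share one relation bit.
    """
    directed = []
    for index, (left, right) in enumerate(pairs, start=1):
        bit = f"{index:02d}"
        directed.append((left, right, bit))   # directed forward order
        directed.append((right, left, bit))   # directed reverse order
    return directed

pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "c2"),
         ("a3", "b3"), ("a3", "c3"), ("c3", "d3")]
directed = to_directed_pairs(pairs)
# directed[0] is ("a1", "b1", "01") and directed[1] is ("b1", "a1", "01").
```

Placing the primary key ID first in each tuple means a later group-by-key step only needs to look at the first element.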

Step S503, group all directed ID relationship pairs by the group-by-primary-key-ID method, and obtain several ID data subnets from the grouping result.

Specifically, all directed ID relationship pairs are grouped by the group-by-primary-key-ID method to obtain several first groups. For each first group, the count bit of the group is determined from the number of directed ID relationship pairs it contains. At least one first group whose count bit equals a first count value is extracted, and the directed ID relationship pairs contained in the extracted first groups are combined according to their relation bits to obtain at least one first ID data subnet; a first ID data subnet contains 2 ID data. The first count value is 1.

Take as an example the directed ID relationship pairs shown in the right part of Fig. 5b. With the left-hand ID of each directed ID relationship pair as the primary key ID, all directed ID relationship pairs are grouped with the groupByKey method, that is, directed ID relationship pairs with the same primary key ID are placed in the same first group. This yields the following first groups: first group 1, containing (a1-b1-01); first group 2, containing (a2-b2-02) and (a2-c2-03); first group 3, containing (a3-b3-04) and (a3-c3-05); first group 4, containing (b1-a1-01); first group 5, containing (b2-a2-02); first group 6, containing (b3-a3-04); first group 7, containing (c2-a2-03); first group 8, containing (c3-a3-05) and (c3-d3-06); and first group 9, containing (d3-c3-06). The count bit of each first group is then determined from the number of directed ID relationship pairs it contains: the count bits of first groups 1, 4, 5, 6, 7 and 9 are 1, and the count bits of first groups 2, 3 and 8 are 2.

The first groups whose count bit is 1 are extracted from first groups 1 to 9, namely first groups 1, 4, 5, 6, 7 and 9. The directed ID relationship pairs contained in these extracted first groups are then combined according to their relation bits, that is, directed ID relationship pairs in the extracted first groups that have the same relation bit are combined into one first ID data subnet, which contains 2 ID data. Among the directed ID relationship pairs contained in the extracted first groups, only (a1-b1-01) and (b1-a1-01) have the same relation bit, so these two directed ID relationship pairs are combined into one first ID data subnet. Specifically, a1 and b1 are taken as nodes, and the connection between the two nodes is determined from the association relationship between a1 and b1, giving the first ID data subnet.
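
The extraction of 2-ID subnets can be sketched in Python as follows, using the Fig. 5b pairs. The function name `two_id_subnets` and the tuple layout (primary key ID, other ID, relation bit) are assumptions made for illustration; a plain dictionary replaces the groupByKey call.

```python
from collections import defaultdict

def two_id_subnets(directed_pairs, first_count_value=1):
    """Return the subnets of exactly two IDs found by grouping directed pairs."""
    # Group by primary key ID (the first element) to form the first groups.
    first_groups = defaultdict(list)
    for key, other, bit in directed_pairs:
        first_groups[key].append((key, other, bit))
    # Keep only first groups whose count equals the first count value,
    # then bucket their members by relation bit.
    by_bit = defaultdict(list)
    for members in first_groups.values():
        if len(members) == first_count_value:
            for member in members:
                by_bit[member[2]].append(member)
    # A relation bit seen twice means both directions survived: a 2-ID subnet.
    return [frozenset(items[0][:2]) for items in by_bit.values() if len(items) == 2]

pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "c2"),
         ("a3", "b3"), ("a3", "c3"), ("c3", "d3")]
directed = []
for i, (left, right) in enumerate(pairs, start=1):
    directed += [(left, right, f"{i:02d}"), (right, left, f"{i:02d}")]

subnets = two_id_subnets(directed)
```

For the Fig. 5b data this yields a single subnet containing a1 and b1, matching the worked example.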

The grouping procedure above quickly and conveniently yields first ID data subnets containing 2 ID data. The present invention can likewise quickly and conveniently obtain second ID data subnets containing 3 ID data, as follows:

During the grouping procedure above, after the count bits of all first groups have been determined, at least one first group whose count bit equals a second count value is extracted. For each extracted first group, the directed ID relationship groups corresponding to it are obtained from the directed ID relationship pairs it contains; each directed ID relationship group contains three IDs and the relationships between them. In every directed ID relationship group, the primary key ID is determined according to the preset rule, and a relation bit is set for each directed ID relationship group: directed ID relationship groups corresponding to the same first group have the same relation bit, and directed ID relationship groups corresponding to different first groups have different relation bits. All directed ID relationship groups are then grouped by the group-by-primary-key-ID method to obtain several second groups. For each second group, its count bit is determined from the number of directed ID relationship groups it contains. At least one second group whose count bit equals a third count value is then extracted, and the directed ID relationship groups contained in the extracted second groups are combined according to their relation bits to obtain at least one second ID data subnet; a second ID data subnet contains 3 ID data. The second count value is 2 and the third count value is 1.

As the example above shows, the count bits of first groups 1, 4, 5, 6, 7 and 9 are 1, and the count bits of first groups 2, 3 and 8 are 2. The first groups whose count bit is 2 are extracted from first groups 1 to 9, namely first groups 2, 3 and 8. First group 2 contains the directed ID relationship pairs (a2-b2-02) and (a2-c2-03), from which the directed ID relationship groups corresponding to first group 2 are obtained. Specifically, first group 2 corresponds to 3 directed ID relationship groups, for example (a2-b2-c2-001), (b2-a2-c2-001) and (c2-a2-b2-001), all of which have the same relation bit, 001. The directed ID relationship groups corresponding to first group 3 and first group 8 are obtained in the same way: those corresponding to first group 3 are (a3-b3-c3-002), (b3-a3-c3-002) and (c3-a3-b3-002), and those corresponding to first group 8 are (c3-a3-d3-003), (a3-c3-d3-003) and (d3-c3-a3-003). With the left-hand ID of each directed ID relationship group as the primary key ID, all directed ID relationship groups are grouped with the groupByKey method, that is, directed ID relationship groups with the same primary key ID are placed in the same second group. This yields the following second groups: second group 1, containing (a2-b2-c2-001); second group 2, containing (a3-b3-c3-002) and (a3-c3-d3-003); second group 3, containing (b2-a2-c2-001); second group 4, containing (b3-a3-c3-002); second group 5, containing (c2-a2-b2-001); second group 6, containing (c3-a3-b3-002) and (c3-a3-d3-003); and second group 7, containing (d3-c3-a3-003). The count bit of each second group is then determined from the number of directed ID relationship groups it contains: the count bits of second groups 1, 3, 4, 5 and 7 are 1, and the count bits of second groups 2 and 6 are 2.

The second groupings whose count bit is 1, namely second groupings 1, 3, 4, 5 and 7, are extracted from second groupings 1 to 7. The directed ID relationship groups they contain are then combined according to relationship bit: groups with the same relationship bit are combined into one second ID data subnet, which contains 3 ID data. Among the extracted groups, only (a2-b2-c2-001), (b2-a2-c2-001) and (c2-a2-b2-001) share a relationship bit, so these three are combined into one second ID data subnet. Specifically, a2, b2 and c2 are taken as nodes and the connections between them are determined from their association relationships. These can be read from the directed ID relationship pairs (a2-b2-02) and (a2-c2-03) underlying the three groups: node a2 is connected to node b2, and node a2 is connected to node c2, which gives the second ID data subnet.
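
A minimal Python sketch of this combining step is given below. It assumes the seven second groupings from the example are available as a dictionary keyed by primary key ID; the function name `second_subnets` is hypothetical. The requirement that all three groups of a relationship bit survive the count-bit filter is an interpretation of the example, not something the text states as a rule.

```python
from collections import defaultdict

# The seven second groupings from the example, keyed by primary key ID.
second_groupings = {
    "a2": [("a2", "b2", "c2", "001")],
    "a3": [("a3", "b3", "c3", "002"), ("a3", "c3", "d3", "003")],
    "b2": [("b2", "a2", "c2", "001")],
    "b3": [("b3", "a3", "c3", "002")],
    "c2": [("c2", "a2", "b2", "001")],
    "c3": [("c3", "a3", "b3", "002"), ("c3", "a3", "d3", "003")],
    "d3": [("d3", "c3", "a3", "003")],
}

def second_subnets(groupings):
    """Combine groups from count-bit-1 groupings that share a relationship bit."""
    by_bit = defaultdict(list)
    for groups in groupings.values():
        if len(groups) == 1:              # keep groupings whose count bit is 1
            by_bit[groups[0][3]].append(groups[0])
    # Assumption: a 3-ID subnet needs all three of its groups to be present.
    return {bit: sorted(g[0] for g in gs)
            for bit, gs in by_bit.items() if len(gs) == 3}

subnets = second_subnets(second_groupings)
```

With the example data, `subnets` holds a single entry for relationship bit 001 with the IDs a2, b2 and c2. The lone groups tagged 002 and 003 are left for later processing.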

With this grouping approach, the first ID data subnets (containing 2 ID data) and the second ID data subnets (containing 3 ID data) can be obtained quickly and conveniently. Those skilled in the art can extend the same grouping approach by analogy to obtain other ID data subnets containing 4, 5, 6 or more ID data; the details are not repeated here.

According to the ID data network data analysis method provided in this embodiment, ID relation data is constructed from the ID data contained in the ID data network and the association relationships between them. Directed forward-order and directed reverse-order processing then yields two directed ID relationship pairs for each ID relationship pair in the ID relation data, and all directed ID relationship pairs are grouped by primary key ID. This effectively improves the efficiency of ID data network data analysis, and several ID data subnets can be obtained accurately and quickly, so that the ID data network is divided effectively. Optionally, using the count bits of the resulting groupings together with the relationship bits set for the directed ID relationship pairs and directed ID relationship groups, the first ID data subnets and second ID data subnets can be obtained quickly and conveniently.
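
The flow summarized above, from ID relationship pairs to two-ID subnets, might look like the following Python sketch. The function name `first_subnets` is hypothetical, the relationship bit is taken to be the running index of the pair, and the primary key ID is assumed to be the left-hand ID of each directed pair. The check that both directions of a pair survive is an interpretation rather than a rule stated in the text.

```python
from collections import defaultdict

def first_subnets(id_pairs):
    """Derive subnets of exactly two IDs from undirected ID relationship pairs."""
    # Forward and reverse directed pairs share one relationship bit.
    directed = []
    for rel_bit, (a, b) in enumerate(id_pairs, start=1):
        directed.append((a, b, rel_bit))   # directed forward order
        directed.append((b, a, rel_bit))   # directed reverse order

    # Group by primary key ID (assumed here to be the left-hand ID).
    groupings = defaultdict(list)
    for pair in directed:
        groupings[pair[0]].append(pair)

    # Keep groupings holding a single pair, then combine by relationship bit.
    by_bit = defaultdict(list)
    for members in groupings.values():
        if len(members) == 1:
            by_bit[members[0][2]].append(members[0])
    return [sorted(ms[0][:2]) for ms in by_bit.values() if len(ms) == 2]

pairs = [("a1", "b1"), ("a2", "b2"), ("a2", "c2")]
result = first_subnets(pairs)
```

For these three pairs, only a1 and b1 are linked to nothing else, so `result` contains the single two-ID subnet of a1 and b1. The IDs a2, b2 and c2 are left for the three-ID stage.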

Those skilled in the art can also combine the ID data network data analysis method shown in Fig. 5a with the one shown in Fig. 4 to further improve analysis efficiency. For example, the method of Fig. 5a is first used to group the ID relation data, giving the first ID data subnets (containing 2 ID data) and the second ID data subnets (containing 3 ID data). The remaining ID relationship pairs in the ID relation data, that is, those not corresponding to any first or second ID data subnet, are then divided into multiple fragments. The fragments are compared and combined in parallel with the ID relation data fully copied into memory, giving the comparison-combination results of all fragments, and these results are integrated to obtain the other ID data subnets containing 4, 5, 6 or more ID data. This approach not only obtains the first and second ID data subnets quickly and conveniently, but also reduces the amount of data processed during comparison and combination, which improves analysis efficiency.

The present invention also provides an ID data subnet processing method, which comprises: calculating the number of ID data contained in each of several ID data subnets; extracting the ID data subnets whose number of ID data exceeds a first preset quantity threshold; and, for each ID data subnet whose number of ID data exceeds the first preset quantity threshold, clustering and dividing the ID data in that subnet to obtain several third ID data subnets corresponding to it, where the number of ID data contained in a third ID data subnet is less than or equal to a second preset quantity threshold. The method is described below through the specific embodiment shown in Fig. 6.

Fig. 6 shows a flow diagram of the ID data subnet processing method according to an embodiment of the invention. As shown in Fig. 6, the method comprises the following steps:

Step S600: calculate the number of ID data contained in each of several ID data subnets.

Here, the several ID data subnets are obtained by performing data analysis on the ID data network. Each ID data subnet contains ID data and the association relationships between them, and contains far fewer ID data than the ID data network. Some of the ID data subnets may nevertheless still contain a large number of ID data. Although the ID data in such a subnet are strongly associated, they may not all belong to the same user; if they are treated as the ID data of one user, the user characteristics derived from the subnet will not reflect the user's real situation effectively or truthfully. To make these ID data subnets more reliable, they need further processing. So that the subnets requiring processing can be found easily, the number of ID data contained in each of the several ID data subnets is calculated first.

Step S601: extract the ID data subnets whose number of ID data exceeds the first preset quantity threshold.

After the number of ID data contained in each ID data subnet has been calculated, the ID data subnets whose number of ID data exceeds the first preset quantity threshold are extracted from the several ID data subnets. Those skilled in the art can set the first preset quantity threshold according to actual needs; it is not limited here. For example, if the first preset quantity threshold is set to 50, the ID data subnets containing more than 50 ID data are extracted.
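
Steps S600 and S601 amount to a size count followed by a filter. A minimal Python sketch follows, assuming each subnet is represented as a set of ID strings; the names `extract_large_subnets` and `subnets` are hypothetical.

```python
def extract_large_subnets(subnets, first_threshold=50):
    """Return the names of subnets holding more IDs than the threshold."""
    sizes = {name: len(ids) for name, ids in subnets.items()}            # S600
    return [name for name, n in sizes.items() if n > first_threshold]   # S601

subnets = {
    "s1": {f"id{i}" for i in range(3)},    # 3 IDs
    "s2": {f"id{i}" for i in range(50)},   # exactly 50 IDs, not extracted
    "s3": {f"id{i}" for i in range(51)},   # 51 IDs, extracted
}
large = extract_large_subnets(subnets)
```

The comparison is strict, following the wording that the number must exceed the threshold, so a subnet with exactly 50 ID data is left alone.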

Step S602: from the extracted ID data subnets whose number of ID data exceeds the first preset quantity threshold, select one that has not yet been selected.

After the ID data subnets whose number of ID data exceeds the first preset quantity threshold have been extracted, each of them is processed in turn so that the third ID data subnets can be obtained effectively: the ID data in the subnet are clustered and divided, giving several third ID data subnets corresponding to that subnet. Specifically, in step S602 one not-yet-selected ID data subnet is chosen from the extracted ID data subnets.

Step S603: perform data analysis on the log data of the multiple services corresponding to the ID data subnet, and determine the association frequency between the ID data in the subnet.

The log data corresponding to the ID data subnet can be looked up in the log data of the multiple services. Specifically, the log data of a service records the ID data that used the service together with other ID data, which indicates that those ID data are associated with one another. The log data corresponding to the ID data in the subnet can therefore be found in the log data of the multiple services, and analysing it determines the association frequency between the ID data in the subnet.

Specifically, data analysis of the log data of the multiple services corresponding to the ID data subnet yields the actual association frequency between the ID data in the subnet. In practice, the actual association frequency can be counted per preset unit of time. Taking a day as the preset unit of time, if analysis of the log data shows that one ID data in the subnet was associated with another ID data in the subnet on 50 days, the actual association frequency between the two is recorded as 50. The actual association frequency between each ID data in the subnet and the other ID data in the subnet is calculated in this way.
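
The per-day counting described here could be sketched as follows. The log record layout, a day string plus the IDs seen together in one record, is an assumption made for illustration, as is the function name `actual_association_frequency`.

```python
from collections import defaultdict
from itertools import combinations

def actual_association_frequency(log_records):
    """Count, for each ID pair, the number of distinct days on which it co-occurs.

    log_records: iterable of (day, ids) tuples, a hypothetical log format.
    """
    days_seen = defaultdict(set)
    for day, ids in log_records:
        for a, b in combinations(sorted(set(ids)), 2):
            days_seen[(a, b)].add(day)      # several hits on one day count once
    return {pair: len(days) for pair, days in days_seen.items()}

records = [
    ("2018-08-01", ["d5", "e5"]),
    ("2018-08-01", ["e5", "d5"]),           # same day, same pair
    ("2018-08-02", ["d5", "e5", "f5"]),
]
freq = actual_association_frequency(records)
```

In this small example the pair d5 and e5 is seen on two distinct days, so its actual association frequency is 2, while each pair involving f5 has a frequency of 1.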

In practice, several users may also use the same service on the same device at different times. The user ID data of all these users are associated with the device ID data of that device, yet the actual association frequency does not truthfully reflect which user the device corresponds to in the current period. The invention therefore introduces a time weight for the log data corresponding to the ID data, and calculates the association frequency between ID data from the actual association frequency, the time information of the corresponding log data, and the time weight. The size of the time weight depends on how far the log data is from the current time: the closer the time information of the log data is to the current time, the larger the time weight; the further away, the smaller. The time weight is used to decay the actual association frequency between ID data, and the decayed value is taken as the association frequency between the ID data. An association frequency obtained in this way accurately reflects the real degree of association between ID data in the current period, has high reference value, and helps to cluster the ID data in the subnet accurately.
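
The text does not say which weighting function is used, only that more recent log data receives a larger weight. As one possible illustration, the sketch below assumes an exponential decay with a half-life; the function name `decayed_frequency`, the parameter `half_life_days` and the decay form itself are all assumptions.

```python
def decayed_frequency(day_ages, half_life_days=30.0):
    """Sum one decayed contribution per associated day.

    day_ages: how many days ago each association day occurred.
    Assumption: the weight halves every `half_life_days` days.
    """
    return sum(0.5 ** (age / half_life_days) for age in day_ages)

recent = decayed_frequency([0, 1, 2])        # three associated days this week
stale = decayed_frequency([300, 301, 302])   # three associated days long ago
```

Both pairs have an actual association frequency of 3, but the recent pair keeps a value close to 3 while the stale pair decays to nearly 0, which is the behaviour the paragraph asks for.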

Step S604: for each ID data in the ID data subnet, calculate the distance between that ID data and the other ID data from the association frequency between them.

The larger the association frequency between an ID data and another ID data, the smaller the distance between them. Those skilled in the art can choose the specific calculation according to actual needs; it is not limited here. For example, a preset value can be divided by the association frequency between the two ID data, and the result taken as the distance between them. Suppose the preset value is 1 and step S603 determined that the association frequency between ID data "d5" and ID data "e5" in the subnet is 50; dividing 1 by 50 gives 0.02, which is taken as the distance between "d5" and "e5". Once the distances between every ID data in the subnet and the other ID data have been calculated, the distances between the ID data in the subnet are known.
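
The example calculation can be written as a one-line function. In this sketch the name `distance` is hypothetical, and returning infinity for a zero association frequency is an added assumption, since the text does not cover that case.

```python
import math

def distance(association_frequency, preset_value=1.0):
    """Distance = preset value / association frequency (larger frequency, closer)."""
    if association_frequency <= 0:
        return math.inf          # assumption: unassociated IDs are infinitely far
    return preset_value / association_frequency

d5_e5 = distance(50)
```

With a preset value of 1 and an association frequency of 50, `d5_e5` is 0.02, as in the example.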

Step S605: cluster the ID data in the ID data subnet according to the distances between them and a preset clustering rule, obtaining several cluster sets.

Those skilled in the art can set the preset clustering rule according to actual needs; it is not limited here. For example, the preset clustering rule defines a preset neighborhood radius, a preset minimum number and the second preset quantity threshold. Specifically, several core ID data are determined from the ID data in the subnet according to the distances between them and the preset neighborhood radius. Then, for each core ID data, the other ID data in the subnet that lie within the preset neighborhood radius of the core ID data are looked up, and the core ID data is clustered with them subject to the second preset quantity threshold, giving a cluster set. In this way the ID data in the subnet that have stronger and more reliable association relationships are clustered into cluster sets.

For each ID data in the ID data subnet, the number of other ID data lying within its preset neighborhood radius is calculated from the distances between it and the other ID data, and every ID data for which this number exceeds the preset minimum number is determined to be a core ID data. For example, suppose the preset neighborhood radius is 1, the preset minimum number is 3, and the ID data in the subnet include "d5", "e5", "f5", "g5", "h5" and so on. For ID data "d5", the distances to "e5", "f5", "g5" and "h5" are all less than or equal to 1, while the distances to every other ID data are greater than 1. The other ID data within the preset neighborhood radius of "d5" are therefore "e5", "f5", "g5" and "h5", four in all; since 4 exceeds the preset minimum number, "d5" is determined to be a core ID data. All core ID data in the subnet are determined in this way.
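
The core ID test can be sketched in Python as below. It resembles the core-point test of density-based clustering such as DBSCAN, although the text does not name an algorithm. The nested-dictionary distance table and the name `core_ids` are assumptions, and the distance values are invented for illustration.

```python
def core_ids(dist, radius=1.0, min_number=3):
    """Return IDs with more than `min_number` other IDs within `radius`."""
    cores = []
    for x, row in dist.items():
        within = sum(1 for y, d in row.items() if y != x and d <= radius)
        if within > min_number:      # "exceeds the preset minimum number"
            cores.append(x)
    return cores

# Hypothetical distances from "d5" and "e5" to the other IDs in the subnet.
dist = {
    "d5": {"e5": 0.5, "f5": 1.0, "g5": 0.2, "h5": 0.9, "i5": 3.0},
    "e5": {"d5": 0.5, "f5": 2.0, "g5": 2.5, "h5": 4.0, "i5": 2.0},
}
cores = core_ids(dist)
```

Here "d5" has four IDs within radius 1, which exceeds the preset minimum number of 3, so it is a core ID data; "e5" has only one and is not.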

After all core ID data in the ID data subnet have been determined, for each core ID data the other ID data in the subnet lying within its preset neighborhood radius are looked up, and the core ID data is clustered with them subject to the second preset quantity threshold, giving a cluster set. Specifically, according to the distances between the core ID data and the other ID data found, a number of ID data smaller than the second preset quantity threshold is selected from those found, and the core ID data is clustered with the selected ID data to obtain one cluster set. For example, if the second preset quantity threshold is 10 and 15 other ID data are found within the preset neighborhood radius of the core ID data, which is more than the threshold, the 9 ID data nearest to the core ID data are selected from the 15 and clustered with the core ID data into one cluster set. As another example, if 8 other ID data are found, which is fewer than the threshold, no selection is needed: the core ID data is clustered directly with the 8 ID data into one cluster set.
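
The size cap in the two examples can be expressed as keeping at most one fewer neighbor than the second preset quantity threshold. The sketch below assumes the neighbors already found within the radius are given as a dictionary of distances; the name `cluster_set` is hypothetical.

```python
def cluster_set(core, neighbors, second_threshold=10):
    """Cluster a core ID with its nearest neighbors, capped by the threshold.

    neighbors: dict mapping each other ID within the radius to its distance.
    The result holds the core plus at most `second_threshold - 1` neighbors.
    """
    nearest = sorted(neighbors, key=neighbors.get)[: second_threshold - 1]
    return [core, *nearest]

fifteen = {f"n{i}": i / 100 for i in range(1, 16)}   # 15 neighbors found
eight = {f"n{i}": i / 100 for i in range(1, 9)}      # 8 neighbors found
big = cluster_set("c", fifteen)
small = cluster_set("c", eight)
```

With 15 neighbors the 9 nearest are kept, giving a cluster set of 10 ID data; with 8 neighbors all are kept, giving 9.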

Step S606: split the ID data subnet according to the several cluster sets, obtaining several third ID data subnets corresponding to the subnet.

After the several cluster sets have been obtained, the ID data subnet is split according to them. For each cluster set, the association relationships between the ID data in the cluster set and the ID data outside it are removed from the subnet; this splits the subnet effectively and gives the several third ID data subnets corresponding to it. Specifically, two kinds of association relationship are removed: those between ID data in the cluster set and ID data in other cluster sets, and those between ID data in the cluster set and ID data in the subnet that were not clustered into any cluster set. For example, if ID data "d5" in the cluster set is associated with ID data "a5" in another cluster set, and also with ID data "b5", which was not clustered into any cluster set, then the association relationship between "d5" and "a5" and that between "d5" and "b5" are both removed.

Compared with an ID data subnet whose number of ID data exceeds the first preset quantity threshold, the ID data in a third ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, so user characteristics can be analysed accurately and efficiently from the third ID data subnets and a complete, effective user portrait constructed. In addition, the data volume of a third ID data subnet is far smaller than that of an ID data subnet exceeding the first preset quantity threshold, which makes user characteristic analysis easier and improves analysis efficiency. In practice, a division mark bit can be set for the ID relationship pairs in the ID data subnet to mark whether the relationship between the two IDs of a pair is an association relationship that needs to be removed during splitting. If it is, the division mark bit of the ID relationship pair is set to 1; if it is not, the division mark bit is set to 0. The division mark bit thus shows clearly whether the relationship between the two IDs of an ID relationship pair needs to be removed during splitting.
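
Setting the division mark bit can be sketched as follows. The sketch assumes each ID data belongs to at most one cluster set, which the text does not state, and the name `mark_pairs` is hypothetical. A pair is marked 1 when its two IDs fall in different cluster sets or when exactly one of them is unclustered; pairs between two unclustered IDs are left at 0 because the removal rule does not mention them.

```python
def mark_pairs(pairs, cluster_sets):
    """Return {pair: division mark bit}, where 1 means remove during splitting."""
    # Assumption: every ID appears in at most one cluster set.
    cluster_of = {i: n for n, ids in enumerate(cluster_sets) for i in ids}
    marks = {}
    for a, b in pairs:
        # None stands for "not clustered into any cluster set".
        marks[(a, b)] = 1 if cluster_of.get(a) != cluster_of.get(b) else 0
    return marks

cluster_sets = [{"d5", "e5"}, {"a5"}]
pairs = [("d5", "e5"), ("d5", "a5"), ("d5", "b5")]
marks = mark_pairs(pairs, cluster_sets)
```

For the example in the text, the pair of d5 and a5 and the pair of d5 and b5 both receive mark bit 1, while the pair of d5 and e5 inside one cluster set receives 0.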

Step S607: judge whether all of the extracted ID data subnets have been selected; if so, the method ends; if not, step S602 is executed.

If it is judged that all of the extracted ID data subnets whose number of ID data exceeds the first preset quantity threshold have been selected, the ID data in every extracted ID data subnet have been clustered and divided, and the method ends. If it is judged that not all of them have been selected, step S602 is executed.

According to the ID data subnet processing method provided in this embodiment, for each ID data subnet whose number of ID data exceeds the first preset quantity threshold, the ID data in the subnet that have stronger and more reliable association relationships are clustered together according to the association frequency between ID data and the preset clustering rule, and are divided into the same third ID data subnet. This gives the corresponding third ID data subnets and processes the ID data subnet effectively. Compared with the ID data subnet before processing, the ID data in a third ID data subnet have stronger and more reliable association relationships and can be identified as the ID data of the same user, so user characteristics can be analysed accurately and efficiently from the third ID data subnets and a complete, effective user portrait constructed. The data volume of a third ID data subnet is also far smaller than that of the ID data subnet before processing, which makes user characteristic analysis easier and improves analysis efficiency.

Fig. 7 shows a structural block diagram of an ID data network processing device according to an embodiment of the invention. As shown in Fig. 7, the device includes an acquisition module 710 and an ID data network analysis module 720.

The acquisition module 710 is adapted to: obtain an ID data network containing ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data.

The ID data network analysis module 720 is adapted to: perform data analysis on the ID data network to obtain several ID data subnets, where the several ID data subnets are divided into n ID data subnet sets according to the number of ID data each contains, n being a natural number greater than 0, and ID data subnets in different ID data subnet sets contain different numbers of ID data.

Optionally, the device further includes: a log data analysis module 730, adapted to perform data analysis on the log data of multiple services and determine the ID data and the association relationships between them; and a construction module 740, adapted to take the ID data as nodes, determine the connections between nodes according to the association relationships between the ID data, and construct the ID data network.

Optionally, the device further includes a pruning preprocessing module 750, adapted to perform pruning preprocessing on the ID data network to obtain a pruned ID data network. The ID data network analysis module 720 is further adapted to perform data analysis on the pruned ID data network to obtain several ID data subnets.

Optionally, the pruning preprocessing module 750 is further adapted to: perform data analysis on the log data of multiple services to obtain the association frequency between ID data; for each ID data in the ID data network, perform pruning preprocessing on the association relationships between that ID data and other ID data according to the number of other ID data directly associated with it and/or the association frequency between it and the other ID data; and obtain the pruned ID data network.

Optionally, the pruning preprocessing module 750 is further adapted to: perform data analysis on the log data of multiple services to calculate the actual association frequency between ID data; and calculate the association frequency between ID data from the actual association frequency, the time information of the log data corresponding to the ID data, and the time weight.

Optionally, the pruning preprocessing module 750 is further adapted to: judge whether the number of other ID data directly associated with the ID data is greater than a first threshold and the association frequency between the ID data and any other ID data is less than or equal to a second threshold; if so, remove the association relationship between the ID data and that other ID data. The pruning preprocessing module 750 is further adapted to: judge whether the number of other ID data directly associated with the ID data is greater than a third threshold and the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fourth threshold; if so, remove the association relationships between the ID data and each of the other ID data. The pruning preprocessing module 750 is further adapted to: judge whether the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fifth threshold; if so, remove the association relationships between the ID data and each of the other ID data.
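
The three pruning judgments can be combined in one pass, as in the Python sketch below. The text presents them as separate optional capabilities and does not say in what order they apply or whether later rules see the already pruned network; here all rules are evaluated against the original network, which is an assumption. The name `prune`, the threshold parameter names `t1` to `t5` and the `frozenset` edge representation are hypothetical.

```python
from collections import defaultdict

def prune(freq, t1, t2, t3, t4, t5):
    """Return the association frequencies that survive pruning.

    freq: dict mapping frozenset({a, b}) -> association frequency.
    """
    neighbors = defaultdict(dict)
    for pair, f in freq.items():
        a, b = tuple(pair)
        neighbors[a][b] = f
        neighbors[b][a] = f

    removed = set()
    for x, nb in neighbors.items():
        degree, total = len(nb), sum(nb.values())
        if (degree > t3 and total >= t4) or total >= t5:
            # Second and third judgments: drop every association of x.
            removed.update(frozenset((x, y)) for y in nb)
        elif degree > t1:
            # First judgment: drop only the weak associations of x.
            removed.update(frozenset((x, y)) for y, f in nb.items() if f <= t2)
    return {pair: f for pair, f in freq.items() if pair not in removed}

# A hub linked weakly to five IDs, plus one strong pair.
freq = {frozenset(("hub", f"n{i}")): 1 for i in range(5)}
freq[frozenset(("u", "v"))] = 10
pruned = prune(freq, t1=3, t2=1, t3=100, t4=1000, t5=1000)
```

In this example the hub has five neighbors, more than the first threshold of 3, and each of its associations has a frequency no greater than the second threshold of 1, so all five are removed and only the pair of u and v remains.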

Optionally, the ID data network analysis module 720 is further adapted to: construct ID relation data from the ID data contained in the ID data network and the association relationships between them, the ID relation data including several ID relationship pairs; copy the ID relation data in full into memory; and compare and combine the ID relation data with the ID relation data fully copied into memory, integrating the data according to the comparison-combination result to obtain several ID data subnets.

Optionally, the ID data network analysis module 720 is further adapted to: divide the ID relation data into multiple fragments; compare and combine the fragments in parallel with the ID relation data fully copied into memory, obtaining the comparison-combination results of all fragments; and integrate the comparison-combination results of all fragments to obtain several ID data subnets. The ID data network analysis module 720 is further adapted to: for each fragment, compare and combine the fragment with the ID relation data fully copied into memory, obtaining an intermediate comparison-combination result for the fragment; execute the following step iteratively until a preset iteration condition is met: divide the intermediate comparison-combination results of all fragments into multiple intermediate sub-fragments, and compare and combine the intermediate sub-fragments in parallel with the ID relation data fully copied into memory, obtaining the intermediate comparison-combination results of all fragments for the next iteration; and, after the iteration ends, obtain the comparison-combination results of all fragments. The preset iteration condition includes the number of iterations reaching a preset number of iterations.

Optionally, the ID data network analysis module 720 is further adapted to: construct ID relation data from the ID data contained in the ID data network and the association relationships between them, where the ID relation data includes several ID relationship pairs and each ID relationship pair contains two IDs and the relationship between them; apply directed forward-order and directed reverse-order processing to each ID relationship pair to obtain the two directed ID relationship pairs corresponding to it, where the primary key ID in each directed ID relationship pair is determined according to a preset rule; and group all directed ID relationship pairs by primary key ID, obtaining several ID data subnets from the grouping result. The ID data network analysis module 720 is further adapted to: set a relationship bit for each directed ID relationship pair, where the two directed ID relationship pairs corresponding to the same ID relationship pair have the same relationship bit and directed ID relationship pairs corresponding to different ID relationship pairs have different relationship bits; group all directed ID relationship pairs by primary key ID to obtain several first groupings; for each first grouping, determine its count bit from the number of directed ID relationship pairs it contains; and extract at least one first grouping whose count bit equals a first count value and combine the directed ID relationship pairs contained in the extracted first groupings according to relationship bit, obtaining at least one first ID data subnet, where a first ID data subnet contains 2 ID data.

Optionally, the ID data network analysis module 720 is further adapted to: extract at least one first grouping whose count bit equals a second count value; for each extracted first grouping, obtain the directed ID relationship groups corresponding to it from the directed ID relationship pairs it contains, where each directed ID relationship group contains three IDs and the relationships between them, and the primary key ID in each directed ID relationship group is determined according to a preset rule; set a relationship bit for each directed ID relationship group, where the directed ID relationship groups corresponding to the same first grouping have the same relationship bit and those corresponding to different first groupings have different relationship bits; group all directed ID relationship groups by primary key ID to obtain several second groupings; for each second grouping, determine its count bit from the number of directed ID relationship groups it contains; and extract at least one second grouping whose count bit equals a third count value and combine the directed ID relationship groups contained in the extracted second groupings according to relationship bit, obtaining at least one second ID data subnet, where a second ID data subnet contains 3 ID data.

Optionally, the device further includes a cluster segmentation module 760, adapted to: for each ID data subnet whose number of ID data exceeds the first preset quantity threshold, cluster and divide the ID data in the subnet to obtain several third ID data subnets corresponding to it, where the number of ID data contained in a third ID data subnet is less than or equal to the second preset quantity threshold.

Optionally, the cluster segmentation module 760 is further adapted to: for each ID data in the ID data subnet, calculate the distance between that ID data and the other ID data from the association frequency between them; cluster the ID data in the subnet according to the distances between them and a preset clustering rule, obtaining several cluster sets; and split the subnet according to the several cluster sets, obtaining several third ID data subnets corresponding to it. The cluster segmentation module 760 is further adapted to: determine several core ID data from the ID data in the subnet according to the distances between them and the preset neighborhood radius; and, for each core ID data, look up the other ID data in the subnet lying within its preset neighborhood radius and cluster the core ID data with them subject to the second preset quantity threshold, obtaining a cluster set. The cluster segmentation module 760 is further adapted to: for each cluster set, remove from the subnet the association relationships between the ID data in the cluster set and the ID data in other cluster sets, obtaining the several third ID data subnets corresponding to the subnet.

According to the ID data network processing device provided in this embodiment, an ID data network can be constructed rapidly by analyzing the log data of multiple services. Pruning preprocessing of the ID data network effectively and quickly removes unreliable association relationships between ID data, which both improves the accuracy of ID data network processing and reduces the volume of data to be analyzed. In addition, by analyzing the ID data contained in the ID data network and the association relationships between them, the ID data network can be rapidly divided into several ID data subnets. The ID data contained in an ID data subnet have strong, reliable association relationships and can be regarded as ID data of the same user, so user features can be analyzed accurately and quickly on the basis of the ID data subnets in order to construct a complete and effective user profile.

Fig. 8 shows a structural block diagram of an ID data network pruning preprocessing device according to an embodiment of the present invention. As shown in Fig. 8, the device includes an obtaining module 810 and a pruning preprocessing module 820.

The obtaining module 810 is adapted to: obtain an ID data network comprising ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data.

The pruning preprocessing module 820 is adapted to: perform pruning preprocessing on the ID data network to obtain a pruned ID data network.

Optionally, the pruning preprocessing module 820 is further adapted to: analyze the log data of multiple services to obtain the association frequency between ID data; for any ID data in the ID data network, perform pruning preprocessing on the association relationships between that ID data and other ID data according to the quantity of other ID data directly associated with it and/or the association frequency between it and the other ID data; and obtain the pruned ID data network.

Optionally, the pruning preprocessing module 820 is further adapted to: analyze the log data of multiple services to calculate the actual association frequency between ID data; and calculate the association frequency between ID data according to the actual association frequency between the ID data, the time information of the log data corresponding to the ID data, and a time weight. The pruning preprocessing module 820 is further adapted to: judge whether the quantity of other ID data directly associated with the ID data is greater than a first threshold and the association frequency between the ID data and any one of the other ID data is less than or equal to a second threshold; and if so, remove the association relationship between the ID data and that other ID data. The pruning preprocessing module 820 is further adapted to: judge whether the quantity of other ID data directly associated with the ID data is greater than a third threshold and the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fourth threshold; and if so, remove the association relationships between the ID data and each of the other ID data. The pruning preprocessing module 820 is further adapted to: judge whether the sum of the association frequencies between the ID data and each of the other ID data is greater than or equal to a fifth threshold; and if so, remove the association relationships between the ID data and each of the other ID data.
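The first two pruning rules above can be illustrated with a minimal Python sketch. The function name `prune`, the edge representation (a dict keyed by ID pairs) and the threshold parameter names `t1` to `t4` are hypothetical and chosen only for illustration; the embodiment does not prescribe a data structure, and the sketch assumes there are no self-associations or duplicate pairs.

```python
from collections import defaultdict

def prune(edges, t1, t2, t3, t4):
    """edges maps (id_a, id_b) -> association frequency (hypothetical layout)."""
    neighbors = defaultdict(dict)
    for (a, b), freq in edges.items():
        neighbors[a][b] = freq
        neighbors[b][a] = freq

    removed = set()
    for node, nbrs in neighbors.items():
        degree = len(nbrs)
        # Rule 1: many direct associations -> drop the weak ones.
        if degree > t1:
            for other, freq in nbrs.items():
                if freq <= t2:
                    removed.add(frozenset((node, other)))
        # Rule 2: many direct associations and a large total frequency
        # -> drop every association of this ID.
        if degree > t3 and sum(nbrs.values()) >= t4:
            for other in nbrs:
                removed.add(frozenset((node, other)))

    return {pair: f for pair, f in edges.items() if frozenset(pair) not in removed}

edges = {("d1", "u1"): 5, ("d1", "u2"): 1, ("d1", "u3"): 1, ("u1", "u4"): 1}
pruned = prune(edges, t1=2, t2=1, t3=10, t4=100)
```

In this example only `d1` has more than two direct associations, so only its two low-frequency associations are removed; the low-frequency association between `u1` and `u4` is kept because `u1` does not exceed the first threshold.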

According to the ID data network pruning preprocessing device provided in this embodiment, the log data of multiple services are analyzed so that the association frequency between ID data is obtained quickly. For any ID data in the ID data network, pruning preprocessing is performed on the association relationships between that ID data and other ID data according to the quantity of other ID data directly associated with it and/or the association frequency between it and the other ID data. This effectively and quickly removes unreliable association relationships between ID data in the ID data network, so that the association relationships remaining in the pruned ID data network are strong and reliable. It both improves the accuracy of ID data network processing and reduces the volume of data to be analyzed. Optionally, a corresponding time weight is also introduced for the log data. The actual association frequency between ID data is attenuated by the time weight, and the value obtained after attenuation is used as the association frequency between the ID data. This value reflects the true degree of association between ID data in the current period more accurately, has higher reference value, and helps prune the ID data network accurately.
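The text does not give a formula for the attenuation. As one possible reading, the sketch below assumes an exponential decay in which each logged association is weighted by `daily_weight` raised to its age in days; the function name, the parameter names and the decay form are all assumptions made for illustration.

```python
def decayed_frequency(event_ages_in_days, daily_weight=0.9):
    """Sum of per-event weights; older log events contribute less (assumed decay form)."""
    return sum(daily_weight ** age for age in event_ages_in_days)

# Two associations logged today and one logged yesterday, with a weight of 0.5 per day.
frequency = decayed_frequency([0, 0, 1], daily_weight=0.5)
```

Here the actual association frequency is 3, while the attenuated value is 2.5, because the one-day-old event counts for only half.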

Fig. 9 shows a structural block diagram of an ID data network data analysis device according to an embodiment of the present invention. As shown in Fig. 9, the device includes an obtaining module 910, a first construction module 920, and a comparison and combination module 930.

The obtaining module 910 is adapted to: obtain an ID data network comprising ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data.

The first construction module 920 is adapted to: construct ID relationship data according to the ID data contained in the ID data network and the association relationships between ID data, where the ID relationship data include several ID relationship pairs.

The comparison and combination module 930 is adapted to: compare and combine the ID relationship data to obtain several ID data subnets.

Optionally, the comparison and combination module 930 is further adapted to: copy the ID relationship data in full into memory; and compare and combine the ID relationship data with the full copy of the ID relationship data in memory, then integrate the data according to the comparison and combination result to obtain several ID data subnets. The comparison and combination module 930 is further adapted to: divide the ID relationship data into multiple shards; compare and combine the multiple shards, in parallel, with the full copy of the ID relationship data in memory to obtain the comparison and combination results of all shards; and integrate the comparison and combination results of all shards to obtain several ID data subnets. The comparison and combination module 930 is further adapted to: for any shard, compare and combine that shard with the full copy of the ID relationship data in memory to obtain an intermediate comparison and combination result for the shard; iterate the following step until a preset iteration condition is met: divide the intermediate comparison and combination results of all shards into multiple intermediate sub-shards, and compare and combine the intermediate sub-shards, in parallel, with the full copy of the ID relationship data in memory to obtain the intermediate comparison and combination results of all shards for the next iteration; and, after the iteration ends, obtain the comparison and combination results of all shards. The preset iteration condition includes: the number of iterations reaches a preset number of iterations.

According to the ID data network data analysis device provided in this embodiment, ID relationship data can be constructed from the ID data contained in the ID data network and the association relationships between ID data. The ID relationship data are then compared and combined with the full copy of the ID relationship data in memory, and the data are integrated according to the comparison and combination result, so that several ID data subnets are obtained accurately and quickly and the ID data network is divided effectively. Optionally, the ID relationship data can also be divided into multiple shards, and the shards compared and combined in parallel with the full copy in memory, which further improves the efficiency of ID data network data analysis. Compared with the ID data network, the ID data contained in an ID data subnet have strong, reliable association relationships and can be regarded as ID data of the same user. User features can therefore be analyzed accurately and quickly on the basis of the ID data subnets in order to construct a complete and effective user profile.
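The embodiment does not spell out what one comparison and combination step does. The sketch below is one plausible reading, not the patented procedure: every group of IDs is expanded with the direct neighbors found in a full in-memory copy of the relationship pairs, the groups are re-sharded on each iteration, and the loop stops at a preset iteration count (or earlier if nothing changes). The function name and parameters are hypothetical, and the shards are processed sequentially here although they are independent and could run in parallel.

```python
from collections import defaultdict

def compare_and_combine(pairs, num_shards=2, max_iterations=10):
    """Group IDs linked by relationship pairs into subnets (illustrative only)."""
    # "Full copy in memory": an adjacency lookup built from every pair.
    full_copy = defaultdict(set)
    for a, b in pairs:
        full_copy[a].add(b)
        full_copy[b].add(a)

    # Each relationship pair starts as its own small group of IDs.
    groups = sorted({frozenset(p) for p in pairs}, key=sorted)
    for _ in range(max_iterations):              # preset iteration count
        shards = [groups[i::num_shards] for i in range(num_shards)]
        merged = set()
        for shard in shards:                     # independent; could run in parallel
            for group in shard:
                expanded = set(group)
                for id_ in group:                # compare against the full copy
                    expanded |= full_copy[id_]
                merged.add(frozenset(expanded))
        if merged == set(groups):                # nothing grew: stop early
            break
        groups = sorted(merged, key=sorted)

    # Data integration: keep only groups not strictly contained in another group.
    maximal = [g for g in groups if not any(g < other for other in groups)]
    return [sorted(g) for g in maximal]

subnets = compare_and_combine([("u1", "d1"), ("d1", "u2"), ("u3", "d2")])
```

If the iteration limit is reached before the groups stop growing, the returned groups may still overlap, which is why the limit has to be chosen with the expected subnet diameter in mind.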

Fig. 10 shows a structural block diagram of an ID data network data analysis device according to another embodiment of the present invention. As shown in Fig. 10, the device includes an obtaining module 1010, a second construction module 1020, and a grouping module 1030.

The obtaining module 1010 is adapted to: obtain an ID data network comprising ID data and the association relationships between ID data, where the ID data include user ID data and/or device ID data.

The second construction module 1020 is adapted to: construct ID relationship data according to the ID data contained in the ID data network and the association relationships between ID data, where the ID relationship data include several ID relationship pairs and each ID relationship pair comprises two IDs and the relationship between the two IDs.

The grouping module 1030 is adapted to: group the ID relationship data to obtain several ID data subnets.

Optionally, the grouping module 1030 is further adapted to: apply directed forward-order and directed reverse-order processing to each ID relationship pair to obtain the two directed ID relationship pairs corresponding to that ID relationship pair, where the primary key ID in any directed ID relationship pair is determined according to a preset rule; and group all directed ID relationship pairs by primary key ID and obtain several ID data subnets according to the grouping result. The grouping module 1030 is further adapted to: set a relationship position for each directed ID relationship pair, where the two directed ID relationship pairs corresponding to the same ID relationship pair have the same relationship position and directed ID relationship pairs corresponding to different ID relationship pairs have different relationship positions; group all directed ID relationship pairs by primary key ID to obtain several first groups; for any first group, determine the count position of that first group according to the number of directed ID relationship pairs it contains; and extract at least one first group whose count position equals the first count value, and combine the directed ID relationship pairs contained in the extracted first groups according to relationship position to obtain at least one first ID data subnet. Each first ID data subnet contains 2 pieces of ID data.

Optionally, the grouping module 1030 is further adapted to: extract at least one first group whose count position equals the second count value; for any extracted first group, obtain the directed ID relationship group corresponding to that first group according to the directed ID relationship pairs it contains, where each directed ID relationship group comprises three IDs and the relationships among the three IDs, and the primary key ID in any directed ID relationship group is determined according to a preset rule; set a relationship position for each directed ID relationship group, where directed ID relationship groups corresponding to the same first group have the same relationship position and directed ID relationship groups corresponding to different first groups have different relationship positions; group all directed ID relationship groups by primary key ID to obtain several second groups; for any second group, determine the count position of that second group according to the number of directed ID relationship groups it contains; and extract at least one second group whose count position equals the third count value, and combine the directed ID relationship groups contained in the extracted second groups according to relationship position to obtain at least one second ID data subnet. Each second ID data subnet contains 3 pieces of ID data.

According to the ID data network data analysis device provided in this embodiment, ID relationship data can be constructed from the ID data contained in the ID data network and the association relationships between ID data. Directed forward-order and directed reverse-order processing then yields the two directed ID relationship pairs corresponding to each ID relationship pair in the ID relationship data, and all directed ID relationship pairs are grouped by primary key ID. This effectively improves the efficiency of ID data network data analysis, so that several ID data subnets are obtained accurately and quickly and the ID data network is divided effectively. Optionally, by using the count positions of the resulting groups and the relationship positions set for the directed ID relationship pairs and directed ID relationship groups, the first ID data subnets and the second ID data subnets can be obtained quickly and conveniently.
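The extraction of first ID data subnets (subnets with exactly two IDs) can be sketched as follows. The sketch assumes that the preset rule takes the first element of a directed pair as the primary key ID, that the relationship position is simply the index of the original pair, and that the first count value is 1; none of these choices is stated in the text, and the function name is hypothetical. Duplicate pairs are assumed to have been removed beforehand.

```python
from collections import defaultdict

def two_id_subnets(pairs):
    """Return the relationship pairs whose two IDs are associated only with each other."""
    # Forward-order and reverse-order directed pairs share one relationship position.
    directed = []
    for position, (a, b) in enumerate(pairs):
        directed.append((a, b, position))   # forward order, primary key a
        directed.append((b, a, position))   # reverse order, primary key b

    # Group directed pairs by primary key ID.
    groups = defaultdict(list)
    for key, other, position in directed:
        groups[key].append((other, position))

    # Keep groups whose count is 1, then combine them by relationship position:
    # a position seen twice means both endpoints have no other association.
    hits = defaultdict(int)
    for members in groups.values():
        if len(members) == 1:
            hits[members[0][1]] += 1
    return [set(pairs[position]) for position, n in sorted(hits.items()) if n == 2]

result = two_id_subnets([("u1", "d1"), ("u2", "d2"), ("u2", "d3")])
```

In this example `u1` and `d1` each appear in a group of size 1 with the same relationship position, so they form a two-ID subnet, whereas `u2` belongs to a group of size 2 and its pairs are left for later processing.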

Fig. 11 shows a structural block diagram of an ID data subnet processing device according to an embodiment of the present invention. As shown in Fig. 11, the device includes a computing module 1110, an extraction module 1120, and a cluster segmentation module 1130.

The computing module 1110 is adapted to: calculate the quantity of ID data contained in each of several ID data subnets.

The extraction module 1120 is adapted to: extract the ID data subnets that contain more ID data than the first preset quantity threshold.

The cluster segmentation module 1130 is adapted to: for any ID data subnet that contains more ID data than the first preset quantity threshold, cluster and segment the ID data in that ID data subnet to obtain several third ID data subnets corresponding to it. The quantity of ID data contained in each third ID data subnet is less than or equal to the second preset quantity threshold.

Optionally, the cluster segmentation module 1130 is further adapted to: for any ID data in the ID data subnet, calculate the distance between that ID data and other ID data according to the association frequency between them; cluster the ID data in the ID data subnet according to the distances between the ID data in the subnet and a preset clustering rule, obtaining several cluster sets; and segment the ID data subnet according to the cluster sets, obtaining several third ID data subnets corresponding to the ID data subnet.

Optionally, the device further includes an association frequency determining module 1140, adapted to: for any ID data subnet that contains more ID data than the first preset quantity threshold, analyze the log data of the multiple services corresponding to that ID data subnet to determine the association frequency between ID data in the subnet. The association frequency determining module 1140 is further adapted to: analyze the log data of the multiple services corresponding to the ID data subnet to calculate the actual association frequency between ID data in the subnet; and calculate the association frequency between ID data according to the actual association frequency between the ID data, the time information of the log data corresponding to the ID data, and a time weight.

Optionally, cluster segmentation module 1130 is further adapted for: according between the ID data in the ID data subnet away from From and default neighborhood radius, determine several core I D data from the ID data in the ID data subnet；For any core Heart ID data, search other ID data in the ID data subnet in the default neighborhood radius of core I D data, and according to Second preset quantity threshold value clusters core I D data and other ID data found, obtains cluster set.Cluster Segmentation module 1130 is further adapted for: in the ID data subnet, being gathered for any cluster, is removed the ID in the cluster set The incidence relation between ID data except data and the cluster set；Obtain several 3rd ID corresponding to the ID data subnet Data subnet.

According to the ID data subnet processing device provided in this embodiment, for any ID data subnet that contains more ID data than the first preset quantity threshold, the ID data in that subnet that have stronger, more reliable association relationships can be clustered into one class, according to the association frequency between ID data and a preset clustering rule, and assigned to the same third ID data subnet. Several corresponding third ID data subnets are thereby obtained, and the ID data subnet is processed effectively. Compared with the ID data subnet before processing, the ID data in a third ID data subnet have stronger, more reliable association relationships and can be regarded as ID data of the same user, so user features can be analyzed accurately and efficiently on the basis of the third ID data subnets in order to construct a complete and effective user profile. Moreover, the data volume of a third ID data subnet is far smaller than that of the ID data subnet before processing, which makes user feature analysis more convenient and helps improve analysis efficiency.

The present invention also provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the executable instruction can perform the ID data network data analysis method in any of the above method embodiments.

Fig. 12 shows a structural schematic diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device. As shown in Fig. 12, the computing device may include a processor 1202, a communications interface 1204, a memory 1206, and a communication bus 1208. The processor 1202, the communications interface 1204, and the memory 1206 communicate with one another through the communication bus 1208. The communications interface 1204 is used for communicating with network elements of other devices, such as clients or other servers. The processor 1202 is used for executing a program 1210, and may specifically perform the relevant steps in the above embodiments of the ID data network data analysis method. Specifically, the program 1210 may include program code, and the program code includes computer operation instructions.

The processor 1202 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs. The memory 1206 is used for storing the program 1210. The memory 1206 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory. The program 1210 may specifically be used to cause the processor 1202 to perform the ID data network data analysis method in any of the above method embodiments. For the specific implementation of each step in the program 1210, refer to the corresponding descriptions of the corresponding steps and units in the above ID data network data analysis embodiments, which are not repeated here. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices and modules described above may refer to the corresponding process descriptions in the foregoing method embodiments, and are not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the specification provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification. Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.

Claims

1. An ID data network data analysis method, the method comprising:

obtaining an ID data network comprising ID data and the association relationships between ID data, wherein the ID data include user ID data and/or device ID data;

constructing ID relationship data according to the ID data contained in the ID data network and the association relationships between ID data, wherein the ID relationship data include several ID relationship pairs; and

comparing and combining the ID relationship data to obtain several ID data subnets.

2. The method according to claim 1, wherein comparing and combining the ID relationship data to obtain several ID data subnets further comprises:

copying the ID relationship data in full into memory; and

comparing and combining the ID relationship data with the full copy of the ID relationship data in memory, and integrating the data according to the comparison and combination result to obtain several ID data subnets.

3. The method according to claim 2, wherein comparing and combining the ID relationship data with the full copy of the ID relationship data in memory, and integrating the data according to the comparison and combination result to obtain several ID data subnets, further comprises:

dividing the ID relationship data into multiple shards;

comparing and combining the multiple shards, in parallel, with the full copy of the ID relationship data in memory to obtain the comparison and combination results of all shards; and

integrating the comparison and combination results of all shards to obtain several ID data subnets.

4. The method according to claim 3, wherein comparing and combining the multiple shards, in parallel, with the full copy of the ID relationship data in memory to obtain the comparison and combination results of all shards further comprises:

for any shard, comparing and combining the shard with the full copy of the ID relationship data in memory to obtain an intermediate comparison and combination result for the shard;

iterating the following step until a preset iteration condition is met: dividing the intermediate comparison and combination results of all shards into multiple intermediate sub-shards, and comparing and combining the multiple intermediate sub-shards, in parallel, with the full copy of the ID relationship data in memory to obtain the intermediate comparison and combination results of all shards for the next iteration; and

after the iteration ends, obtaining the comparison and combination results of all shards.

5. The method according to claim 4, wherein the preset iteration condition includes: the number of iterations reaches a preset number of iterations.

6. An ID data network data analysis device, the device comprising:

an obtaining module, adapted to obtain an ID data network comprising ID data and the association relationships between ID data, wherein the ID data include user ID data and/or device ID data;

a first construction module, adapted to construct ID relationship data according to the ID data contained in the ID data network and the association relationships between ID data, wherein the ID relationship data include several ID relationship pairs; and

a comparison and combination module, adapted to compare and combine the ID relationship data to obtain several ID data subnets.

7. The device according to claim 6, wherein the comparison and combination module is further adapted to:

copy the ID relationship data in full into memory;

8. The device according to claim 7, wherein the comparison and combination module is further adapted to:

divide the ID relationship data into multiple shards;

9. A computing device, comprising a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus;

the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to perform operations corresponding to the ID data network data analysis method according to any one of claims 1-5.

10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform operations corresponding to the ID data network data analysis method according to any one of claims 1-5.