Summary of the Invention
The present application provides an information association method and device that associate a user's information accurately while reducing machine load and resource consumption. Specifically, the application proposes an information association method, comprising:
obtaining raw data that contains identifiers and attribute information that are associated with one another under each of a number of different business scenarios;
taking the attribute information that occurs most often under the business scenarios in the raw data as association information and associating it with the identifier to generate an association information string;
selecting the types of attribute information to be associated in turn according to a preset association order and, for each selected type, taking the attribute information with the largest judgment value among the attribute information of that type under the business scenarios in the raw data as association information and associating it with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Optionally, after obtaining the raw data that contains the identifiers and attribute information associated with one another under each different business scenario, the method further comprises:
classifying and consolidating the obtained raw data by business scenario to generate an information record table, wherein the information record table contains the name and content of each piece of attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated, and the time and number of those associations.
Optionally, taking the attribute information that occurs most often under the business scenarios in the raw data as association information and associating it with the identifier to generate an association information string specifically comprises:
determining the number of times each piece of attribute information in the raw data occurs under each business scenario;
comparing the numbers of times the pieces of attribute information occur under the same business scenario to determine the most frequently occurring attribute information under each business scenario;
comparing the occurrence counts of the most frequent attribute information of the respective business scenarios to determine the attribute information that occurs most often in the raw data as the initial attribute information to be associated;
associating the initial attribute information to be associated, as association information, with the identifier to generate the association information string.
Optionally, selecting the types of attribute information to be associated in turn according to the preset association order, and associating the attribute information with the largest judgment value with the most recently generated association information string, specifically comprises:
selecting the types of attribute information to be associated in turn according to the preset association order;
determining, in the raw data, each piece of attribute information corresponding to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associating that piece of attribute information, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association;
if multiple pieces of attribute information correspond to the currently selected type, calculating, for each determined piece of attribute information and each business scenario, the conditional probability that it co-occurs with the attribute information in the most recently generated association information string;
determining the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
summing the judgment sub-values that correspond to the same piece of attribute information to determine the judgment value of that piece of attribute information;
comparing the judgment values of the pieces of attribute information, determining the one with the largest judgment value, and associating it, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Optionally, the most recently generated association information string contains N pieces of attribute information;
calculating, for each determined piece of attribute information and each business scenario, the conditional probability that it co-occurs with the attribute information in the most recently generated association information string specifically comprises the following steps:
Step A: determine whether each determined piece of attribute information co-occurs, under each business scenario, with all N pieces of attribute information in the most recently generated association information string;
Step B: if so, calculate the conditional probability that the determined piece of attribute information co-occurs, under that business scenario, with the N pieces of attribute information in the most recently generated association information string;
Step C: if not, set N = N-1 and repeat Step A, until the conditional probability that each determined piece of attribute information co-occurs, under each business scenario, with attribute information in the most recently generated association information string has been calculated.
The application also proposes an information association device, comprising:
an acquisition module, configured to obtain raw data that contains identifiers and attribute information that are associated with one another under each of a number of different business scenarios;
a first generation module, configured to take the attribute information that occurs most often under the business scenarios in the raw data as association information and associate it with the identifier to generate an association information string;
a second generation module, configured to select the types of attribute information to be associated in turn according to a preset association order and, for each selected type, to take the attribute information with the largest judgment value among the attribute information of that type under the business scenarios in the raw data as association information and associate it with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Optionally, the device further comprises:
a processing module, configured to classify and consolidate the obtained raw data by business scenario to generate an information record table, wherein the information record table contains the name and content of each piece of attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated, and the time and number of those associations.
Optionally, the first generation module is specifically configured to:
determine the number of times each piece of attribute information in the raw data occurs under each business scenario;
compare the numbers of times the pieces of attribute information occur under the same business scenario to determine the most frequently occurring attribute information under each business scenario;
compare the occurrence counts of the most frequent attribute information of the respective business scenarios to determine the attribute information that occurs most often in the raw data as the initial attribute information to be associated;
associate the initial attribute information to be associated, as association information, with the identifier to generate the association information string.
Optionally, the second generation module is specifically configured to:
select the types of attribute information to be associated in turn according to the preset association order;
determine, in the raw data, each piece of attribute information corresponding to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associate that piece of attribute information, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association;
if multiple pieces of attribute information correspond to the currently selected type, calculate, for each determined piece of attribute information and each business scenario, the conditional probability that it co-occurs with the attribute information in the most recently generated association information string;
determine the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
sum the judgment sub-values that correspond to the same piece of attribute information to determine the judgment value of that piece of attribute information;
compare the judgment values of the pieces of attribute information, determine the one with the largest judgment value, and associate it, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Optionally, the most recently generated association information string contains N pieces of attribute information;
the second generation module calculates, for each determined piece of attribute information and each business scenario, the conditional probability that it co-occurs with the attribute information in the most recently generated association information string through the following steps:
Step A: determine whether each determined piece of attribute information co-occurs, under each business scenario, with all N pieces of attribute information in the most recently generated association information string;
Step B: if so, calculate the conditional probability that the determined piece of attribute information co-occurs, under that business scenario, with the N pieces of attribute information in the most recently generated association information string;
Step C: if not, set N = N-1 and repeat Step A, until the conditional probability that each determined piece of attribute information co-occurs, under each business scenario, with attribute information in the most recently generated association information string has been calculated.
Compared with the prior art, the present application obtains raw data that contains identifiers and attribute information that are associated with one another under each different business scenario; takes the attribute information that occurs most often under the business scenarios in the raw data as association information and associates it with the identifier to generate an association information string; and selects the types of attribute information to be associated in turn according to a preset association order, associating the attribute information with the largest judgment value among the attribute information of the selected type with the most recently generated association information string to generate a new association information string and so achieve information association. A user's information is thereby associated accurately while machine load and resource consumption are reduced.
Detailed Description of the Embodiments
To address the defects of the prior art, an embodiment of the present application discloses an information association method that associates a user's information accurately while reducing machine load and resource consumption. Specifically, as shown in Fig. 1, the method comprises the following steps:
Step 101: obtain raw data that contains identifiers and attribute information that are associated with one another under each of a number of different business scenarios.
In a specific embodiment, take the database of a shopping website as an example: its business scenarios may include login, authentication, logistics and so on. Because the information in a shopping website's database is organized around accounts, the account can be used as the identifier. The identifier is of course not limited to this: if the database obtained belongs to a mobile operator, its data is organized around mobile numbers, so the mobile number can serve as the identifier instead. The raw data is the data associated with a given identifier; for example, if the identifier is account 1, the raw data obtained contains account 1 and the attribute information associated with it under the various business scenarios.
After the raw data is obtained, it can be classified and consolidated to make later data extraction easier, specifically:
the obtained raw data is classified and consolidated by business scenario to generate an information record table, wherein the information record table contains the name and content of each piece of attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated, and the time and number of those associations.
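The classification and consolidation step can be sketched as follows. This is only an illustrative sketch: the event tuple layout, the scenario and field names, the timestamps, and the choice to keep the most recent association time per row are all assumptions, since the text does not specify how the information record table is stored.

```python
from collections import defaultdict

# Hypothetical raw events (illustrative data only):
# (identifier, scenario, attribute_type, attribute_value, timestamp)
raw_events = [
    ("a001", "login", "phone", "188****8254", "2015-01-01T10:00:00"),
    ("a001", "login", "phone", "188****8254", "2015-01-02T09:30:00"),
    ("a001", "authentication", "name", "Zhang San", "2015-01-03T12:00:00"),
]

def build_record_table(events):
    """Group raw events by (identifier, scenario, attribute type, value).

    Each row keeps the occurrence count and, as an assumption, the most
    recent association time.
    """
    table = defaultdict(lambda: {"count": 0, "last_seen": ""})
    for identifier, scenario, attr_type, attr_value, ts in events:
        row = table[(identifier, scenario, attr_type, attr_value)]
        row["count"] += 1
        # ISO 8601 strings of equal length sort chronologically.
        row["last_seen"] = max(row["last_seen"], ts)
    return dict(table)

record_table = build_record_table(raw_events)
```

Keying rows by scenario keeps the per-scenario counts that the later steps compare.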
Specifically, payment account a001 is used here as the identifier for illustration. For this particular identifier, the information record table generated may be as shown in Table 1:
Table 1
In a specific embodiment, if the raw data of multiple identifiers needs to be processed, for example the raw data of payment account a002 in addition to that of payment account a001, the information record table can be extended vertically, as shown in Table 2:
Table 2
Step 102: take the attribute information that occurs most often under the business scenarios in the raw data as association information and associate it with the identifier to generate an association information string.
Specifically, this step comprises:
determining the number of times each piece of attribute information in the raw data occurs under each business scenario;
comparing the numbers of times the pieces of attribute information occur under the same business scenario to determine the most frequently occurring attribute information under each business scenario;
comparing the occurrence counts of the most frequent attribute information of the respective business scenarios to determine the attribute information that occurs most often in the raw data as the initial attribute information to be associated;
associating the initial attribute information to be associated, as association information, with the identifier to generate the association information string.
As a concrete example, take the identifier to be payment account b001, whose raw data covers two business scenarios, a login scenario and an authentication scenario. Under the login scenario, the attribute information that appears is mobile number 188****8254 (300 occurrences). Under the authentication scenario, the attribute information that appears is identity card 3301081975****7598 (10 occurrences), name Zhang San (20 occurrences) and bank card number 40065***5874153 (51 occurrences).
In this case, the most frequent attribute under the login scenario is mobile number 188****8254 (300 occurrences), and the most frequent under the authentication scenario is bank card number 40065***5874153 (51 occurrences). Comparing the counts of these two, mobile number 188****8254 occurs most often in the raw data of b001, so it is associated, as association information, with the identifier. The resulting association information string may be as shown in Table 3.
Table 3
Payment account | Mobile number
b001 | 188****8254
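The two-stage comparison in the b001 example (most frequent attribute within each scenario, then most frequent among those winners) can be sketched as below. The dictionary layout and the key names such as `payment_account` are assumptions made for illustration; the counts are the ones given in the example.

```python
# Occurrence counts from the b001 example: scenario -> {(type, value): count}
counts = {
    "login": {("phone", "188****8254"): 300},
    "authentication": {
        ("id_card", "3301081975****7598"): 10,
        ("name", "Zhang San"): 20,
        ("bank_card", "40065***5874153"): 51,
    },
}

def initial_attribute(counts_by_scenario):
    # Stage 1: the most frequent attribute within each scenario.
    per_scenario_best = [
        max(attrs.items(), key=lambda item: item[1])
        for attrs in counts_by_scenario.values()
        if attrs
    ]
    # Stage 2: the most frequent among the per-scenario winners.
    (attr_type, attr_value), _count = max(per_scenario_best, key=lambda item: item[1])
    return attr_type, attr_value

attr_type, attr_value = initial_attribute(counts)
# The initial association information string: identifier plus first attribute.
association_string = {"payment_account": "b001", attr_type: attr_value}
```

The text does not say how ties are broken; in this sketch `max` simply keeps the first of several equal counts.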
Step 103: select the types of attribute information to be associated in turn according to a preset association order and, for each selected type, take the attribute information with the largest judgment value among the attribute information of that type under the business scenarios in the raw data as association information and associate it with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Specifically, the process comprises:
selecting the types of attribute information to be associated in turn according to the preset association order;
determining, in the raw data, each piece of attribute information corresponding to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associating that piece of attribute information, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association;
if multiple pieces of attribute information correspond to the currently selected type, calculating, for each determined piece of attribute information and each business scenario, the conditional probability that it co-occurs with the attribute information in the most recently generated association information string;
determining the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
summing the judgment sub-values that correspond to the same piece of attribute information to determine the judgment value of that piece of attribute information;
comparing the judgment values of the pieces of attribute information, determining the one with the largest judgment value, and associating it, as association information, with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Continuing the previous example, the association information string shown in Table 3 has been generated. The types of attribute information to be associated are then selected in turn according to the preset association order, for example name, bank card, identity card. The specific order and the number of attribute types to be associated can be configured as needed. Here, the case where the type to be associated is the name is used for illustration.
First, determine the attribute information corresponding to the type "name". As shown in Table 1, only Zhang San corresponds to this type. Because there is only this one piece of attribute information, the credibility of Zhang San is sufficiently established, so Zhang San can be associated directly with the association information string shown in Table 3.
If, however, the type "name" corresponds to Li Si as well as Zhang San, further processing is required. Specifically, the conditional probability must first be calculated. The formula is P(A|B) = P(AB) / P(B), where P(A|B) denotes the probability that event A occurs given that another event B has already occurred.
In this embodiment, assume there are two business scenarios, a login scenario and an authentication scenario. Taking the login scenario as the example, P(A|B) denotes the probability that name: Zhang San appears under the login scenario, given that payment account b001 and mobile number 188****8254 appear together under the login scenario;
P(AB) denotes the probability that payment account b001, mobile number 188****8254 and name: Zhang San all appear together;
P(B) denotes the probability that payment account b001 and mobile number 188****8254 appear together under the login scenario.
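One way to estimate this conditional probability is by relative frequency over the records of a single scenario, so that P(AB) / P(B) reduces to a ratio of counts. The sketch below assumes this estimator, which the text does not prescribe; the function name `conditional_probability` and the five sample login records are hypothetical.

```python
def conditional_probability(records, candidate, given):
    """Estimate P(A|B) = P(AB) / P(B) by relative frequency.

    records   -- list of sets, each the attributes seen together in one record
    candidate -- attribute A
    given     -- set of attributes that make up event B
    """
    n_b = sum(1 for r in records if given <= r)
    if n_b == 0:
        return None  # P(B) = 0, so the conditional probability is undefined
    n_ab = sum(1 for r in records if given <= r and candidate in r)
    # (n_ab / n) / (n_b / n) simplifies to n_ab / n_b.
    return n_ab / n_b

# Hypothetical login-scenario records for payment account b001.
login_records = [
    {"b001", "188****8254", "Zhang San"},
    {"b001", "188****8254"},
    {"b001", "188****8254"},
    {"b001", "188****8254", "Li Si"},
    {"b001", "188****8254"},
]
p_login = conditional_probability(login_records, "Zhang San", {"b001", "188****8254"})
```

With these sample records the estimate is 1/5, which matches the value of 0.2 assumed for the login scenario in the worked example.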
Suppose the calculation gives P(A|B) = 0.2 under the login scenario. The authentication scenario is handled in the same way; suppose it gives P(A|B) = 0.3. Different business scenarios have different weights, for example 0.6 for the login scenario and 0.5 for the authentication scenario. The two judgment sub-values of name: Zhang San are then 0.2 × 0.6 = 0.12 and 0.3 × 0.5 = 0.15, and the final judgment value is 0.12 + 0.15 = 0.27.
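The arithmetic above can be reproduced with a short sketch. The function name `judgment_value` and the dictionary layout are illustrative assumptions; the probabilities and weights are the ones from the example.

```python
# Preset scenario weights and conditional probabilities from the example.
scenario_weights = {"login": 0.6, "authentication": 0.5}
cond_prob_zhang_san = {"login": 0.2, "authentication": 0.3}

def judgment_value(cond_prob_by_scenario, weights):
    """Return (sub_values, total) for one candidate attribute.

    Each judgment sub-value is the conditional probability multiplied by the
    scenario weight; the judgment value is the sum of the sub-values.
    """
    sub_values = {
        scenario: p * weights[scenario]
        for scenario, p in cond_prob_by_scenario.items()
    }
    return sub_values, sum(sub_values.values())

sub_values, total = judgment_value(cond_prob_zhang_san, scenario_weights)
# Compare with a tolerance, because these are floating-point products.
```

The judgment value is thus a weighted sum of per-scenario conditional probabilities, so a scenario with a higher preset weight contributes more to the final decision.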
For name: Li Si, the same operations are carried out as for name: Zhang San. Suppose the resulting judgment value is 0.26. Because 0.26 is less than 0.27, name: Zhang San is selected as the association information and associated with the association information string shown in Table 3. The resulting association information string is shown in Table 4.
Table 4
Payment account | Mobile number | Name
b001 | 188****8254 | Zhang San
If other attribute information, such as a bank card, also needs to be associated, the above operations are performed on the basis of the newly generated association information string shown in Table 4, until every type of attribute information in the preset association order has been associated. Information association is thus achieved through the association information string.
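The overall loop of step 103 might look like the sketch below. It is a simplified illustration rather than the claimed implementation: `extend_association_string`, the `score` callback and the dictionary layout are hypothetical, and the judgment values 0.27 and 0.26 are simply taken from the example instead of being recomputed.

```python
def extend_association_string(string, order, candidates_by_type, score):
    """Extend the association string one attribute type at a time.

    string             -- current association information string (dict)
    order              -- preset association order (list of attribute types)
    candidates_by_type -- attribute type -> list of candidate values
    score              -- callable(candidate, string) -> judgment value
    """
    for attr_type in order:
        candidates = candidates_by_type.get(attr_type, [])
        if not candidates:
            continue  # assumption: skip a type that has no candidate at all
        if len(candidates) == 1:
            chosen = candidates[0]  # a single candidate is associated directly
        else:
            chosen = max(candidates, key=lambda c: score(c, string))
        # Each association yields a new, most recently generated string.
        string = {**string, attr_type: chosen}
    return string

# Judgment values taken from the worked example above.
example_judgments = {"Zhang San": 0.27, "Li Si": 0.26}
result = extend_association_string(
    {"payment_account": "b001", "phone": "188****8254"},
    ["name", "bank_card"],
    {"name": ["Zhang San", "Li Si"], "bank_card": ["40065***5874153"]},
    lambda candidate, string: example_judgments[candidate],
)
```

Because the string is rebuilt after every association, each later type is scored against the string that already includes the earlier choices.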
In addition, assume that the most recently generated association information string contains N pieces of attribute information. The conditional probability in step 103 is then calculated as follows:
Step A: determine whether each determined piece of attribute information co-occurs, under each business scenario, with all N pieces of attribute information in the most recently generated association information string;
Step B: if so, calculate the conditional probability that the determined piece of attribute information co-occurs, under that business scenario, with the N pieces of attribute information in the most recently generated association information string;
Step C: if not, set N = N-1 and repeat Step A, until the conditional probability that each determined piece of attribute information co-occurs, under each business scenario, with attribute information in the most recently generated association information string has been calculated.
Specifically, assume that the most recently generated association information string contains 6 pieces of attribute information. First consider each determined piece of attribute information (of the type currently to be associated). Suppose the type is bank card number, with candidates bank card number 1 and bank card number 2. For bank card number 1, determine whether it co-occurs with all 6 pieces of attribute information under the same business scenario. If so, the conditional probability is calculated on that basis, as described in step 103.
If not, determine whether bank card number 1 co-occurs, under the same business scenario, with any 5 of the 6 pieces of attribute information. If so, the conditional probability is calculated on that basis. If not, determine whether bank card number 1 co-occurs, under the same business scenario, with any 4 of the 6 pieces of attribute information, and so on, until the conditional probability can be calculated.
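The step-down from N to N-1 attributes can be sketched as a search for the largest subset of the current string that co-occurs with the candidate in some record of a scenario. The function name, the record representation as sets, and the return value `(0, None)` when nothing co-occurs are assumptions; the text does not say what happens if no subset is found.

```python
from itertools import combinations

def largest_cooccurring_subset(records, candidate, current_attrs):
    """Return (k, subset) for the largest k such that the candidate appears in
    some record together with k attributes of the current string.

    records       -- list of sets, the attributes seen together in one scenario
    candidate     -- the attribute being evaluated
    current_attrs -- attributes of the most recently generated string
    """
    for k in range(len(current_attrs), 0, -1):          # N, N-1, ..., 1
        for subset in combinations(sorted(current_attrs), k):
            needed = set(subset) | {candidate}
            if any(needed <= record for record in records):
                return k, set(subset)
    return 0, None  # assumption: no co-occurrence found at any size

# Hypothetical records in one scenario; "a", "b", "c" stand for attributes
# already in the association information string.
scenario_records = [{"card1", "a", "b"}, {"card1", "a"}, {"a", "b", "c"}]
k, subset = largest_cooccurring_subset(scenario_records, "card1", {"a", "b", "c"})
```

Enumerating subsets mirrors Steps A to C directly but grows combinatorially with N; for long strings, taking the largest overlap between the current string and any record that contains the candidate gives the same k more cheaply.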
Because conditional probabilities may be calculated on different bases, the calculation of judgment sub-values and judgment values in step 103 presupposes that the conditional probabilities involved were calculated on the same basis.
For example, for bank card number 1, suppose conditional probability 1 in judgment sub-value 1 was calculated on the basis that bank card number 1 co-occurs with 6 pieces of attribute information under business scenario 1, while conditional probability 2 in judgment sub-value 2 was calculated on the basis that bank card number 1 co-occurs with 5 pieces of attribute information under business scenario 2. Judgment sub-value 1 and judgment sub-value 2 then cannot be combined into a judgment value.
In addition, suppose that for bank card number 1 the conditional probabilities in both of its judgment sub-values were calculated on the basis of co-occurrence with 6 pieces of attribute information under the same business scenario, while bank card number 2 has 3 judgment sub-values whose conditional probabilities were all calculated on the basis of co-occurrence with 5 pieces of attribute information under the same business scenario. In this case there is no need to compare the resulting judgment values: bank card number 1 can be associated directly, as association information, with the most recently generated association information string. Other cases are similar. If the conditional probability in judgment value a was calculated on the basis of co-occurrence with N pieces of attribute information under the same business scenario, and the conditional probability in judgment value b was calculated on the basis of co-occurrence with n pieces of attribute information under the same business scenario, where N > n, the numerical values need not be compared. Judgment value a is treated as higher than judgment value b, and the attribute information corresponding to judgment value a is associated, as association information, with the most recently generated association information string.
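This precedence rule amounts to ranking candidates first by the number of co-occurring attributes their conditional probabilities were based on, and only then by judgment value. A minimal sketch, in which the function name `pick_candidate` and the sample numbers are hypothetical:

```python
def pick_candidate(candidates):
    """candidates: name -> (n_attrs, judgment_value).

    Python compares tuples element by element, so a larger n_attrs wins
    outright and the judgment value only decides between equal n_attrs.
    """
    return max(candidates, key=lambda name: candidates[name])

# Hypothetical values: bank card number 1 was scored against 6 attributes,
# bank card number 2 against only 5, so its higher judgment value is ignored.
winner = pick_candidate({
    "bank card number 1": (6, 0.10),
    "bank card number 2": (5, 0.90),
})
```

When both candidates rest on the same number of attributes, the comparison falls back to the judgment value, as in the Zhang San and Li Si example.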
To further illustrate the technical idea of the present invention, its technical solution is now described with reference to a specific application scenario. An embodiment of the present application also discloses an information association device which, as shown in Fig. 2, comprises:
an acquisition module 201, configured to obtain raw data that contains identifiers and attribute information that are associated with one another under each of a number of different business scenarios;
a first association module 202, configured to take the attribute information that occurs most often under the business scenarios in the raw data as association information and associate it with the identifier to generate an association information string;
a second association module 203, configured to select the types of attribute information to be associated in turn according to a preset association order and, for each selected type, to take the attribute information with the largest judgment value among the attribute information of that type under the business scenarios in the raw data as association information and associate it with the most recently generated association information string, thereby generating a new association information string to achieve information association.
Specifically, the device further comprises:
a processing module, configured to classify and consolidate the obtained raw data by business scenario to generate an information record table, wherein the information record table contains the name and content of each piece of attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated, and the time and number of those associations.
The first association module 202 is specifically configured to:
determine the number of times each piece of attribute information in the raw data occurs under each business scenario;
compare the numbers of times the pieces of attribute information occur under the same business scenario to determine the most frequently occurring attribute information under each business scenario;
compare the occurrence counts of the most frequent attribute information of the respective business scenarios to determine the attribute information that occurs most often in the raw data as the initial attribute information to be associated;
associate the initial attribute information to be associated, as association information, with the identifier to generate the association information string.
The second relating module 203 is specifically configured to:
Select the types of attribute information to be associated one at a time according to a
preset association order;
Determine, in the initial data, each attribute information corresponding to the currently
selected type;
If only one attribute information corresponds to the currently selected type, associate
that attribute information, as related information, with the most recently generated
related information string, and generate a related information string to realize
information association;
If multiple attribute information correspond to the currently selected type, calculate,
for each business scenario, the conditional probability that each determined attribute
information occurs simultaneously with the attribute information in the most recently
generated related information string;
Determine the judgment sub-value of each attribute information under each business scenario
as the product of the conditional probability and the preset weight of the corresponding
business scenario;
Aggregate the judgment sub-values corresponding to the same attribute information to
determine the judgment value of each attribute information;
Compare the judgment values of the attribute information, determine the attribute
information with the largest judgment value, associate that attribute information, as
related information, with the most recently generated related information string, and
generate a related information string to realize information association.
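The selection among multiple candidates can be illustrated with a short sketch. The nested-dictionary layout, the function names, and the sample numbers are assumptions made for the example. The text says only that the judgment sub-values of the same attribute information are aggregated; the sketch assumes this aggregation is a sum.

```python
def judgment_values(cond_prob, scenario_weight):
    """Compute one judgment value per candidate attribute.

    cond_prob: {attribute: {scenario: conditional probability}}
    scenario_weight: {scenario: preset weight}
    Both layouts are hypothetical.
    """
    values = {}
    for attribute, by_scenario in cond_prob.items():
        # Judgment sub-value = conditional probability * scenario weight.
        # Summing the sub-values is an assumed reading of "aggregate".
        values[attribute] = sum(
            probability * scenario_weight[scenario]
            for scenario, probability in by_scenario.items()
        )
    return values


def select_attribute(cond_prob, scenario_weight):
    """Return the candidate with the largest judgment value."""
    values = judgment_values(cond_prob, scenario_weight)
    return max(values, key=values.get)


cond_prob = {
    "addr_A": {"login": 0.5, "payment": 0.25},
    "addr_B": {"login": 0.25, "payment": 0.75},
}
scenario_weight = {"login": 1.0, "payment": 2.0}
chosen = select_attribute(cond_prob, scenario_weight)
```

With these sample inputs, addr_A scores 0.5 * 1.0 + 0.25 * 2.0 = 1.0 and addr_B scores 0.25 * 1.0 + 0.75 * 2.0 = 1.75, so addr_B would be appended to the related information string.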
Specifically, the most recently generated related information string includes N attribute
information;
The second relating module 203 calculates, for each business scenario, the conditional
probability that each determined attribute information occurs simultaneously with the
attribute information in the most recently generated related information string, which
specifically includes the following steps:
Step A: judge whether each determined attribute information exists, under each business
scenario, simultaneously with the N attribute information in the most recently generated
related information string;
Step B: if the judgment result is yes, calculate, for each business scenario, the
conditional probability that each determined attribute information exists simultaneously
with the N attribute information in the most recently generated related information
string;
Step C: if the judgment result is no, set N = N - 1 and repeat Step A, until the
conditional probability that each determined attribute information occurs simultaneously
with the attribute information in the most recently generated related information string
has been calculated for each business scenario.
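Steps A to C describe a back-off: when a candidate never co-occurs with all N attribute information of the string, the context is shortened and the check is repeated. The sketch below shows one possible reading for a single business scenario. Several points are assumptions not stated in the text: observations are modelled as sets of attributes seen together, the conditional probability is estimated as a ratio of co-occurrence counts, the attribute dropped when N is reduced is the one most recently added to the string, and 0.0 is returned if no co-occurrence is found at any N.

```python
def backoff_conditional_probability(candidate, context, observations):
    """Estimate P(candidate | first n context attributes) with back-off.

    candidate: attribute information being evaluated.
    context: ordered attributes of the latest related information string.
    observations: list of sets, each holding attributes that appeared
        together in one business scenario (hypothetical layout).
    """
    n = len(context)
    while n > 0:
        kept = set(context[:n])
        with_context = [obs for obs in observations if kept <= obs]
        joint = sum(1 for obs in with_context if candidate in obs)
        if joint > 0:
            # Step A succeeded; Step B: compute the conditional probability.
            return joint / len(with_context)
        # Step C: no co-occurrence with all n attributes, so reduce n.
        n -= 1
    return 0.0  # assumed fallback; the source does not define this case


observations = [
    {"phone", "card", "addr_A"},
    {"phone", "addr_A"},
    {"phone", "addr_B"},
    {"card"},
]
context = ["phone", "card"]
p_a = backoff_conditional_probability("addr_A", context, observations)
p_b = backoff_conditional_probability("addr_B", context, observations)
```

Here addr_A co-occurs with both context attributes, so its probability is computed at N = 2. addr_B never appears together with both "phone" and "card", so the sketch falls back to N = 1 and conditions on "phone" alone.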
The application obtains initial data that includes identification markings and attribute
information having an association relationship under each different business scenario;
associates the attribute information that occurs most often under each business scenario
in the initial data, as related information, with the identification marking, and
generates a related information string; selects the types of attribute information to be
associated one at a time according to a preset association order; and associates the
attribute information with the largest judgment value among the attribute information of
the corresponding type under each business scenario in the initial data, as related
information, with the most recently generated related information string, generating a
related information string to realize information association. In this way, the
information of a user is associated accurately while machine load and resource
consumption are reduced.
Through the above description of the embodiments, those skilled in the art can clearly
understand that the present invention may be realized by hardware, or by software plus a
necessary general-purpose hardware platform. Based on this understanding, the technical
solution of the present invention may be embodied in the form of a software product. The
software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB
flash drive, or a portable hard disk) and includes several instructions that cause a
computer device (such as a personal computer, a server, or a network device) to perform
the method described in each implementation scenario of the present invention.
Those skilled in the art will appreciate that the accompanying drawing is a schematic
diagram of a preferred implementation scenario, and that the modules or flows in the
drawing are not necessarily required to implement the present invention.
Those skilled in the art will appreciate that the modules of the device in an
implementation scenario may be distributed in the device as described for that
implementation scenario, or may be changed correspondingly and located in one or more
devices different from that of the implementation scenario. The modules of the above
implementation scenario may be merged into one module, or may be further split into
multiple sub-modules.
The above sequence numbers of the present invention are for description only and do not
represent the merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present
invention. However, the present invention is not limited thereto, and any change that a
person skilled in the art can conceive of shall fall within the protection scope of the
present invention.