CN106933829A - Information association method and device - Google Patents

Information association method and device

Info

Publication number
CN106933829A
Authority
CN
China
Prior art keywords
attribute information
information
related information
business scenario
under
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201511017699.3A
Other languages
Chinese (zh)
Other versions
CN106933829B (en)
Inventor
杜玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201511017699.3A priority Critical patent/CN106933829B/en
Publication of CN106933829A publication Critical patent/CN106933829A/en
Application granted granted Critical
Publication of CN106933829B publication Critical patent/CN106933829B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an information association method and device. The method includes: obtaining raw data that contains an identifier and the attribute information associated with it under each of several different business scenarios; taking the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associating it with the identifier, and generating a related information string; then selecting the types of attribute information to be associated one at a time according to a preset association order and, for each type, taking the attribute information of that type with the largest judgment value across the business scenarios in the raw data as related information and associating it with the most recently generated related information string, thereby generating a new related information string and achieving information association. In this way a user's information is associated accurately while machine load and resource consumption are reduced.

Description

Information association method and device
Technical field
The present invention relates to the field of communication technology, and in particular to an information association method. The application also relates to an information association device.
Background
Information fusion is a method of establishing a one-to-one link and mapping between a specified account and an identifiable identity attribute together with its main behavioral information. As Internet technology continues to develop, people carry out activities in many different business scenarios and leave a great deal of fragmented personal information in the data, so aggregating the many pieces of personal information that belong to the same user is very necessary.
However, when different information is aggregated and spliced for a massive number of information carriers (such as users' payment accounts), the information storage tables are often very large. If large tables are joined horizontally to other large tables in the existing splicing manner, the processing capacity of current equipment can hardly bear such a load: the job essentially fails to produce a result or runs very inefficiently.
In addition, because current information fusion processes and judges each piece of information separately, the relationships between different pieces of information are already severed by the time each one is processed individually. Existing aggregation and splicing may therefore stitch together information that never occurred together, which lowers accuracy.
How to associate a user's information accurately while reducing machine load and resource consumption has therefore become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The application provides an information association method and device for associating a user's information accurately while reducing machine load and resource consumption. Specifically, the application proposes an information association method, including:
obtaining raw data that contains an identifier and the attribute information associated with it under each of several different business scenarios;
taking the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associating it with the identifier, and generating a related information string;
selecting the types of attribute information to be associated one at a time according to a preset association order and, for each type, taking the attribute information of that type with the largest judgment value across the business scenarios in the raw data as related information and associating it with the most recently generated related information string, generating a related information string to achieve information association.
Optionally, after obtaining the raw data that contains an identifier and the attribute information associated with it under each different business scenario, the method further includes:
classifying and consolidating the obtained raw data by business scenario to generate an information record table, where the information record table contains the name and content of the attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated and the time and number of those associations.
Optionally, taking the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associating it with the identifier, and generating a related information string specifically includes:
determining the number of times each piece of attribute information in the raw data occurs under each business scenario;
comparing the number of times each piece of attribute information occurs under the same business scenario to determine the most frequent attribute information under each business scenario;
comparing the occurrence counts of the most frequent attribute information of each business scenario to determine the attribute information that occurs most often in the raw data, which is the initial attribute information to be associated;
associating the initial attribute information to be associated, as related information, with the identifier to generate a related information string.
Optionally, selecting the types of attribute information to be associated one at a time according to the preset association order, and associating the attribute information of that type with the largest judgment value with the most recently generated related information string, specifically includes:
selecting the types of attribute information to be associated one at a time according to the preset association order;
determining, in the raw data, each piece of attribute information that corresponds to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associating that attribute information, as related information, with the most recently generated related information string, generating a related information string to achieve information association;
if several pieces of attribute information correspond to the currently selected type, calculating, for each of them and under each business scenario, the conditional probability that it occurs together with the attribute information in the most recently generated related information string;
determining the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregating the judgment sub-values that belong to the same attribute information to determine the judgment value of each piece of attribute information;
comparing the judgment values, determining the attribute information with the largest judgment value, and associating it, as related information, with the most recently generated related information string, generating a related information string to achieve information association.
Optionally, the most recently generated related information string contains N pieces of attribute information;
calculating, for each determined piece of attribute information and under each business scenario, the conditional probability that it occurs together with the attribute information in the most recently generated related information string specifically includes the following steps:
Step A: judging whether each determined piece of attribute information occurs, under each business scenario, together with N pieces of attribute information in the most recently generated related information string;
Step B: if the result is yes, calculating, under each business scenario, the conditional probability that each determined piece of attribute information occurs together with those N pieces of attribute information;
Step C: if the result is no, setting N = N - 1 and repeating step A, until the conditional probability that each determined piece of attribute information occurs together with attribute information in the most recently generated related information string has been calculated under each business scenario.
The application also proposes an information association device, including:
an acquisition module, configured to obtain raw data that contains an identifier and the attribute information associated with it under each of several different business scenarios;
a first generation module, configured to take the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associate it with the identifier, and generate a related information string;
a second generation module, configured to select the types of attribute information to be associated one at a time according to a preset association order and, for each type, take the attribute information of that type with the largest judgment value across the business scenarios in the raw data as related information and associate it with the most recently generated related information string, generating a related information string to achieve information association.
Optionally, the device further includes:
a processing module, configured to classify and consolidate the obtained raw data by business scenario and generate an information record table, where the information record table contains the name and content of the attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated and the time and number of those associations.
Optionally, the first generation module is specifically configured to:
determine the number of times each piece of attribute information in the raw data occurs under each business scenario;
compare the number of times each piece of attribute information occurs under the same business scenario to determine the most frequent attribute information under each business scenario;
compare the occurrence counts of the most frequent attribute information of each business scenario to determine the attribute information that occurs most often in the raw data, which is the initial attribute information to be associated;
associate the initial attribute information to be associated, as related information, with the identifier to generate a related information string.
Optionally, the second generation module is specifically configured to:
select the types of attribute information to be associated one at a time according to the preset association order;
determine, in the raw data, each piece of attribute information that corresponds to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associate that attribute information, as related information, with the most recently generated related information string, generating a related information string to achieve information association;
if several pieces of attribute information correspond to the currently selected type, calculate, for each of them and under each business scenario, the conditional probability that it occurs together with the attribute information in the most recently generated related information string;
determine the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregate the judgment sub-values that belong to the same attribute information to determine the judgment value of each piece of attribute information;
compare the judgment values, determine the attribute information with the largest judgment value, and associate it, as related information, with the most recently generated related information string, generating a related information string to achieve information association.
Optionally, the most recently generated related information string contains N pieces of attribute information;
the second generation module calculates, for each determined piece of attribute information and under each business scenario, the conditional probability that it occurs together with the attribute information in the most recently generated related information string through the following steps:
Step A: judging whether each determined piece of attribute information occurs, under each business scenario, together with N pieces of attribute information in the most recently generated related information string;
Step B: if the result is yes, calculating, under each business scenario, the conditional probability that each determined piece of attribute information occurs together with those N pieces of attribute information;
Step C: if the result is no, setting N = N - 1 and repeating step A, until the conditional probability that each determined piece of attribute information occurs together with attribute information in the most recently generated related information string has been calculated under each business scenario.
Compared with the prior art, the application obtains raw data that contains an identifier and the attribute information associated with it under each of several different business scenarios; takes the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associates it with the identifier, and generates a related information string; and then selects the types of attribute information to be associated one at a time according to a preset association order, in each case associating the attribute information of that type with the largest judgment value with the most recently generated related information string, generating a related information string to achieve information association. A user's information is thus associated accurately while machine load and resource consumption are reduced.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an information association method disclosed in an embodiment of the application;
Fig. 2 is a schematic structural diagram of an information association device disclosed in an embodiment of the application.
Detailed description
To address the defects of the prior art, an embodiment of the application discloses an information association method that associates a user's information accurately while reducing machine load and resource consumption. Specifically, as shown in Fig. 1, the method includes the following steps:
Step 101: obtain raw data that contains an identifier and the attribute information associated with it under each of several different business scenarios.
In a specific embodiment, taking the database of a shopping website as an example, the business scenarios may include login, authentication, logistics and so on. Because all information in a shopping website's database is keyed on the account, the account can be used as the identifier. The identifier is of course not limited to this: if the database obtained belongs to a mobile operator, its data is keyed on the mobile number, so the mobile number can serve as the identifier instead. Raw data refers to data that is associated with one and the same identifier. For example, if the identifier is account 1, the raw data obtained contains account 1 and the attribute information associated with it under the various business scenarios.
After the raw data is obtained, it can be classified and consolidated to make later data extraction easier, specifically:
the obtained raw data is classified and consolidated by business scenario to generate an information record table, where the information record table contains the name and content of the attribute information associated with the identifier, together with the name of the business scenario in which the identifier and the attribute information were associated and the time and number of those associations.
Specifically, taking payment account a001 as the identifier, the information record table generated for this identifier may be as shown in Table 1:
Table 1
In a specific embodiment, if the raw data of several identifiers needs to be processed, for example the raw data of payment account a002 in addition to that of payment account a001, the information record table can be extended vertically, as shown in Table 2:
Table 2
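As an illustration of the information record table described above, the following is a minimal Python sketch, not the patent's implementation. The `InfoRecord` class, its field names, the `build_record_table` function and all sample events (including the a002 phone number and the dates) are assumptions made for this example; the description only states that the table holds the attribute name and content, the business scenario name, the time and the count for each identifier.

```python
from dataclasses import dataclass


@dataclass
class InfoRecord:
    identifier: str   # e.g. a payment account such as "a001"
    scenario: str     # business scenario name, e.g. "login"
    attr_name: str    # attribute name, e.g. "phone"
    attr_value: str   # attribute content
    last_time: str    # most recent time the association was seen (ISO date)
    count: int        # number of times the association was seen


def build_record_table(events):
    """Consolidate raw (identifier, scenario, attr_name, attr_value, time)
    events into one record per distinct combination."""
    table = {}
    for identifier, scenario, attr_name, attr_value, time in events:
        key = (identifier, scenario, attr_name, attr_value)
        rec = table.get(key)
        if rec is None:
            table[key] = InfoRecord(identifier, scenario, attr_name,
                                    attr_value, time, 1)
        else:
            rec.count += 1
            rec.last_time = max(rec.last_time, time)
    return list(table.values())


# Hypothetical sample events; rows for a second account extend the table vertically.
events = [
    ("a001", "login", "phone", "188****8254", "2015-12-01"),
    ("a001", "login", "phone", "188****8254", "2015-12-03"),
    ("a001", "authentication", "name", "Zhang San", "2015-12-02"),
    ("a002", "login", "phone", "139****0000", "2015-12-02"),
]
records = build_record_table(events)
```

Keeping one row per (identifier, scenario, attribute) combination means further accounts only add rows, which matches the vertical extension described for Table 2.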
Step 102: take the attribute information that occurs most frequently under the business scenarios in the raw data as related information, associate it with the identifier, and generate a related information string.
This step specifically includes:
determining the number of times each piece of attribute information in the raw data occurs under each business scenario;
comparing the number of times each piece of attribute information occurs under the same business scenario to determine the most frequent attribute information under each business scenario;
comparing the occurrence counts of the most frequent attribute information of each business scenario to determine the attribute information that occurs most often in the raw data, which is the initial attribute information to be associated;
associating the initial attribute information to be associated, as related information, with the identifier to generate a related information string.
As a concrete example, take payment account b001 as the identifier, and suppose the raw data for b001 covers two business scenarios: a login scenario and an authentication scenario. Under the login scenario, the attribute information that appears is mobile number: 188****8254 (300 occurrences). Under the authentication scenario, the attribute information that appears is ID card: 3301081975****7598 (10 occurrences); name: Zhang San (20 occurrences); bank card number: 40065***5874153 (51 occurrences).
In this case, the most frequent attribute under the login scenario is mobile number: 188****8254 (300 occurrences), and the most frequent attribute under the authentication scenario is bank card number: 40065***5874153 (51 occurrences). Comparing the counts of these two, mobile number: 188****8254 occurs most often in the raw data for b001, so it is associated with the identifier as related information. The resulting related information string may be as shown in Table 3.
Table 3
Payment account | Mobile number
b001 | 188****8254
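The selection in step 102 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pick_initial_attribute`, the nested-dictionary layout of the counts and the key names such as `"bank_card"` are hypothetical, while the occurrence counts are the ones given in the b001 example.

```python
def pick_initial_attribute(counts):
    """counts maps scenario -> {(attr_type, attr_value): occurrences}.

    First find the most frequent attribute in each scenario, then pick the
    one with the highest count among those per-scenario winners."""
    per_scenario_best = {
        scenario: max(attrs.items(), key=lambda kv: kv[1])
        for scenario, attrs in counts.items()
        if attrs
    }
    attr, _count = max(per_scenario_best.values(), key=lambda kv: kv[1])
    return attr


# Occurrence counts taken from the b001 example in the description.
counts = {
    "login": {("phone", "188****8254"): 300},
    "authentication": {
        ("id_card", "3301081975****7598"): 10,
        ("name", "Zhang San"): 20,
        ("bank_card", "40065***5874153"): 51,
    },
}
initial = pick_initial_attribute(counts)

# The first related information string: identifier plus the initial attribute.
related_string = {"payment_account": "b001", initial[0]: initial[1]}
```

Here the login winner (300) beats the authentication winner (51), so the mobile number seeds the string, as in Table 3.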
Step 103: select the types of attribute information to be associated one at a time according to a preset association order and, for each type, take the attribute information of that type with the largest judgment value across the business scenarios in the raw data as related information and associate it with the most recently generated related information string, generating a related information string to achieve information association.
The specific process includes: selecting the types of attribute information to be associated one at a time according to the preset association order;
determining, in the raw data, each piece of attribute information that corresponds to the currently selected type;
if only one piece of attribute information corresponds to the currently selected type, associating that attribute information, as related information, with the most recently generated related information string, generating a related information string to achieve information association;
if several pieces of attribute information correspond to the currently selected type, calculating, for each of them and under each business scenario, the conditional probability that it occurs together with the attribute information in the most recently generated related information string;
determining the judgment sub-value of each piece of attribute information under each business scenario as the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregating the judgment sub-values that belong to the same attribute information to determine the judgment value of each piece of attribute information;
comparing the judgment values, determining the attribute information with the largest judgment value, and associating it, as related information, with the most recently generated related information string, generating a related information string to achieve information association.
Continuing the previous example, the related information string shown in Table 3 has been generated. The types of attribute information to be associated are now selected one at a time according to the preset association order, for example name, then bank card, then ID card. The specific order and the number of attributes to associate can be configured as needed. Here the type to be associated is taken to be name.
The attribute information corresponding to the name type is then determined. As shown in Table 1, the only value of this type is Zhang San. Because there is only this one piece of attribute information, that alone sufficiently establishes the credibility of Zhang San, so Zhang San can be associated directly with the related information string shown in Table 3.
If, however, the name type has another value besides Zhang San, such as Li Si, further processing is required. The conditional probability must be calculated first. The formula is P(A|B) = P(AB) / P(B), where P(A|B) is the probability that event A occurs given that another event B has already occurred.
In this embodiment, suppose there are two business scenarios, a login scenario and an authentication scenario. Taking the login scenario as an example, P(A|B) is the probability that name: Zhang San appears under the login scenario given that payment account: b001 and mobile number: 188****8254 appear together under the login scenario;
P(AB) is the probability that payment account: b001, mobile number: 188****8254 and name: Zhang San all appear together;
P(B) is the probability that payment account: b001 and mobile number: 188****8254 appear together under the login scenario.
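One way to estimate this conditional probability is from co-occurrence counts within a single business scenario, since P(AB)/P(B) reduces to count(A and B)/count(B) when both are measured over the same records. The sketch below assumes that estimate; the description does not say how the probabilities are obtained. The function name, the set-based representation of a record and the sample records are all hypothetical, with the records chosen so that the result matches the 0.2 used later in the login example.

```python
def conditional_probability(observations, context, candidate):
    """Estimate P(candidate | context) within one business scenario.

    observations: list of sets, each holding the values seen together in
    one record of that scenario. context: values already in the related
    information string (event B). candidate: the value being tested (event A).
    Returns None when the context never occurs, since P(B) would be zero."""
    required = set(context)
    with_context = [obs for obs in observations if required <= obs]
    if not with_context:
        return None
    both = sum(1 for obs in with_context if candidate in obs)
    return both / len(with_context)


# Hypothetical login-scenario records for account b001.
login_observations = (
    [{"b001", "188****8254", "Zhang San"}] * 2
    + [{"b001", "188****8254"}] * 8
    + [{"b001"}] * 5
)
p_login = conditional_probability(
    login_observations, ["b001", "188****8254"], "Zhang San"
)
```

Records that contain only the account are ignored by the estimate because they do not satisfy event B.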
Suppose the calculation gives P(A|B) = 0.2 under the login scenario. The authentication scenario is handled in the same way; suppose it gives P(A|B) = 0.3. Different business scenarios have different weights, for example 0.6 for the login scenario and 0.5 for the authentication scenario. The two judgment sub-values for name: Zhang San are then 0.2 × 0.6 = 0.12 and 0.3 × 0.5 = 0.15, and the final judgment value is 0.12 + 0.15 = 0.27.
The same operations are carried out for name: Li Si. Suppose the resulting judgment value is 0.26. Because 0.26 is less than 0.27, name: Zhang San is selected as related information and associated with the related information string shown in Table 3. The resulting related information string is shown in Table 4.
Table 4
Payment account | Mobile number | Name
b001 | 188****8254 | Zhang San
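The judgment value arithmetic from this example can be reproduced with a short Python sketch. The function name `judgment_value` and the dictionary keys are hypothetical; the probabilities, weights and Li Si's final value of 0.26 come from the example. Li Si's per-scenario probabilities are not given in the description, so only the final value is used.

```python
def judgment_value(cond_probs, weights):
    """Sum of (conditional probability x scenario weight) over scenarios."""
    return sum(cond_probs[s] * weights[s] for s in cond_probs)


weights = {"login": 0.6, "authentication": 0.5}

# Zhang San: 0.2 * 0.6 + 0.3 * 0.5, approximately 0.27.
zhang_san = judgment_value({"login": 0.2, "authentication": 0.3}, weights)

# Li Si's judgment value is stated directly in the example.
scores = {"Zhang San": zhang_san, "Li Si": 0.26}
best = max(scores, key=scores.get)

# Extend the related information string from Table 3 into the one in Table 4.
related_string = {"payment_account": "b001", "phone": "188****8254"}
related_string["name"] = best
```

Because the sub-values are floating-point products, comparisons against 0.27 should use a tolerance rather than exact equality.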
If other attribute information also needs to be associated, such as the bank card, the operations above are repeated starting from the newly generated related information string shown in Table 4, until every type of attribute information in the preset association order has been associated. Information association is achieved through the related information string.
Furthermore, suppose the most recently generated related information string contains N pieces of attribute information. The conditional probability in step 103 is then calculated as follows:
Step A: judge whether each determined piece of attribute information occurs, under each business scenario, together with N pieces of attribute information in the most recently generated related information string;
Step B: if the result is yes, calculate, under each business scenario, the conditional probability that each determined piece of attribute information occurs together with those N pieces of attribute information;
Step C: if the result is no, set N = N - 1 and repeat step A, until the conditional probability that each determined piece of attribute information occurs together with attribute information in the most recently generated related information string has been calculated under each business scenario.
Specifically, suppose the most recently generated related information string contains 6 pieces of attribute information. First, each piece of attribute information of the type to be associated is determined. Suppose the type is bank card number, with the candidate values bank card number 1 and bank card number 2. For bank card number 1, judge whether it occurs together with all 6 pieces of attribute information under the same business scenario. If the result is yes, the conditional probability is calculated on that basis, in the manner described in step 103.
If the result is no, judge whether bank card number 1 occurs together with any 5 of the 6 pieces of attribute information under the same business scenario. If the result is yes, the conditional probability is calculated on that basis. If the result is no, judge whether bank card number 1 occurs together with any 4 of the 6 pieces of attribute information under the same business scenario, and so on, until the conditional probability can be calculated.
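Steps A to C can be sketched in Python as a back-off over progressively smaller subsets of the related information string. This is a sketch under assumptions: the description says "any 5" of the 6 items but does not say which subset to use when several qualify, so the code simply takes the first qualifying subset in `itertools.combinations` order. The function name, the set-based records and the sample data are hypothetical.

```python
from itertools import combinations


def backoff_conditional_probability(observations, context, candidate):
    """Return (n_used, probability) for one business scenario.

    Starts with all N context items and reduces N by one whenever the
    candidate never co-occurs with any subset of the current size."""
    context = list(context)
    for size in range(len(context), 0, -1):
        for subset in combinations(context, size):
            required = set(subset)
            with_subset = [obs for obs in observations if required <= obs]
            both = sum(1 for obs in with_subset if candidate in obs)
            if both > 0:
                # Step A answered yes; step B: compute the probability.
                return size, both / len(with_subset)
        # Step C: nothing co-occurs at this size, so N = N - 1 and retry.
    return 0, None


# Hypothetical records for one scenario; the string so far holds three items.
observations = [
    {"phone", "name", "card1"},
    {"phone", "name"},
    {"phone", "name", "id"},
]
n_used, prob = backoff_conditional_probability(
    observations, ["phone", "name", "id"], "card1"
)
```

In this sample, card1 never appears with all three items, so the search falls back to two items (phone and name), where it co-occurs in one of three records.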
Because conditional probabilities may be calculated on different bases, the calculation of judgment sub-values and judgment values in step 103 presupposes that the conditional probabilities involved were calculated on the same basis.
Specifically, for bank card number 1, suppose conditional probability 1 in judgment sub-value 1 is calculated on the basis that bank card number 1 exists simultaneously with 6 items of attribute information under business scenario 1, while conditional probability 2 in judgment sub-value 2 is calculated on the basis that bank card number 1 exists simultaneously with 5 items of attribute information under business scenario 2. In that case, judgment sub-value 1 and judgment sub-value 2 cannot be combined to generate a judgment value.
In addition, suppose that for bank card 1 the conditional probabilities in both of its judgment sub-values are calculated on the basis of simultaneous existence with 6 items of attribute information under the same business scenario, while bank card 2 corresponds to 3 judgment sub-values whose conditional probabilities are all calculated on the basis of simultaneous existence with 5 items of attribute information under the same business scenario. In this case there is no need to compare the sizes of the resulting judgment values: bank card 1 can be directly associated, as related information, with the most recently generated related information string. Other cases are similar. If the conditional probability in judgment value a is calculated on the basis of simultaneous existence with N items of attribute information under the same business scenario, and the conditional probability in judgment value b is calculated on the basis of simultaneous existence with n items of attribute information under the same business scenario, where N > n, then no numerical comparison is needed. Judgment value a is treated as higher than judgment value b, and the attribute information corresponding to judgment value a can be associated, as related information, with the most recently generated related information string.
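The comparison rule described above amounts to ranking candidates first by the number of attributes their conditional probabilities are based on, and only then by the judgment value itself. A minimal sketch, in which the function name and the (basis size, judgment value) pair layout are assumptions made for illustration:

```python
def pick_candidate(scores):
    """Pick the candidate to associate with the related information string.

    scores maps each candidate to a pair (basis_size, judgment_value), where
    basis_size is the number of string attributes the candidate's conditional
    probabilities were calculated against.
    """
    if not scores:
        raise ValueError("no candidates to compare")
    # Tuples compare element by element: a larger basis always wins, and the
    # judgment value only breaks ties between candidates with the same basis.
    return max(scores, key=lambda candidate: scores[candidate])
```

With this ordering, a candidate based on 6 attributes is selected over one based on 5 attributes regardless of their numerical judgment values.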
To further illustrate the technical idea of the present invention, its technical solution is described with reference to specific application scenarios. The embodiment of the present application also discloses an information association device which, as shown in Fig. 2, includes:
an acquisition module 201, configured to obtain initial data that includes identification marks and attribute information having an association relationship under different business scenarios;
a first association module 202, configured to associate the attribute information that occurs most frequently under the business scenarios in the initial data, as related information, with the identification mark, and to generate a related information string;
a second association module 203, configured to select the types of attribute information to be associated in turn according to a preset association order, and to associate, as related information, the attribute information with the largest judgment value among the items of attribute information of that type under the business scenarios in the initial data with the most recently generated related information string, generating a related information string to realize information association.
Specifically, the device further includes:
a processing module, configured to classify and integrate the acquired initial data according to business scenario and to generate an information record table, wherein the information record table includes the name and content of the attribute information associated with the identification mark, and the business scenario name, time and number of times of the association between the identification mark and the attribute information.
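As an illustration of what such an information record table might hold, the following sketch groups raw association events by identification mark, attribute and business scenario. The field names, the tuple layout of the input, and the decision to keep only the latest association time are assumptions; the text only states that the table records names, contents, scenario names, times and counts.

```python
def build_record_table(raw_events):
    """Integrate raw events into an information record table.

    raw_events is an iterable of tuples:
        (identification_mark, attr_name, attr_content, scenario, timestamp)
    The result maps (identification_mark, attr_name, attr_content, scenario)
    to a row holding the association count and the latest association time.
    """
    table = {}
    for mark, name, content, scenario, timestamp in raw_events:
        key = (mark, name, content, scenario)
        row = table.setdefault(key, {"count": 0, "last_time": timestamp})
        row["count"] += 1
        # Assumption: only the most recent association time is retained.
        row["last_time"] = max(row["last_time"], timestamp)
    return table
```

Keying the table by scenario keeps the per-scenario counts separate, which the later counting and probability steps rely on.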
The first association module 202 is specifically configured to:
determine, from the initial data, the number of times each item of attribute information occurs under each business scenario;
determine the attribute information that occurs most often under each business scenario by comparing the numbers of times the items of attribute information occur under the same business scenario;
determine, by comparing the occurrence counts of the most frequent attribute information of each business scenario, that the attribute information occurring most often in the initial data is the initial attribute information to be associated;
associate the initial attribute information to be associated, as related information, with the identification mark, and generate a related information string.
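The two-stage comparison performed by the first association module can be sketched as follows. The function name and the input layout (a mapping from business scenario to per-attribute occurrence counts) are hypothetical, and the sketch does not define how ties are broken because the text does not say.

```python
from collections import Counter


def initial_attribute(occurrences):
    """Pick the initial attribute information to be associated.

    occurrences maps each business scenario to a mapping of
    attribute information -> number of occurrences in that scenario.
    """
    # Stage 1: the most frequent attribute within each business scenario.
    per_scenario_best = {
        scenario: Counter(counts).most_common(1)[0]
        for scenario, counts in occurrences.items()
        if counts
    }
    if not per_scenario_best:
        raise ValueError("no attribute information in the initial data")
    # Stage 2: compare the per-scenario winners by their occurrence counts.
    top_scenario = max(per_scenario_best, key=lambda s: per_scenario_best[s][1])
    return per_scenario_best[top_scenario][0]
```

The returned attribute would then be associated with the identification mark to form the first related information string.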
The second association module 203 is specifically configured to:
select the types of attribute information to be associated in turn according to a preset association order;
determine, in the initial data, each item of attribute information corresponding to the currently selected type;
if only one item of attribute information corresponds to the currently selected type, select that item and associate it, as related information, with the most recently generated related information string, generating a related information string to realize information association;
if multiple items of attribute information correspond to the currently selected type, calculate the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string;
determine the judgment sub-value of each item of attribute information under each business scenario based on the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregate the judgment sub-values corresponding to the same item of attribute information to determine the judgment value of each item of attribute information;
compare the judgment values of the items of attribute information, determine the item with the largest judgment value, and associate that item, as related information, with the most recently generated related information string, generating a related information string to realize information association.
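The scoring performed by the second association module can be sketched as follows. This is an illustrative sketch only: the text says the judgment sub-values of the same attribute information are aggregated but this section does not give the aggregation formula, so summation is assumed here. The function names and input layouts are hypothetical, and the conditional probabilities are taken as already calculated on the same basis.

```python
def judgment_values(cond_prob, weights):
    """Compute a judgment value for each candidate attribute information.

    cond_prob maps candidate -> {scenario: conditional probability}.
    weights maps scenario -> preset weight of that business scenario.
    """
    values = {}
    for candidate, per_scenario in cond_prob.items():
        # Judgment sub-value = conditional probability * scenario weight.
        sub_values = [p * weights[s] for s, p in per_scenario.items()]
        # Assumption: the judgment value is the sum of the sub-values.
        values[candidate] = sum(sub_values)
    return values


def select_candidate(cond_prob, weights):
    """Return the candidate with the largest judgment value."""
    values = judgment_values(cond_prob, weights)
    return max(values, key=values.get)
```

The selected candidate is the one that would be appended to the most recently generated related information string before the next type in the association order is processed.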
Specifically, the most recently generated related information string includes N items of attribute information;
the second association module 203 calculates the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string, which specifically includes the following steps:
Step A: judge whether each determined item of attribute information exists, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step B: if the judgment result is yes, calculate the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step C: if the judgment result is no, set N = N - 1 and repeat step A, until the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with attribute information in the most recently generated related information string has been calculated.
The present application obtains initial data that includes identification marks and attribute information having an association relationship under different business scenarios; associates the attribute information that occurs most frequently under the business scenarios in the initial data, as related information, with the identification mark to generate a related information string; and selects the types of attribute information to be associated in turn according to a preset association order, associating, as related information, the attribute information with the largest judgment value among the items of attribute information of that type under the business scenarios in the initial data with the most recently generated related information string, generating a related information string to realize information association. In this way, a user's information is associated accurately while machine load and resource consumption are reduced.
Through the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server or a network device) to perform the methods described in the implementation scenarios of the present invention.
Those skilled in the art will understand that the accompanying drawing is a schematic diagram of a preferred implementation scenario, and that the modules or flows in the drawing are not necessarily required to implement the present invention.
Those skilled in the art will understand that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that of the scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules.
The above serial numbers of the present invention are for description only and do not represent the merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the present invention; however, the present invention is not limited thereto, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the present invention.

Claims (10)

1. An information association method, characterized by comprising:
obtaining initial data that includes identification marks and attribute information having an association relationship under different business scenarios;
associating the attribute information that occurs most frequently under the business scenarios in the initial data, as related information, with the identification mark, and generating a related information string;
selecting the types of attribute information to be associated in turn according to a preset association order, and associating, as related information, the attribute information with the largest judgment value among the items of attribute information of that type under the business scenarios in the initial data with the most recently generated related information string, generating a related information string to realize information association.
2. The method of claim 1, characterized in that, after obtaining the initial data that includes identification marks and attribute information having an association relationship under different business scenarios, the method further comprises:
classifying and integrating the acquired initial data according to business scenario and generating an information record table, wherein the information record table includes the name and content of the attribute information associated with the identification mark, and the business scenario name, time and number of times of the association between the identification mark and the attribute information.
3. The method of claim 1 or 2, characterized in that associating the attribute information that occurs most frequently under the business scenarios in the initial data, as related information, with the identification mark and generating a related information string specifically comprises:
determining, from the initial data, the number of times each item of attribute information occurs under each business scenario;
determining the attribute information that occurs most often under each business scenario by comparing the numbers of times the items of attribute information occur under the same business scenario;
determining, by comparing the occurrence counts of the most frequent attribute information of each business scenario, that the attribute information occurring most often in the initial data is the initial attribute information to be associated;
associating the initial attribute information to be associated, as related information, with the identification mark, and generating a related information string.
4. The method of claim 1 or 2, characterized in that selecting the types of attribute information to be associated in turn according to a preset association order, and associating, as related information, the attribute information with the largest judgment value among the items of attribute information of that type under the business scenarios in the initial data with the most recently generated related information string, generating a related information string to realize information association, specifically comprises:
selecting the types of attribute information to be associated in turn according to a preset association order;
determining, in the initial data, each item of attribute information corresponding to the currently selected type;
if only one item of attribute information corresponds to the currently selected type, selecting that item and associating it, as related information, with the most recently generated related information string, generating a related information string to realize information association;
if multiple items of attribute information correspond to the currently selected type, calculating the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string;
determining the judgment sub-value of each item of attribute information under each business scenario based on the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregating the judgment sub-values corresponding to the same item of attribute information to determine the judgment value of each item of attribute information;
comparing the judgment values of the items of attribute information, determining the item with the largest judgment value, and associating that item, as related information, with the most recently generated related information string, generating a related information string to realize information association.
5. The method of claim 4, characterized in that the most recently generated related information string includes N items of attribute information;
calculating the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string specifically comprises the following steps:
Step A: judging whether each determined item of attribute information exists, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step B: if the judgment result is yes, calculating the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step C: if the judgment result is no, setting N = N - 1 and repeating step A, until the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with attribute information in the most recently generated related information string has been calculated.
6. An information association device, characterized by comprising:
an acquisition module, configured to obtain initial data that includes identification marks and attribute information having an association relationship under different business scenarios;
a first generation module, configured to associate the attribute information that occurs most frequently under the business scenarios in the initial data, as related information, with the identification mark, and to generate a related information string;
a second generation module, configured to select the types of attribute information to be associated in turn according to a preset association order, and to associate, as related information, the attribute information with the largest judgment value among the items of attribute information of that type under the business scenarios in the initial data with the most recently generated related information string, generating a related information string to realize information association.
7. The device of claim 6, characterized by further comprising:
a processing module, configured to classify and integrate the acquired initial data according to business scenario and to generate an information record table, wherein the information record table includes the name and content of the attribute information associated with the identification mark, and the business scenario name, time and number of times of the association between the identification mark and the attribute information.
8. The device of claim 6 or 7, characterized in that the first generation module is specifically configured to:
determine, from the initial data, the number of times each item of attribute information occurs under each business scenario;
determine the attribute information that occurs most often under each business scenario by comparing the numbers of times the items of attribute information occur under the same business scenario;
determine, by comparing the occurrence counts of the most frequent attribute information of each business scenario, that the attribute information occurring most often in the initial data is the initial attribute information to be associated;
associate the initial attribute information to be associated, as related information, with the identification mark, and generate a related information string.
9. The device of claim 6 or 7, characterized in that the second generation module is specifically configured to:
select the types of attribute information to be associated in turn according to a preset association order;
determine, in the initial data, each item of attribute information corresponding to the currently selected type;
if only one item of attribute information corresponds to the currently selected type, select that item and associate it, as related information, with the most recently generated related information string, generating a related information string to realize information association;
if multiple items of attribute information correspond to the currently selected type, calculate the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string;
determine the judgment sub-value of each item of attribute information under each business scenario based on the product of the conditional probability and the preset weight of the corresponding business scenario;
aggregate the judgment sub-values corresponding to the same item of attribute information to determine the judgment value of each item of attribute information;
compare the judgment values of the items of attribute information, determine the item with the largest judgment value, and associate that item, as related information, with the most recently generated related information string, generating a related information string to realize information association.
10. The device of claim 9, characterized in that the most recently generated related information string includes N items of attribute information;
the second generation module calculates the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the attribute information in the most recently generated related information string, which specifically comprises the following steps:
Step A: judging whether each determined item of attribute information exists, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step B: if the judgment result is yes, calculating the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with the N items of attribute information in the most recently generated related information string;
Step C: if the judgment result is no, setting N = N - 1 and repeating step A, until the conditional probability that each determined item of attribute information occurs, under each business scenario, simultaneously with attribute information in the most recently generated related information string has been calculated.
CN201511017699.3A 2015-12-29 2015-12-29 Information association method and device Active CN106933829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511017699.3A CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511017699.3A CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Publications (2)

Publication Number Publication Date
CN106933829A true CN106933829A (en) 2017-07-07
CN106933829B CN106933829B (en) 2020-08-04

Family

ID=59442286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511017699.3A Active CN106933829B (en) 2015-12-29 2015-12-29 Information association method and device

Country Status (1)

Country Link
CN (1) CN106933829B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580309A (en) * 2019-08-14 2019-12-17 阿里巴巴集团控股有限公司 personal information display device method, device and equipment based on block chain type account book
CN110968785A (en) * 2019-11-26 2020-04-07 腾讯科技(深圳)有限公司 Target account identification method and device, storage medium and electronic device
CN111680248A (en) * 2020-04-28 2020-09-18 五八有限公司 Method and device for generating batch number of message pushed

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101620625A (en) * 2009-07-30 2010-01-06 腾讯科技(深圳)有限公司 Method, device and search engine for sequencing searching keywords
CN102368788A (en) * 2011-12-09 2012-03-07 中国电信股份有限公司 Information pushing method and apparatus thereof
US20150025913A1 (en) * 2004-10-12 2015-01-22 International Business Machines Corporation Associating records in healthcare databases with individuals
CN104573094A (en) * 2015-01-30 2015-04-29 深圳市华傲数据技术有限公司 Online account recognizing and matching method

Also Published As

Publication number Publication date
CN106933829B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1238738; Country of ref document: HK)
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201013

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.