CN105119744A - An association relation determination method and apparatus for user identifications - Google Patents

Publication number
CN105119744A
Authority
CN
China
Prior art keywords
user
attr
incidence relation
feature information
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510506033.8A
Other languages
Chinese (zh)
Other versions
CN105119744B (en)
Inventor
叶青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510506033.8A
Publication of CN105119744A
Application granted
Publication of CN105119744B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50: Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5061: Network service management, e.g. ensuring proper service fulfilment according to agreements characterised by the interaction between service providers and their network customers, e.g. customer relationship management
    • H04L41/5064: Customer relationship management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The application discloses a method and apparatus for determining an association relation between user identifications. In one embodiment, the method comprises: obtaining first feature information associated with a first user identification and second feature information associated with a second user identification; determining, based on the first feature information and the second feature information, whether the first user identification and the second user identification have an association relation, to obtain an initial determination result; detecting the variation of the first feature information and the second feature information; generating determination indication information that indicates whether the association relation between the first user identification and the second user identification needs to be determined again; and executing the operation corresponding to the determination indication information to determine whether the association relation exists between the first user identification and the second user identification. This avoids the repeated calculation caused by re-determining the association relation between the user identifications in every determination process, and reduces the consumption of network resources.

Description

Method and apparatus for determining an association relation between user identifications
Technical field
The application relates to the field of computers, specifically to the field of data processing, and in particular to a method and apparatus for determining an association relation between user identifications.
Background
At present, when association relations are mined among a massive number of user identifications (for example, to determine whether user identifications belong to the same user), the association relation is determined as follows: the individual determination processes are independent of one another, and in every determination process the association relations between all user identifications have to be determined afresh. However, for user identifications that appear in several consecutive determinations, the determination result is unlikely to change, yet their association relation is still determined again each time. This causes a large amount of unnecessary repeated calculation. With the existing full single-day prediction running at a terabyte-level data scale, the consumption of network resources is considerable.
Summary of the invention
The application provides a method and apparatus for determining an association relation between user identifications, to solve the technical problem described in the background section above.
In a first aspect, the application provides a method for determining an association relation between user identifications. The method comprises: obtaining first feature information associated with a first user identification and second feature information associated with a second user identification, wherein the feature information comprises search feature information and browsing feature information; determining, based on the first feature information and the second feature information, whether the first user identification and the second user identification have an association relation, and obtaining an initial determination result; detecting the variation of the first feature information and the second feature information and, based on the variation and the initial determination result, generating determination indication information, which indicates whether it is necessary to determine again whether the first user identification and the second user identification have an association relation; and executing the operation corresponding to the determination indication information to determine whether the first user identification and the second user identification have an association relation.
In a second aspect, the application provides an apparatus for determining an association relation between user identifications. The apparatus comprises: an acquiring unit, configured to obtain first feature information associated with a first user identification and second feature information associated with a second user identification, wherein the feature information comprises search feature information and browsing feature information; a determining unit, configured to determine, based on the first feature information and the second feature information, whether the first user identification and the second user identification have an association relation, and to obtain an initial determination result; a generating unit, configured to detect the variation of the first feature information and the second feature information and, based on the variation and the initial determination result, to generate determination indication information, which indicates whether it is necessary to determine again whether the first user identification and the second user identification have an association relation; and an executing unit, configured to execute the operation corresponding to the determination indication information and to determine whether the first user identification and the second user identification have an association relation.
The method and apparatus provided by the application obtain first feature information associated with a first user identification and second feature information associated with a second user identification; determine, based on the first feature information and the second feature information, whether the two user identifications have an association relation, and obtain an initial determination result; detect the variation of the first feature information and the second feature information and, based on the variation and the initial determination result, generate determination indication information indicating whether the association relation needs to be determined again; and execute the operation corresponding to the determination indication information to determine whether the two user identifications have an association relation. This avoids the repeated calculation caused by re-determining the association relation between all user identifications in every determination process, and reduces the consumption of network resources.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flow chart of an embodiment of the method for determining an association relation between user identifications according to the application;
Fig. 2 shows a schematic diagram of the structure of a feature vector;
Fig. 3 shows a schematic structural diagram of an embodiment of the apparatus for determining an association relation between user identifications according to the application.
Detailed description of embodiments
The application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another. The application is described in detail below with reference to the drawings and in conjunction with the embodiments.
In the embodiments of the application, the first user identification and the second user identification do not refer to the user identifications of any particular user. They may be any two user identifications among the massive number of user identifications obtained in the process of determining whether user identifications belong to the same user.
Referring to Fig. 1, a flow 100 of an embodiment of the method for determining an association relation between user identifications according to the application is shown. The method comprises the following steps:
Step 101: obtain first feature information associated with a first user identification and second feature information associated with a second user identification.
In this embodiment, the feature information comprises search feature information and browsing feature information. A user identification (also referred to as a user ID) is information used to identify a user on a network. Information associated with the user's network behaviour (for example, search behaviour and browsing behaviour) may be recorded in advance on the basis of the user identification, thereby obtaining the first feature information associated with the first user identification and the second feature information associated with the second user identification.
In some optional implementations of this embodiment, the user identification comprises one of the following: the International Mobile Equipment Identity (IMEI) of a mobile device, or a browsing record identification. In this embodiment, the browsing record identification may be the user's cookie.
In some optional implementations of this embodiment, the search feature information comprises at least one of the following: search word type, search time; and the browsing feature information comprises at least one of the following: browsed web page type, network access location, online time. In this embodiment, the network access location may be the place from which the user accesses the network. The online time may be the time during which the user uses a specific network service, or the time period during which the user is connected to the network.
Step 102: based on the first feature information and the second feature information, determine whether the first user identification and the second user identification have an association relation, and obtain an initial determination result.
In this embodiment, whether the first user identification and the second user identification have an association relation may be determined from the similarity between the first feature information and the second feature information, thereby obtaining an initial determination result that characterizes whether the two user identifications have an association relation.
In some optional implementations of this embodiment, the association relation comprises the first user identification and the second user identification being user identifications of the same user. Determining whether the first user identification and the second user identification have an association relation and obtaining the initial determination result comprises: generating a first feature vector characterizing multiple pieces of first feature information and a second feature vector characterizing multiple pieces of second feature information, wherein each component of a feature vector corresponds to one piece of feature information; using a cosine similarity algorithm to calculate the similarity between each component of the first feature vector and the corresponding component of the second feature vector, obtaining multiple similarity sub-parameters, and obtaining a similarity parameter based on the multiple similarity sub-parameters; and, when the similarity parameter is greater than a similarity threshold, determining that the first user identification and the second user identification have an association relation.
In this embodiment, the feature information associated with a user identification may be represented by a feature vector of the following form: ATTR={attr0, attr1, attr2, ..., attrn}, where each attri (i=0, 1, ..., n) corresponds to one piece of feature information. A piece of feature information may be represented by a specific numerical value or by a set structure. Feature vectors of this form may be used to represent the feature information associated with the first user identification and the feature information associated with the second user identification, respectively. In this embodiment, the feature information may comprise information of types such as browsed web page type, network access location, online time, and the type of search words used when searching.
Referring to Fig. 2, a schematic diagram of the structure of a feature vector is shown. Fig. 2 shows feature information such as browsed web page type, network access location and online time. The browsed web page type and the network access location are represented by set structures: the browsed web page type by a set of keywords, and the network access location by a set of coordinates characterizing the user's position. The online time is represented by a vector characterizing time periods.
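The feature vector structure described for Fig. 2 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `FeatureVector`, its field names, and the 24-slot hourly encoding of online time are hypothetical and do not come from the application, which only says that keyword sets, coordinate sets and a time-period vector are used.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class FeatureVector:
    """Hypothetical container for ATTR = {attr0, ..., attrn} of one user identification."""

    # Browsed web page type: a set of keywords.
    page_keywords: Set[str] = field(default_factory=set)
    # Network access location: a set of (latitude, longitude) coordinates.
    locations: Set[Tuple[float, float]] = field(default_factory=set)
    # Online time: one slot per hour of the day (an assumed encoding of time periods).
    online_hours: List[float] = field(default_factory=lambda: [0.0] * 24)

    def components(self) -> list:
        """Return the components in a fixed order so attr_i lines up across vectors."""
        return [self.page_keywords, self.locations, self.online_hours]


# Example: a mobile identification used in the evening for travel browsing.
phone = FeatureVector(page_keywords={"travel", "hotel"}, locations={(39.9, 116.4)})
phone.online_hours[20] = 1.0  # online between 20:00 and 21:00
```

Keeping the components in a fixed order matters because the similarity calculation in step 102 compares each component of one vector with the corresponding component of the other.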
In this embodiment, after the feature vector corresponding to the first feature information and the feature vector corresponding to the second feature information have been determined, a cosine similarity algorithm may be used to calculate the similarity between corresponding components of the two vectors, yielding a similarity sub-parameter that characterizes the similarity between each pair of components. A similarity parameter may then be obtained from the similarity sub-parameters. For example, an input vector may first be generated from the similarity sub-parameters and fed to a preset neural network model, which may have been generated by training on historical similarity parameters. The preset neural network model then outputs a value in the range [-1, 1], and this value is taken as the similarity parameter. In this embodiment, once the similarity parameter has been determined, it may be compared with the preset similarity threshold to obtain the initial determination result. When the similarity parameter is greater than the similarity threshold, it is determined that the first user identification and the second user identification have an association relation. When the similarity parameter is not greater than the similarity threshold, it is determined that the first user identification and the second user identification do not have an association relation.
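The component-wise similarity calculation of step 102 can be sketched as follows. This is a minimal sketch, not the application's implementation: each component is assumed to be a sparse numeric vector stored as a dictionary, and the preset neural network model that combines the similarity sub-parameters is replaced here by a plain weighted average. The function names and the `weights` parameter are hypothetical.

```python
import math
from typing import Dict, List

SparseVec = Dict[str, float]


def cosine_similarity(u: SparseVec, v: SparseVec) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_parameter(attr_a: List[SparseVec], attr_b: List[SparseVec],
                         weights: List[float]) -> float:
    """Combine per-component similarity sub-parameters into one similarity parameter.

    Assumption: a weighted average stands in for the preset neural network model.
    """
    sub_parameters = [cosine_similarity(a, b) for a, b in zip(attr_a, attr_b)]
    return sum(w * s for w, s in zip(weights, sub_parameters)) / sum(weights)


def has_association(attr_a: List[SparseVec], attr_b: List[SparseVec],
                    weights: List[float], threshold: float) -> bool:
    """Initial determination result: associated only if strictly above the threshold."""
    return similarity_parameter(attr_a, attr_b, weights) > threshold


# Two identifications with identical page keywords but disjoint online hours.
attr_a = [{"travel": 1.0, "hotel": 1.0}, {"h20": 1.0}]
attr_b = [{"travel": 1.0, "hotel": 1.0}, {"h09": 1.0}]
similarity = similarity_parameter(attr_a, attr_b, weights=[1.0, 1.0])
```

With equal weights, the first component contributes a sub-parameter close to 1 and the second contributes 0, so the combined similarity parameter is close to 0.5.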
Step 103: detect the variation of the first feature information and the second feature information and, based on the variation and the initial determination result, generate determination indication information.
In this embodiment, the determination process that produced the initial determination result may be called the previous determination process, and the process of further determining the association relation between user identifications on the basis of the variation of the first feature information and the second feature information may be called the current determination process. The previous determination process yields the initial determination result, that is, whether the user identifications have an association relation. On this basis, the current determination process does not directly determine the association relation between the user identifications again. Instead, it uses the variation of the feature information associated with the user identifications, together with the initial determination result, to decide whether the association relation needs to be determined again. It should be noted that, in this embodiment, the previous determination process and the current determination process do not refer to any particular pair of adjacent determination processes. Over the whole course of determining the association relation between user identifications, the association relation may be fully determined only once, to obtain the initial determination result; in every subsequent determination process, the approach of the current determination process described above may be used to further determine the association relation between the user identifications.
In this embodiment, the determination indication information indicates whether it is necessary to determine again whether the first user identification and the second user identification have an association relation. Generating the determination indication information from the variation of the feature information associated with the user identifications may rest on the following principle: when the existence of an association relation between two user identifications is determined from the feature information associated with them, the larger the variation of that feature information, the more likely it is that the determination result obtained in the previous determination process will change.
The relation between the variation of the feature information and a change in the determination result is illustrated below. Suppose that when the user accesses the network from a mobile terminal, the user identification associated with the user's network behaviour is the first user identification, and when the user accesses the network from a PC terminal (for example, a notebook computer), it is the second user identification. The user's notebook computer is normally used only at the company and only for work (for example, browsing technical web pages and searching for technical documents needed at work), while the mobile terminal is used only for entertainment (for example, browsing travel websites and searching for tourist attractions). The variation of the feature information may be characterized by changes in the type of web pages browsed and the type of search words. As long as the user keeps to this behaviour pattern, the variation of the feature information associated with the first user identification and of the feature information associated with the second user identification is small, and so is the probability that the determination result (for example, that the first user identification and the second user identification do not belong to the same user) will change. If, during some period, the user suddenly starts doing work-related things on the mobile phone, for example browsing work-related websites or using work-related network services, then the variation of feature information associated with the user identification, such as the type of web pages browsed and the time at which network services are used, increases, and the probability that the determination result will change increases accordingly. It can then be decided whether to determine the relation between the first user identification and the second user identification again and obtain a new determination result (for example, that the first user identification and the second user identification belong to the same user).
In some optional implementations of this embodiment, generating the determination indication information based on the variation and the initial determination result comprises calculating a determination indication parameter by the following formula:
$$\mathrm{Repredict} = (\mathrm{DELTA}_a + \mathrm{DELTA}_b) \times \mathrm{FACTOR} - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \times factor_t \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \bigl(1 - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})\bigr) \times factor_f \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$$
where Repredict is the determination indication parameter;
$$H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \begin{cases} 1, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) > \mathrm{threadhold} \\ 0, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \le \mathrm{threadhold} \end{cases}$$
$$\mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \mathrm{ABS}\bigl(\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \mathrm{threadhold}\bigr)$$
ABS is the absolute value function, $\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ is the similarity parameter, threadhold is the similarity threshold, $factor_t$ is the first confidence coefficient, and $factor_f$ is the second confidence coefficient. $\mathrm{DELTA}_a$ is a multi-dimensional vector in which each component corresponds to the variation of one piece of first feature information, $\mathrm{DELTA}_b$ is a multi-dimensional vector in which each component corresponds to the variation of one piece of second feature information, and FACTOR is a multi-dimensional real vector. It is then determined whether the determination indication parameter is greater than zero. If so, determination indication information is generated indicating that it is necessary to determine again whether the first user identification and the second user identification have an association relation; if not, determination indication information is generated indicating that this is not necessary.
The meaning of the vector ATTR was defined in step 102. On that basis, the vector $\mathrm{ATTR}^{0} = \{attr0^{0}, attr1^{0}, attr2^{0}, \ldots, attrn^{0}\}$ may be used to represent the feature information associated with a user identification in the current determination process, and the vector $\mathrm{ATTR}^{-1} = \{attr0^{-1}, attr1^{-1}, attr2^{-1}, \ldots, attrn^{-1}\}$ to represent the feature information associated with that user identification in the previous determination process. In this embodiment, the vector $\mathrm{DELTA} = \mathrm{ATTR}^{0} - \mathrm{ATTR}^{-1} = \{delta0, delta1, \ldots, deltan\}$ may be used to represent the variation, between the previous determination process and the current determination process, of each piece of feature information associated with the same user identification, where $deltai = attri^{0} - attri^{-1}$ (i=0, 1, ..., n) and $deltai \in [0, 1]$; deltai is the value obtained after the variation of each piece of feature information has been normalized to the range 0 to 1.
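The determination indication parameter defined by the formula above can be computed directly. The sketch below assumes that the product of (DELTAa + DELTAb) with FACTOR is a dot product, which yields the scalar that the comparison with zero requires; the function and argument names are hypothetical, and `threshold` stands for the symbol written `threadhold` in the formula.

```python
from typing import Sequence


def repredict(delta_a: Sequence[float], delta_b: Sequence[float],
              factor: Sequence[float], prev_similarity: float,
              threshold: float, factor_t: float, factor_f: float) -> float:
    """Determination indication parameter (Repredict).

    prev_similarity is Predict(ATTRa^-1, ATTRb^-1) from the previous determination.
    """
    # Weighted amount of change across both identifications (assumed dot product).
    change = sum((da + db) * f for da, db, f in zip(delta_a, delta_b, factor))
    # CONFIDENCE: distance of the previous similarity from the threshold.
    confidence = abs(prev_similarity - threshold)
    # H: 1 if the previous result was positive (associated), else 0.
    h = 1 if prev_similarity > threshold else 0
    return (change
            - h * factor_t * confidence
            - (1 - h) * factor_f * confidence)


def needs_redetermination(*args, **kwargs) -> bool:
    """Determine again only when Repredict is greater than zero."""
    return repredict(*args, **kwargs) > 0


# Previous result: clearly associated (0.9 against a threshold of 0.5).
small_change = needs_redetermination([0.1, 0.0], [0.0, 0.1], [1.0, 1.0],
                                     0.9, 0.5, 1.0, 1.0)
large_change = needs_redetermination([0.5, 0.3], [0.2, 0.2], [1.0, 1.0],
                                     0.9, 0.5, 1.0, 1.0)
```

In this example the small change (weighted sum about 0.2) does not outweigh the confidence of about 0.4 in the previous result, so the previous result is reused, while the large change (weighted sum about 1.2) triggers a new determination.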
The calculation of the variation of a piece of feature information is described below, taking as an example the case where the feature information attr0 in the above vector is the browsed web page type associated with a user identification. The feature information attr0 may be a set structure, in which each element may be a keyword (also called a point of interest) of the content of a web page browsed by the user. Correspondingly, $attr0^{-1}$ may be the set of keywords of the web pages browsed in the previous determination process, and $attr0^{0}$ the set of keywords of the web pages browsed in the current determination process. A keyword set may be represented as a vector in which each component corresponds to one keyword in the set. The variation of the browsed web page type may then be defined through the similarity between the vectors corresponding to $attr0^{0}$ and $attr0^{-1}$. For example, a cosine similarity algorithm may be used to calculate the similarity, that is, the angle, between the vectors corresponding to $attr0^{0}$ and $attr0^{-1}$. In this embodiment, the variation of feature information such as search word type, network access location and online time may be calculated on the same principle.
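The variation of a keyword-set feature can be sketched as follows. The text only says that the variation is defined through the cosine similarity of the two keyword vectors; the specific choice of variation = 1 - similarity, the use of keyword counts as vector components, and the function name `feature_variation` are assumptions made for this illustration. With non-negative counts the similarity lies in [0, 1], so the result is already normalized to the 0 to 1 range.

```python
import math
from collections import Counter
from typing import Iterable


def feature_variation(prev_keywords: Iterable[str],
                      curr_keywords: Iterable[str]) -> float:
    """Variation of a keyword feature between two determination processes.

    Assumption: variation = 1 - cosine similarity of the keyword count vectors.
    """
    prev_vec = Counter(prev_keywords)
    curr_vec = Counter(curr_keywords)
    dot = sum(count * curr_vec[key] for key, count in prev_vec.items())
    norm = (math.sqrt(sum(c * c for c in prev_vec.values()))
            * math.sqrt(sum(c * c for c in curr_vec.values())))
    if norm == 0.0:
        # Both empty: nothing changed. One empty: treat as a complete change.
        return 0.0 if not prev_vec and not curr_vec else 1.0
    similarity = dot / norm
    # Clamp to guard against floating-point rounding just outside [0, 1].
    return min(1.0, max(0.0, 1.0 - similarity))


unchanged = feature_variation(["travel", "hotel"], ["travel", "hotel"])
switched = feature_variation(["travel", "hotel"], ["compiler", "database"])
```

An unchanged keyword set gives a variation of (approximately) zero, and a completely different keyword set gives a variation of one.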
In this embodiment, given the above definition of the vector DELTA, the vector $\mathrm{DELTA}_a$ may be used to represent the variation, between the two determinations, of each piece of feature information associated with the first user identification, and the vector $\mathrm{DELTA}_b$ to represent the variation, between the two determinations, of each piece of feature information associated with the second user identification. FACTOR is a multi-dimensional real vector, FACTOR=(f0, f1, ..., fn), in which each real number is the preset weight applied when a component of $\mathrm{DELTA}_a$ is added to the corresponding component of $\mathrm{DELTA}_b$.
In this embodiment, $H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \times factor_t \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ represents, when the previous determination result is positive (the user identifications belong to the same user), the influence on the current determination process of the amount by which the similarity parameter exceeds the similarity threshold; $(1 - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})) \times factor_f \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ represents, when the previous determination result is negative (the user identifications do not belong to the same user), the influence on the current determination process of the amount by which the similarity parameter falls below the similarity threshold.
In this embodiment, the values of $factor_t$ and $factor_f$ may also be adjusted separately while the association relation between user identifications is being determined, thereby influencing the determination result. The values of the real numbers in the real vector FACTOR may be increased to raise the recall rate. Components of FACTOR whose values are small may also be removed, further reducing the amount of calculation in the determination process. The value of $factor_t$ characterizes how conservatively a positive previous determination result is treated: increasing $factor_t$ reduces the amount of calculation in subsequent determination processes, but also lowers the accuracy with which user identifications are determined to belong to the same user. The value of $factor_f$ characterizes how conservatively a negative previous determination result is treated: increasing $factor_f$ reduces the amount of calculation in subsequent determination processes, but also lowers the recall rate.
In some optional implementations of this embodiment, the method further comprises: obtaining the historical variation of the first feature information and of the second feature information, and the historical determination result, obtained on the basis of the historical variation, as to whether the first user identification and the second user identification have an association relation; and determining the multi-dimensional real vector, the first confidence coefficient and the second confidence coefficient by a machine learning algorithm, based on the historical variation and the historical determination result.
In this embodiment, historical data such as the historical variation of the first feature information and the second feature information and the historical determination results may be obtained in advance. The historical data may then be used as sample data and a machine learning algorithm (for example, a support vector machine (SVM) model) trained on the sample data. For example, parameters obtained from the historical data, such as DELTAa, DELTAb and Repredict, may be processed to generate the sample data (for example, a parameter that changed is labelled 1 and a parameter that did not change is labelled -1). Training on the sample data with the machine learning algorithm yields FACTOR, $factor_t$ and $factor_f$.
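Because Repredict is linear in FACTOR, factor_t and factor_f, these parameters can be fitted as the weights of a linear classifier without a bias term. The sketch below uses a plain perceptron in place of the support vector machine mentioned in the text, purely to keep the example self-contained. The row layout, the labels (+1 when the determination result changed, -1 when it did not), the toy data and all function names are assumptions made for this illustration. A perceptron does not constrain the learned coefficients to be non-negative.

```python
from typing import List, Sequence


def to_row(delta_a: Sequence[float], delta_b: Sequence[float],
           prev_similarity: float, threshold: float) -> List[float]:
    """Build one sample so that Repredict = dot(weights, row).

    The weights are laid out as [f0, ..., fn, factor_t, factor_f].
    """
    confidence = abs(prev_similarity - threshold)
    h = 1 if prev_similarity > threshold else 0
    return ([da + db for da, db in zip(delta_a, delta_b)]
            + [-h * confidence, -(1 - h) * confidence])


def train_perceptron(rows: List[List[float]], labels: List[int],
                     epochs: int = 100, lr: float = 0.1) -> List[float]:
    """Perceptron without a bias term; stops early once every sample is correct."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(rows, labels):
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified (or on the boundary): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


# Toy history (hypothetical): (DELTAa, DELTAb, previous similarity, label).
history = [
    ([0.4], [0.4], 0.6, +1),    # large change, weak positive: result changed
    ([0.05], [0.05], 0.9, -1),  # small change, strong positive: unchanged
    ([0.3], [0.4], 0.4, +1),    # large change, weak negative: result changed
    ([0.0], [0.1], 0.1, -1),    # small change, strong negative: unchanged
]
rows = [to_row(da, db, sim, 0.5) for da, db, sim, _ in history]
labels = [label for _, _, _, label in history]
weights = train_perceptron(rows, labels)
factor, factor_t, factor_f = weights[:-2], weights[-2], weights[-1]
```

On this toy data the learned weights reproduce every historical label when Repredict is compared with zero; real training data would need far more samples and a check that the learned coefficients are meaningful.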
Step 104: execute the operation corresponding to the determination indication information, and determine whether the first user identification and the second user identification have an association relation.
In this embodiment, when the determination indication information indicates that it is necessary to determine again whether the first user identification and the second user identification have an association relation, the association relation between them may be determined again on the basis of the feature information currently associated with the first user identification and the second user identification, thereby determining whether they have an association relation. When the determination indication information indicates that this is not necessary, the determination result obtained in the previous determination process may be used to characterize whether the first user identification and the second user identification have an association relation.
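Step 104 amounts to a simple dispatch between recomputing and reusing the previous result. In the sketch below, the function name `determine_association` and the `recompute` callback, which stands for a full similarity-based determination on the current feature information, are hypothetical.

```python
from typing import Callable


def determine_association(redetermine: bool, previous_result: bool,
                          recompute: Callable[[], bool]) -> bool:
    """Step 104: recompute only when the indication information asks for it."""
    if redetermine:
        return recompute()
    # Otherwise the previous determination result is reused unchanged.
    return previous_result


recompute_calls = []


def full_determination() -> bool:
    """Stand-in for the expensive similarity-based determination of step 102."""
    recompute_calls.append(1)
    return True


reused = determine_association(False, False, full_determination)
updated = determine_association(True, False, full_determination)
```

Passing the expensive determination as a callback makes the saving explicit: it runs only for the pairs whose determination indication information requires it.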
Referring to Fig. 3, an apparatus for determining an association relation between user identifications according to the application is shown. The apparatus 300 comprises an acquiring unit 301, a determining unit 302, a generating unit 303 and an executing unit 304. The acquiring unit 301 is configured to obtain first feature information associated with a first user identification and second feature information associated with a second user identification, wherein the feature information comprises search feature information and browsing feature information. The determining unit 302 is configured to determine, based on the first feature information and the second feature information, whether the first user identification and the second user identification have an association relation, and to obtain an initial determination result. The generating unit 303 is configured to detect the variation of the first feature information and the second feature information and, based on the variation and the initial determination result, to generate determination indication information, which indicates whether it is necessary to determine again whether the first user identification and the second user identification have an association relation. The executing unit 304 is configured to execute the operation corresponding to the determination indication information and to determine whether the first user identification and the second user identification have an association relation.
In some optional implementations of this embodiment, the determining unit 302 comprises: a vector generating subunit (not shown), configured to generate a first feature vector characterizing the first feature information and a second feature vector characterizing the second feature information, wherein each component of a feature vector corresponds to one piece of feature information; a similarity calculating subunit (not shown), configured to use a cosine similarity algorithm to calculate the similarity between each component of the first feature vector and the corresponding component of the second feature vector, obtaining multiple similarity sub-parameters, and to obtain a similarity parameter based on the multiple similarity sub-parameters; a similarity judging subunit (not shown), configured to judge whether the similarity parameter is greater than the similarity threshold; and an association relation determining subunit (not shown), configured to determine, if so, that the first user identification and the second user identification have an association relation and, if not, that they do not have an association relation.
In some optional implementations of this embodiment, the generating unit 303 comprises a parameter calculating subunit (not shown), configured to calculate a judgment indication parameter by the following formula:

$$\mathrm{Repredict} = (\mathrm{DELTA}_a + \mathrm{DELTA}_b) \times \mathrm{FACTOR} - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \times \mathrm{factor}_t \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \bigl(1 - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})\bigr) \times \mathrm{factor}_f \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$$

where Repredict is the judgment indication parameter;

$$H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \begin{cases} 1, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) > \mathrm{threadhold} \\ 0, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \le \mathrm{threadhold} \end{cases}$$

$$\mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \mathrm{ABS}\bigl(\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \mathrm{threadhold}\bigr)$$

ABS is the absolute-value function, $\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ is the similarity parameter, threadhold is the similarity threshold, $\mathrm{factor}_t$ is the first confidence coefficient, and $\mathrm{factor}_f$ is the second confidence coefficient. $\mathrm{DELTA}_a$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of first feature information, $\mathrm{DELTA}_b$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of second feature information, and FACTOR is a multi-dimensional real vector. The generating unit 303 further comprises a comparing subunit (not shown), configured to determine whether the judgment indication parameter is greater than zero; if so, to generate judgment indication information indicating that it is necessary to re-judge whether the first user identification has an association relation with the second user identification; and if not, to generate judgment indication information indicating that it is not necessary to re-judge whether the first user identification has an association relation with the second user identification.
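To make the formula concrete, the following Python sketch evaluates the judgment indication parameter and compares it with zero. It is a hedged illustration rather than the patented implementation. Because Repredict is compared with zero, the sketch assumes that the product of (DELTAa + DELTAb) and FACTOR is a dot product yielding a scalar. The function and parameter names are hypothetical, `predict_prev` stands for the similarity parameter from the previous judgment, and the spelling `threadhold` follows the formula.

```python
from typing import Sequence


def repredict(delta_a: Sequence[float], delta_b: Sequence[float],
              factor: Sequence[float], predict_prev: float,
              threadhold: float, factor_t: float, factor_f: float) -> float:
    """Evaluate the judgment indication parameter Repredict."""
    if not (len(delta_a) == len(delta_b) == len(factor)):
        raise ValueError("DELTAa, DELTAb and FACTOR must have the same dimension")
    # Assumption: "(DELTAa + DELTAb) x FACTOR" is a dot product (scalar result).
    change_term = sum((da + db) * w for da, db, w in zip(delta_a, delta_b, factor))
    # H is 1 when the previous judgment was "associated", otherwise 0.
    h = 1 if predict_prev > threadhold else 0
    # CONFIDENCE is the distance of the previous similarity from the threshold.
    confidence = abs(predict_prev - threadhold)
    return (change_term
            - h * factor_t * confidence
            - (1 - h) * factor_f * confidence)


def needs_rejudging(delta_a: Sequence[float], delta_b: Sequence[float],
                    factor: Sequence[float], predict_prev: float,
                    threadhold: float, factor_t: float, factor_f: float) -> bool:
    """Comparing step: re-judge only if Repredict is greater than zero."""
    return repredict(delta_a, delta_b, factor, predict_prev,
                     threadhold, factor_t, factor_f) > 0
```

Read this way, a large change in the features pushes Repredict up, while a previous similarity far from the threshold (a confident earlier judgment) pushes it down, so confident results are re-judged only after substantial change.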
In some optional implementations of this embodiment, the apparatus 300 further comprises: a historical information acquiring unit (not shown), configured to acquire the historical amounts of change of the first feature information and the second feature information, respectively, together with the historical judgment results obtained by judging, based on those historical amounts of change, whether an association relation exists between the first user identification and the second user identification; and a parameter determining unit (not shown), configured to determine the multi-dimensional real vector, the first confidence coefficient, and the second confidence coefficient by a machine learning algorithm, based on the historical amounts of change and the historical judgment results.
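The text does not name a particular machine learning algorithm. As one possible sketch, Repredict is linear in FACTOR and the two confidence coefficients, so all three can be fitted together with a logistic-regression-style update on historical samples. Everything below is an assumption made for illustration: the choice of logistic regression, the label definition (1 if re-judging turned out to be needed, 0 otherwise), the learning rate, the number of epochs, and all names.

```python
import math
from typing import List, Sequence, Tuple

# One historical sample: (DELTAa + DELTAb per feature, previous Predict value,
# label), where label is 1 if re-judging was actually needed and 0 otherwise.
Sample = Tuple[Sequence[float], float, int]


def features(delta_sum: Sequence[float], predict_prev: float,
             threadhold: float) -> List[float]:
    """Inputs that multiply FACTOR, factor_t and factor_f in Repredict."""
    h = 1.0 if predict_prev > threadhold else 0.0
    confidence = abs(predict_prev - threadhold)
    return list(delta_sum) + [-h * confidence, -(1.0 - h) * confidence]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def repredict_score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Repredict written as a dot product of parameters and inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


def fit(samples: Sequence[Sample], threadhold: float, lr: float = 0.5,
        epochs: int = 2000) -> Tuple[List[float], float, float]:
    """Fit (FACTOR, factor_t, factor_f) by stochastic gradient ascent
    on the logistic log-likelihood (an assumed choice of algorithm)."""
    dim = len(samples[0][0]) + 2
    weights = [0.0] * dim
    for _ in range(epochs):
        for delta_sum, predict_prev, label in samples:
            x = features(delta_sum, predict_prev, threadhold)
            error = label - sigmoid(repredict_score(weights, x))
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights[:-2], weights[-2], weights[-1]
```

The design choice here is that the sign of the fitted linear score plays the role of the "greater than zero" comparison, so the learned parameters can be used directly in the Repredict formula.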
In some optional implementations of this embodiment, the association relation is that the first user identification and the second user identification belong to the same user.
In some optional implementations of this embodiment, the user identification comprises one of the following: an International Mobile Equipment Identity (IMEI) of a mobile device, or a browsing record identifier.
In some optional implementations of this embodiment, the search feature information comprises at least one of the following: search word type and search time; and the browsing feature information comprises at least one of the following: browsed web page type, online location, and online duration.
The units or modules involved in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor; for example, this may be described as: a processor comprising an acquiring unit, a judging unit, a generating unit, and an executing unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquiring unit may also be described as "a unit configured to acquire first feature information associated with a first user identification and second feature information associated with a second user identification, respectively".
In another aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the apparatus described in the above embodiments, or it may exist separately without being assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute the method for determining an association relation between user identifications described in the present application.
The above description covers only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (14)

1. A method for determining an association relation between user identifications, characterized in that the method comprises:
acquiring first feature information associated with a first user identification and second feature information associated with a second user identification, respectively, wherein the feature information comprises search feature information and browsing feature information;
judging, based on the first feature information and the second feature information, whether the first user identification has an association relation with the second user identification, and obtaining an initial judgment result;
detecting an amount of change in the first feature information and the second feature information, and generating judgment indication information based on the amount of change and the initial judgment result, the judgment indication information indicating whether it is necessary to re-judge whether the first user identification has an association relation with the second user identification;
performing an operation corresponding to the judgment indication information to determine whether the first user identification has an association relation with the second user identification.
2. The method according to claim 1, characterized in that the association relation is that the first user identification and the second user identification belong to the same user.
3. The method according to claim 2, characterized in that the user identification comprises one of the following: an International Mobile Equipment Identity (IMEI) of a mobile device, or a browsing record identifier.
4. The method according to claim 3, characterized in that the search feature information comprises at least one of the following: search word type and search time; and the browsing feature information comprises at least one of the following: browsed web page type, online location, and online duration.
5. The method according to any one of claims 1-4, characterized in that judging, based on the first feature information and the second feature information, whether the first user identification has an association relation with the second user identification, and obtaining an initial judgment result, comprises:
generating a first feature vector characterizing the first feature information and a second feature vector characterizing the second feature information, respectively, wherein each component of a feature vector corresponds to one piece of feature information;
using a cosine similarity algorithm to calculate the similarity between each component of the first feature vector and the corresponding component of the second feature vector, obtaining multiple similarity sub-parameters, and obtaining a similarity parameter based on the multiple similarity sub-parameters;
judging whether the similarity parameter is greater than a similarity threshold;
if so, determining that the first user identification has an association relation with the second user identification; if not, determining that the first user identification does not have an association relation with the second user identification.
6. The method according to claim 5, characterized in that generating judgment indication information based on the amount of change and the initial judgment result comprises:
calculating a judgment indication parameter by the following formula:

$$\mathrm{Repredict} = (\mathrm{DELTA}_a + \mathrm{DELTA}_b) \times \mathrm{FACTOR} - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \times \mathrm{factor}_t \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \bigl(1 - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})\bigr) \times \mathrm{factor}_f \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$$

wherein Repredict is the judgment indication parameter;

$$H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \begin{cases} 1, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) > \mathrm{threadhold} \\ 0, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \le \mathrm{threadhold} \end{cases}$$

$$\mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \mathrm{ABS}\bigl(\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \mathrm{threadhold}\bigr)$$

ABS is the absolute-value function, $\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ is the similarity parameter, threadhold is the similarity threshold, $\mathrm{factor}_t$ is the first confidence coefficient, and $\mathrm{factor}_f$ is the second confidence coefficient;
$\mathrm{DELTA}_a$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of first feature information, $\mathrm{DELTA}_b$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of second feature information, and FACTOR is a multi-dimensional real vector;
determining whether the judgment indication parameter is greater than zero; if so, generating judgment indication information indicating that it is necessary to re-judge whether the first user identification has an association relation with the second user identification; if not, generating judgment indication information indicating that it is not necessary to re-judge whether the first user identification has an association relation with the second user identification.
7. The method according to claim 6, characterized in that the method further comprises:
acquiring the historical amounts of change of the first feature information and the second feature information, respectively, together with the historical judgment results obtained by judging, based on the historical amounts of change, whether an association relation exists between the first user identification and the second user identification;
determining the multi-dimensional real vector, the first confidence coefficient, and the second confidence coefficient by a machine learning algorithm, based on the historical amounts of change and the historical judgment results.
8. An apparatus for determining an association relation between user identifications, characterized in that the apparatus comprises:
an acquiring unit, configured to acquire first feature information associated with a first user identification and second feature information associated with a second user identification, respectively, wherein the feature information comprises search feature information and browsing feature information;
a judging unit, configured to judge, based on the first feature information and the second feature information, whether the first user identification has an association relation with the second user identification, and to obtain an initial judgment result;
a generating unit, configured to detect an amount of change in the first feature information and the second feature information, and to generate judgment indication information based on the amount of change and the initial judgment result, the judgment indication information indicating whether it is necessary to re-judge whether the first user identification has an association relation with the second user identification;
an executing unit, configured to perform an operation corresponding to the judgment indication information to determine whether the first user identification has an association relation with the second user identification.
9. The apparatus according to claim 8, characterized in that the association relation is that the first user identification and the second user identification belong to the same user.
10. The apparatus according to claim 9, characterized in that the user identification comprises one of the following: an International Mobile Equipment Identity (IMEI) of a mobile device, or a browsing record identifier.
11. The apparatus according to claim 10, characterized in that the search feature information comprises at least one of the following: search word type and search time; and the browsing feature information comprises at least one of the following: browsed web page type, online location, and online duration.
12. The apparatus according to any one of claims 8-11, characterized in that the judging unit comprises:
a vector generating subunit, configured to generate a first feature vector characterizing the first feature information and a second feature vector characterizing the second feature information, respectively, wherein each component of a feature vector corresponds to one piece of feature information;
a similarity calculating subunit, configured to use a cosine similarity algorithm to calculate the similarity between each component of the first feature vector and the corresponding component of the second feature vector, obtaining multiple similarity sub-parameters, and to obtain a similarity parameter based on the multiple similarity sub-parameters;
a similarity judging subunit, configured to judge whether the similarity parameter is greater than a similarity threshold;
an association relation determining subunit, configured to determine that the first user identification has an association relation with the second user identification if so, and to determine that the first user identification does not have an association relation with the second user identification if not.
13. The apparatus according to claim 12, characterized in that the generating unit comprises:
a parameter calculating subunit, configured to calculate a judgment indication parameter by the following formula:

$$\mathrm{Repredict} = (\mathrm{DELTA}_a + \mathrm{DELTA}_b) \times \mathrm{FACTOR} - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \times \mathrm{factor}_t \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \bigl(1 - H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})\bigr) \times \mathrm{factor}_f \times \mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$$

wherein Repredict is the judgment indication parameter;

$$H(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \begin{cases} 1, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) > \mathrm{threadhold} \\ 0, & \text{if } \mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) \le \mathrm{threadhold} \end{cases}$$

$$\mathrm{CONFIDENCE}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) = \mathrm{ABS}\bigl(\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1}) - \mathrm{threadhold}\bigr)$$

ABS is the absolute-value function, $\mathrm{Predict}(\mathrm{ATTR}_a^{-1}, \mathrm{ATTR}_b^{-1})$ is the similarity parameter, threadhold is the similarity threshold, $\mathrm{factor}_t$ is the first confidence coefficient, and $\mathrm{factor}_f$ is the second confidence coefficient; $\mathrm{DELTA}_a$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of first feature information, $\mathrm{DELTA}_b$ is a multi-dimensional vector in which each component corresponds to the amount of change of one piece of second feature information, and FACTOR is a multi-dimensional real vector;
a comparing subunit, configured to determine whether the judgment indication parameter is greater than zero; if so, to generate judgment indication information indicating that it is necessary to re-judge whether the first user identification has an association relation with the second user identification; if not, to generate judgment indication information indicating that it is not necessary to re-judge whether the first user identification has an association relation with the second user identification.
14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
a historical information acquiring unit, configured to acquire the historical amounts of change of the first feature information and the second feature information, respectively, together with the historical judgment results obtained by judging, based on the historical amounts of change, whether an association relation exists between the first user identification and the second user identification;
a parameter determining unit, configured to determine the multi-dimensional real vector, the first confidence coefficient, and the second confidence coefficient by a machine learning algorithm, based on the historical amounts of change and the historical judgment results.
CN201510506033.8A 2015-08-17 2015-08-17 The incidence relation judgment method and device of user identifier Active CN105119744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510506033.8A CN105119744B (en) 2015-08-17 2015-08-17 The incidence relation judgment method and device of user identifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510506033.8A CN105119744B (en) 2015-08-17 2015-08-17 The incidence relation judgment method and device of user identifier

Publications (2)

Publication Number Publication Date
CN105119744A true CN105119744A (en) 2015-12-02
CN105119744B CN105119744B (en) 2018-09-28

Family

ID=54667642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510506033.8A Active CN105119744B (en) 2015-08-17 2015-08-17 The incidence relation judgment method and device of user identifier

Country Status (1)

Country Link
CN (1) CN105119744B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608179A (en) * 2015-12-22 2016-05-25 百度在线网络技术(北京)有限公司 Method and device for determining relevance of user identification
CN105871585A (en) * 2015-12-03 2016-08-17 乐视网信息技术(北京)股份有限公司 Terminal association method and device
CN107623605A (en) * 2016-07-14 2018-01-23 精硕科技(北京)股份有限公司 The method and system of network traffics duplicate removal
CN110729053A (en) * 2019-10-11 2020-01-24 平安医疗健康管理股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN111224743A (en) * 2018-11-23 2020-06-02 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102316130A (en) * 2010-06-29 2012-01-11 百度在线网络技术(北京)有限公司 Method and device for judging intimacy between user and friends thereof based on behaviors of user
US20140136534A1 (en) * 2012-11-14 2014-05-15 Electronics And Telecommunications Research Institute Similarity calculating method and apparatus
CN103995907A (en) * 2014-06-13 2014-08-20 北京奇艺世纪科技有限公司 Determining method of access users

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871585A (en) * 2015-12-03 2016-08-17 乐视网信息技术(北京)股份有限公司 Terminal association method and device
CN105608179A (en) * 2015-12-22 2016-05-25 百度在线网络技术(北京)有限公司 Method and device for determining relevance of user identification
CN105608179B (en) * 2015-12-22 2019-03-12 百度在线网络技术(北京)有限公司 The method and apparatus for determining the relevance of user identifier
CN107623605A (en) * 2016-07-14 2018-01-23 精硕科技(北京)股份有限公司 The method and system of network traffics duplicate removal
CN111224743A (en) * 2018-11-23 2020-06-02 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium
CN111224743B (en) * 2018-11-23 2022-11-15 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium
CN110729053A (en) * 2019-10-11 2020-01-24 平安医疗健康管理股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110729053B (en) * 2019-10-11 2023-02-03 深圳平安医疗健康科技服务有限公司 Data processing method, data processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN105119744B (en) 2018-09-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant