CN107247786A - Method, device and server for determining similar users - Google Patents

Method, device and server for determining similar users

Info

Publication number
CN107247786A
Authority
CN
China
Prior art keywords
user
label
default label
users
default
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710451969.4A
Other languages
Chinese (zh)
Inventor
李泽中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaodu Information Technology Co Ltd
Original Assignee
Beijing Xiaodu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaodu Information Technology Co Ltd
Priority to CN201710451969.4A
Publication of CN107247786A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application discloses a method, device and server for determining similar users. One embodiment of the method includes: obtaining the user information of each user in a to-be-processed user set, the user information including geographic location information and historical order information associated with at least one preset label; counting the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label; generating a label attribute feature for each user based on the historical order information of each user in the to-be-processed user set and the weight of each preset label; clustering the users in the to-be-processed user set into multiple user clusters according to the label attribute features and the geographic location information; and calculating, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and determining the similar users of the target user. This embodiment improves the accuracy of similar-user identification.

Description

Method, device and server for determining similar users
Technical field
This application relates to the field of computer technology, specifically to the field of Internet data mining, and more particularly to a method, device and server for determining similar users.
Background
With the development of e-commerce technology, more and more users choose to shop on online shopping platforms. An online shopping platform can obtain massive amounts of user data, including users' basic attribute information, order data, review information, logistics information and so on. Based on this user data, a profile of each user can be constructed, including the user's age, hobbies, spending power, purchasing habits and so on. The online shopping platform can use the user profiles to screen out users similar to each merchant's loyal users and recommend them to the merchant as potential customers.
Existing methods for screening similar users do not consider, when building the user profile, the influence of each category of user attribute on the similarity calculation. In the screening process, however, the attribute categories differ in how strongly they influence the similarity between users; for example, a user's gender influences the similarity calculation less than the user's purchasing habits do. The accuracy of existing similar-user identification therefore needs to be improved.
Summary of the invention
To solve one or more of the technical problems described in the background section above, embodiments of this application provide a method, device and server for determining similar users.
Embodiments of this application disclose A1, a method for determining similar users, the method including: obtaining the user information of each user in a to-be-processed user set, the user information including geographic location information and historical order information associated with at least one preset label; counting the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label; generating a label attribute feature for each user based on the historical order information of each user in the to-be-processed user set and the weight of each preset label; clustering the users in the to-be-processed user set into multiple user clusters according to the label attribute features and the geographic location information; and calculating, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and determining the similar users of the target user.
A2. The method as described in A1, wherein counting the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label, includes: for each preset label, filtering out the historical order information associated with that preset label; counting the number of users corresponding to the filtered historical order information; and taking the logarithm of the counted number of users and then the reciprocal of the result, as the weight of the preset label.
A3. The method as described in A1, wherein generating the label attribute feature of each user based on the historical order information of each user in the to-be-processed user set and the weight of each preset label includes: determining, from the historical order information, each user's order frequency for each preset label; calculating each user's effective order frequency for each preset label based on the user's order frequency for that label and the weight of the corresponding preset label; and generating a label feature vector for each user from the effective order frequencies of the preset labels, as the label attribute feature of that user.
A4. The method as described in A3, wherein calculating each user's effective order frequency for each preset label includes: multiplying each user's order frequency for each preset label by the weight of the corresponding preset label, and taking the product as the user's effective order frequency for that preset label; and wherein generating the label feature vector of each user from the effective order frequencies includes: using the effective order frequency of each preset label as the feature value corresponding to that preset label in the label feature vector.
A5. The method as described in A3, wherein clustering the users in the to-be-processed user set into multiple user clusters according to the label attribute features and the geographic location information includes: sorting the elements of each user's label feature vector in descending order of feature value, and selecting the preset labels corresponding to the elements ranked in the first preset number of positions as to-be-matched labels; and, taking the geographic location information and the to-be-matched labels as the feature information of each user, clustering the users in the to-be-processed user set into multiple user clusters based on the feature information.
A6. The method as described in A5, wherein clustering the users in the to-be-processed user set into multiple user clusters based on the feature information includes: grouping users whose geographic location information is identical and who share at least one to-be-matched label into the same user cluster.
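The grouping rule in A5 and A6 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the `(location, label)` cluster key, the default `top_n=2` and the choice to ignore labels with a zero feature value are all assumptions made for the example. Because a user may share different to-be-matched labels with different users, a user can appear in more than one cluster here.

```python
from collections import defaultdict

def top_labels(vector, labels, top_n):
    """Return the labels of the top_n largest feature values (hypothetical helper).

    Labels whose feature value is zero are dropped; that filter is an assumption,
    the source does not say how zero-valued elements are treated.
    """
    ranked = sorted(zip(labels, vector), key=lambda pair: pair[1], reverse=True)
    return [label for label, value in ranked[:top_n] if value > 0]

def cluster_users(users, labels, top_n=2):
    """Group users that share a location and at least one to-be-matched label.

    users maps user_id -> (location, label_feature_vector).
    The result maps (location, label) -> set of user ids.
    """
    clusters = defaultdict(set)
    for user_id, (location, vector) in users.items():
        for label in top_labels(vector, labels, top_n):
            clusters[(location, label)].add(user_id)
    return dict(clusters)

# Illustrative data only.
labels = ["rice", "braised", "hot and sour"]
users = {
    "u1": ("district A", [0.1, 2.0, 1.0]),
    "u2": ("district A", [0.2, 0.0, 3.0]),
    "u3": ("district B", [0.1, 2.0, 1.0]),
}
clusters = cluster_users(users, labels)
```

In this example u1 and u2 land in the same cluster because both are in "district A" and both have "hot and sour" among their top two labels, while u3 has the same vector as u1 but is kept apart by its location.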
A7. The method as described in A3, wherein calculating, based on the label attribute features, the similarity between the target user and the other users in the same user cluster and determining the similar users of the target user includes: calculating the similarity between the label feature vector of the target user and the label feature vector of each other user in the same user cluster; and screening out the similar users of the target user based on those similarities.
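A7 does not name a particular similarity measure at this point. The sketch below assumes cosine similarity and a fixed threshold purely for illustration; `cosine_similarity`, `similar_users` and the `threshold` value are hypothetical names and settings, not taken from the application.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_users(target_id, cluster_vectors, threshold=0.8):
    """Score every other user in the target's cluster and keep those at or above threshold.

    cluster_vectors maps user_id -> label feature vector for one user cluster.
    The threshold is an assumed screening rule; a top-k cut would work equally well.
    """
    target = cluster_vectors[target_id]
    scored = [
        (other_id, cosine_similarity(target, vector))
        for other_id, vector in cluster_vectors.items()
        if other_id != target_id
    ]
    kept = [(other_id, score) for other_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Illustrative data only.
cluster = {"t": [1.0, 0.0], "u1": [2.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}
result = similar_users("t", cluster)
```

Only users in the same cluster as the target are scored, which is what keeps the number of pairwise comparisons small.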
Embodiments of this application disclose B1, a device for determining similar users, the device including: an acquiring unit, configured to obtain the user information of each user in a to-be-processed user set, the user information including geographic location information and historical order information associated with at least one preset label; a statistics unit, configured to count the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label; a generation unit, configured to generate a label attribute feature for each user based on the historical order information of each user in the to-be-processed user set and the weight of each preset label; a clustering unit, configured to cluster the users in the to-be-processed user set into multiple user clusters according to the label attribute features and the geographic location information; and a determining unit, configured to calculate, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and to determine the similar users of the target user.
B2. The device as described in B1, wherein the statistics unit is further configured to determine the weight of each preset label as follows: for each preset label, filtering out the historical order information associated with that preset label; counting the number of users corresponding to the filtered historical order information; and taking the logarithm of the counted number of users and then the reciprocal of the result, as the weight of the preset label.
B3. The device as described in B1, wherein the generation unit is further configured to generate the label attribute feature of each user as follows: determining, from the historical order information, each user's order frequency for each preset label; calculating each user's effective order frequency for each preset label based on the user's order frequency for that label and the weight of the corresponding preset label; and generating a label feature vector for each user from the effective order frequencies of the preset labels, as the label attribute feature of that user.
B4. The device as described in B3, wherein the generation unit is further configured to calculate each user's effective order frequency for each preset label as follows: multiplying each user's order frequency for each preset label by the weight of the corresponding preset label, and taking the product as the user's effective order frequency for that preset label; and wherein the generation unit is further configured to generate the label feature vector of each user as follows: using the effective order frequency of each preset label as the feature value corresponding to that preset label in the label feature vector.
B5. The device as described in B3, wherein the clustering unit is further configured to cluster the users in the to-be-processed user set into multiple user clusters as follows: sorting the elements of each user's label feature vector in descending order of feature value, and selecting the preset labels corresponding to the elements ranked in the first preset number of positions as to-be-matched labels; and, taking the geographic location information and the to-be-matched labels as the feature information of each user, clustering the users in the to-be-processed user set into multiple user clusters based on the feature information.
B6. The device as described in B5, wherein the clustering unit is further configured to cluster the users in the to-be-processed user set into multiple user clusters as follows: grouping users whose geographic location information is identical and who share at least one to-be-matched label into the same user cluster.
B7. The device as described in B3, wherein the determining unit is further configured to calculate the similarity between the target user and the other users in the same user cluster and to determine the similar users of the target user as follows: calculating the similarity between the label feature vector of the target user and the label feature vector of each other user in the same user cluster; and screening out the similar users of the target user based on those similarities.
Embodiments of this application disclose C1, a server including: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any one of A1 to A7.
Embodiments of this application disclose D1, a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described in any one of A1 to A7.
The method, device and server for determining similar users provided by embodiments of this application obtain the user information of each user in the to-be-processed user set; count the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label; generate a label attribute feature for each user based on the user's historical order information and the weight of each preset label; cluster the users in the set according to the label attribute features and the geographic location information; and finally calculate, based on the label attribute features, the similarity between the target user and the other users in the same cluster and determine the similar users of the target user. The weights of the different preset labels can thus be determined reasonably and effectively and used to describe users' attribute features accurately, which improves the accuracy of the similarity calculation between users. In addition, clustering the users and then searching for similar users within the same user cluster effectively reduces the computational complexity of the similarity calculation and improves the efficiency of similar-user identification.
Brief description of the drawings
Other features, objects and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which this application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for determining similar users according to this application;
Fig. 3 is a flowchart of another embodiment of the method for determining similar users according to this application;
Fig. 4 is a schematic diagram of the effect of an application scenario of the method for determining similar users according to this application;
Fig. 5 is a schematic structural diagram of one embodiment of the device for determining similar users of this application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of this application.
Detailed description of the embodiments
This application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments in this application and the features in those embodiments can be combined with one another. This application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or device for determining similar users of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 used by a merchant 110, terminal devices 103, 104, ... of users 120, 130, ..., a network 105 and a server 106. The network 105 is the medium that provides communication links between the terminal devices 101, 102, 103, 104, ... and the server 106. The network 105 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
The merchant 110 can use the terminal devices 101, 102 to interact with the server 106 over the network 105, to receive or send messages. The terminal devices 101, 102 may have installed an application associated with the service provided by the server 106, for example a shopping application.
The users 120, 130, ... can likewise use the terminal devices 103, 104, ... to interact with the server 106 over the network 105, to receive or send messages. The terminal devices 103, 104, ... may have various communication client applications installed, such as a web browser, a shopping application or social software.
The terminal devices 101, 102, 103, 104, ... may be any of various electronic devices that have a display screen and support data communication over a computer network, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 106 may be a server that provides data services to both the terminal devices 101, 102 of the merchant 110 and the terminal devices 103, 104, ... of the users 120, 130, ..., for example the backend server of a shopping application. The backend server of the shopping application can receive data requests from the terminal devices 103, 104, ... of the users 120, 130, ..., analyze and store them and perform other processing, and then send them to the terminal devices 101, 102 of the merchant 110; it can also analyze and process the feedback returned by the terminal devices 101, 102 of the merchant 110 and send the result to the terminal devices 103, 104, ... of the users 120, 130, ....
It should be noted that the method for determining similar users provided by the embodiments of this application is generally executed by the server 106; correspondingly, the device for determining similar users is generally located in the server 106.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are only illustrative. There may be any number of terminal devices, networks and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for determining similar users according to this application is shown. The method for determining similar users includes the following steps:
Step 201: obtain the user information of each user in the to-be-processed user set.
In this embodiment, the to-be-processed user set may be the set of all users of an application platform (for example, an online shopping platform) supported by the electronic device on which the method for determining similar users runs (for example, the server 106 shown in Fig. 1). The electronic device may obtain the user information of each user in the platform's user set locally or from other devices; for example, it may retrieve the user information of each platform user from local storage, or receive it from a remote device over the network.
The user information includes geographic location information and historical order information associated with at least one preset label. The geographic location information may be represented by a street number, latitude and longitude coordinates, a landmark building and so on, or by an area identifier predefined by the application platform (for example, a business district name). A preset label may be a product label of a merchant (for example, a dish label of a restaurant), an age label of a user, an item price label, and so on.
Taking users of a food-ordering platform as an example, the preset labels on the platform include multiple dish labels, such as "braised in soy sauce", "roujiamo", "hot and sour", "boiled dumplings" and "rice". The dish labels of each dish of each merchant can be obtained by matching the dish name against this label set.
The historical order information includes multiple historical order records. The ordered item in each record has at least one preset label, and each record corresponds to one ordering operation by a user. The user's ordering operations over a period of time can therefore be stored in association with the preset labels, yielding historical order records associated with at least one preset label, and hence the historical order information associated with at least one preset label.
In some optional implementations, the user information may also include the user's basic attribute information, such as age, gender, occupation, hobbies and habits. This information may be entered by the user (for example, age and gender), or derived from the user's operation behavior data on the platform; for example, the user's habits and hobbies may be inferred by analyzing when and how the user places orders. When the historical order information of a user is obtained, the user's basic attribute information can be obtained at the same time according to the user's identifier.
Step 202: count the number of users corresponding to the historical order information associated with each preset label, to determine the weight of each preset label.
In this embodiment, different preset labels differ in how accurately they describe what distinguishes one user from other users. That is, different preset labels differ in their ability to describe user characteristics. In general, the more users correspond to the historical order information associated with a preset label, the weaker that label's ability to distinguish between users; conversely, the fewer such users, the stronger that ability. For example, on a food-ordering platform, the preset label "rice" is associated with the order records of most users on the platform, while only a small proportion of the platform's users are associated with the preset label "luosifen" (river-snail rice noodles); the preset label "luosifen" therefore describes a user's preferences or characteristics more effectively than "rice".
In this embodiment, the preset labels can be weighted according to their ability to distinguish between users, and the weight can be related to the number of users corresponding to the historical order information associated with the label. Specifically, the weight of a preset label can be negatively correlated with that number of users. Optionally, the reciprocal of the number of users corresponding to the historical order information associated with a preset label can be used as the weight of that label.
Here, the number of users corresponding to the historical order information associated with a preset label can be counted as follows: the historical order information of all users in the to-be-processed user set is grouped by preset label, that is, all the historical order information is divided into a set of historical order records for each preset label; then, for the record set of each preset label, the total number of users appearing in that set is counted, and this total is the number of users corresponding to the historical order information associated with that preset label.
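The counting procedure just described can be sketched as below. The record layout (a `(user_id, labels)` pair per order) and the function name `count_users_per_label` are assumptions for illustration; the source does not specify a data format. Using a set per label ensures that a user with several orders under the same label is counted once.

```python
from collections import defaultdict

def count_users_per_label(order_records):
    """Count distinct users per preset label.

    order_records is an iterable of (user_id, labels) pairs, one pair per
    historical order record (hypothetical layout).
    """
    users_by_label = defaultdict(set)
    for user_id, labels in order_records:
        for label in labels:
            users_by_label[label].add(user_id)
    return {label: len(users) for label, users in users_by_label.items()}

# Illustrative records only.
orders = [
    ("u1", ["rice", "braised"]),
    ("u1", ["rice"]),
    ("u2", ["rice", "luosifen"]),
    ("u3", ["rice"]),
]
user_count = count_users_per_label(orders)
```

Here "rice" is associated with three distinct users even though it appears in four order records.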
In some optional implementations of this embodiment, the weight of each preset label can be determined as follows: for each preset label, filter out the historical order information associated with the preset label; count the number of users corresponding to the filtered historical order information; take the logarithm of the counted number of users and then the reciprocal of the result, as the weight of the preset label. Specifically, for each preset label tag_k, if the number of users corresponding to the filtered historical order information associated with tag_k is user_count(tag_k), then the weight weight(tag_k) of the preset label is:

$$\mathrm{weight}(tag_k) = \frac{1}{\log\big(\mathrm{user\_count}(tag_k)\big)}$$

where k = 1, 2, 3, ..., n, and n is the number of preset labels.
Taking the logarithm of user_count(tag_k), the number of users corresponding to the historical order information associated with the preset label tag_k, before taking the reciprocal avoids the weight of tag_k becoming an extremely small number when the total number of users on the application platform is large and user_count(tag_k) is relatively small, and ensures that every preset label associated with the users' historical order information has some influence on the description of user characteristics.
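A minimal sketch of this weighting follows. The natural logarithm is an assumption, since the text does not state the base, and so is the handling of labels with a single user: log(1) is 0 and the reciprocal is undefined, so this sketch simply leaves such labels out. The function name `label_weights` is illustrative.

```python
import math

def label_weights(user_count):
    """Return weight = 1 / log(user_count) for each preset label.

    Assumptions (not from the source): natural logarithm; labels with a
    user count of 1 or less are skipped because log(1) == 0.
    """
    weights = {}
    for label, count in user_count.items():
        if count <= 1:
            continue  # reciprocal of log(1) is undefined; handling is an assumption
        weights[label] = 1.0 / math.log(count)
    return weights

# Illustrative counts only.
weights = label_weights({"rice": 1000, "luosifen": 10, "rare dish": 1})
```

With these counts the rarer label "luosifen" receives a larger weight than "rice", but the ratio between the two weights is 3 rather than the factor of 100 that a plain reciprocal of the user count would give.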
Step 203: generate the label attribute feature of each user based on the historical order information of each user in the to-be-processed user set and the weight of each preset label.
Here, the label attribute feature is a label-based user attribute feature, in other words, an attribute of the user expressed in terms of labels. In this embodiment, the orders in each user's historical order information can be counted by preset label, in combination with the weight of each preset label, and the result used as the label attribute feature of that user.
Specifically, the label attribute feature of a user can be represented in several ways. For example, suppose user A's historical order information includes 3 orders associated with preset label a and 1 order associated with preset label b, and step 202 gives the weights of labels a and b as weight(a) and weight(b) respectively. User A's label attribute feature can then be expressed as weight(a) × a3 + weight(b) × b1, or as weight(a) × a3 & weight(b) × b1. Optionally, the format of the label attribute feature can be specified in advance; after the per-label order counts of each user have been obtained, the user's label attribute feature is expressed in the predetermined format in combination with the weight of each preset label.
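One possible concrete form of the feature is a vector with one element per preset label, each element being the user's order count for that label multiplied by the label's weight, as in embodiments A3 and A4. The sketch below reproduces the user A example; the weight values 0.5 and 0.25 and the name `label_feature_vector` are made up for illustration.

```python
from collections import Counter

def label_feature_vector(user_orders, weights, labels):
    """Build a label feature vector for one user.

    user_orders is a list of label lists, one per historical order.
    Each element is order_count(label) * weight(label); labels without a
    weight contribute 0.0 (an assumption of this sketch).
    """
    frequency = Counter(label for order in user_orders for label in order)
    return [frequency[label] * weights.get(label, 0.0) for label in labels]

# User A: 3 orders associated with label a, 1 order associated with label b.
labels = ["a", "b"]
weights = {"a": 0.5, "b": 0.25}  # illustrative weights, not from the source
user_a_orders = [["a"], ["a"], ["a"], ["b"]]
vector = label_feature_vector(user_a_orders, weights, labels)
```

Keeping the label order fixed across users is what makes the resulting vectors comparable element by element.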
In some optional implementations of this embodiment, when the number of preset labels is large, the preset labels can first be classified, dividing all of them into multiple label categories. For each label category, the weight of the category is then determined from the weights of all the preset labels in it; for example, the mean of the weights of the preset labels in a category can be used as the weight of that category. The user's historical order information is then counted by label category, that is, the number of historical order records associated with each label category is counted and used as the label attribute feature of each user. This reduces the subsequent amount of computation, shortens the computation time and improves the efficiency of similar-user identification.
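The category-level variant can be sketched as follows, assuming a hypothetical `label_to_category` mapping supplied by the platform. Counting an order once per category, even when several of its labels fall into that category, is an interpretation made for this sketch rather than something the text states.

```python
from statistics import mean

def category_weights(label_weights, label_to_category):
    """Weight of a category = mean weight of the preset labels it contains."""
    grouped = {}
    for label, weight in label_weights.items():
        grouped.setdefault(label_to_category[label], []).append(weight)
    return {category: mean(values) for category, values in grouped.items()}

def category_order_counts(user_orders, label_to_category):
    """Number of one user's order records associated with each label category."""
    counts = {}
    for order_labels in user_orders:
        # An order counts once per category (assumption of this sketch).
        for category in {label_to_category[label] for label in order_labels}:
            counts[category] = counts.get(category, 0) + 1
    return counts

# Illustrative mapping and weights only.
label_to_category = {"boiled dumplings": "wheat", "roujiamo": "wheat", "rice": "staple"}
cat_weights = category_weights(
    {"boiled dumplings": 0.25, "roujiamo": 0.75, "rice": 0.125}, label_to_category
)
cat_counts = category_order_counts(
    [["boiled dumplings", "roujiamo"], ["rice"], ["roujiamo"]], label_to_category
)
```

The feature then has one element per category instead of one per label, which is where the reduction in computation comes from.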
Step 204, cluster the users in the pending user set into multiple user clusters according to the tag attributes features and the geographical location information.
In this embodiment, the above electronic device can cluster the users according to their tag attributes features and geographical location information, that is, classify the users in the pending user set according to these two kinds of information. Here, each user can be characterized by the tag attributes feature generated in step 203 and the geographical location information obtained in step 201, and both are used as the clustering features of the user.
Specifically, users whose geographical location information is identical and whose tag attributes features are consistent can be gathered into the same user cluster, while users with different geographical location information, or with inconsistent tag attributes features, are assigned to different user clusters. In a practical scenario, identical geographical location information can mean belonging to the same "commercial circle". The "commercial circle" can be multi-level: a higher-level "commercial circle" covers the geographical range of the lower-level ones, and in a specific implementation the level can be chosen according to the required accuracy of similar-user positioning. The consistency of tag attributes features can be judged with various existing methods, for example by directly comparing the tag attributes features of two users as a whole or in part.
In this embodiment, the methods for clustering the users in the pending user set include, but are not limited to, the K-means algorithm, hierarchical clustering algorithms, and the fuzzy C-means (FCM) clustering algorithm.
Clustering the users by tag attributes feature and geographical location information divides them into different user clusters, such that the similarity between users within one cluster is high and the similarity between users of different clusters is low. When similar users are screened later, similarity only needs to be computed between users of the same cluster, rather than between each target user and every other user on the platform. This significantly reduces the complexity of similarity calculation between users and improves the efficiency of locating similar users.
Step 205, calculate, based on the tag attributes features, the similarity between the target user and the other users in the same user cluster, and determine the similar users of the target user.
After the users in the pending user set have been clustered into multiple user clusters, the similarity between users can be calculated within each user cluster. The target user is a user in the pending user set for whom similar users are to be matched. In a practical scenario, the target user can be a user with specific characteristics, for example a user on the platform who orders frequently from a certain shop or who has rated the shop well.
In this embodiment, the similar users of the target user can be searched for in the user cluster to which the target user belongs. Specifically, the similarity between the target user and the other users in the same user cluster is calculated based on the tag attributes features, and the similar users of the target user are determined based on that similarity.
Specifically, when calculating similarity, the tag attributes feature of the target user can be compared with those of the other users in the same user cluster, and various methods can be used. For example, the tag attributes feature of each user can be represented in a data form (such as a character string, vector, or matrix), from which feature values or feature points are extracted; two tag attributes features are then matched using these feature values or feature points, and their degree of matching is taken as the similarity of the corresponding two users. Alternatively, methods such as cosine similarity or the Pearson correlation coefficient can be used to calculate the similarity between the tag attributes features of two users, which is taken as the similarity of the two users.
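Of the methods named above, cosine similarity is the simplest to illustrate. The sketch below assumes that the tag attributes features of two users have already been expressed as numeric vectors of equal length; returning 0.0 for an all-zero vector is a choice made here, not something stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A user with no recorded orders has no direction to compare.
        return 0.0
    return dot / (norm_u * norm_v)
```

Because cosine similarity depends only on direction, two users whose weighted order counts differ by a constant factor are treated as fully similar.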
After the similarity between the target user and each of the other users in the same user cluster has been calculated, the users whose similarity exceeds a set threshold can be taken as the similar users of the target user; alternatively, the users can be sorted by similarity in descending order and the top N (N being a preset positive integer) selected as the similar users of the target user.
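Both selection rules can be sketched in one small helper. The name `select_similar_users` and its parameters are hypothetical; the text presents the threshold and the top-N rule as alternatives, and the sketch simply allows either or both to be supplied.

```python
def select_similar_users(similarities, threshold=None, top_n=None):
    """Pick similar users from {user_id: similarity to the target user}.

    threshold: keep only users whose similarity is strictly higher.
    top_n:     keep only the first N users after descending sort.
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(user, s) for user, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [user for user, _ in ranked]
```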
Afterwards, the above electronic device can push the relevant information of the similar users to the merchants in which the target user shows higher interest. Higher interest here includes, but is not limited to, the target user ordering frequently from the merchant, rating the merchant well, or placing orders whose total amount exceeds a set amount.
In the method for determining similar users provided by the above embodiment of the application, the user information of each user in the pending user set is first obtained; the number of users corresponding to the historical order information associated with each default label is then counted to determine the weight of each default label; the tag attributes feature of each user is generated from the user's historical order information and the weights of the default labels; the users in the set are clustered according to tag attributes feature and geographical location information; and finally the similarity between the target user and the other users in the same cluster is calculated based on the tag attributes features and the similar users of the target user are determined. The weights of the different default labels are thereby determined reasonably and effectively and used to describe the attribute features of users accurately, which improves the accuracy of similarity calculation between users.
In addition, by clustering the users and searching for similar users within a user cluster, the above embodiment effectively reduces the computational complexity of similarity between users and improves the efficiency of locating similar users.
With continued reference to Fig. 3, it shows a flow 300 of another embodiment of the method for determining similar users. The flow 300 comprises the following steps:
Step 301, obtain the user information of each user in the pending user set.
In this embodiment, the electronic device on which the method for determining similar users runs can obtain the user information of each user in the pending user set locally or from other devices. The pending user set can be the set of all users on an application software platform. The user information includes geographical location information and historical order information associated with at least one default label. A default label can be a merchant's commodity type label (such as a dish label of a restaurant), a commodity price label, a delivery information label, and so on. The historical order information includes multiple historical order records, each corresponding to one ordering operation of the user. The ordering operations of a user within a period of time can be stored in association with the default labels, giving historical order records associated with at least one default label and thus the historical order information associated with at least one default label.
Step 302, count the number of users corresponding to the historical order information associated with each default label, to determine the weight of each default label.
In this embodiment, default labels can be weighted according to their ability to distinguish different users, and the weight can be related to the number of users corresponding to the historical order information associated with the default label. Specifically, the weight of a default label can be negatively correlated with that number of users: when the historical order information associated with a default label corresponds to many users, the label distinguishes users poorly and its weight is lower; conversely, when it corresponds to few users, the label distinguishes users well and its weight is higher. The weight of a default label can therefore be obtained from the number of users corresponding to the historical order information associated with it.
Step 303, determine, according to the historical order information, the order frequency of each user for each default label.
In this embodiment, the order frequency of each user for each default label can be counted from the historical order information of each user in the pending user set. Here, every historical order record is associated with one or more default labels. For each user, the number of historical order records associated with each default label, or the user's historical ordering frequency for each label, can be counted and used as the order frequency of the user for that default label. Optionally, only the historical order records within a period of time (for example, the last 3 months) are counted, to reduce subsequent computation.
For example, suppose a user placed 3 orders within 3 months: one historical order record associated with default labels A and B, one associated with default labels A, C and D, and one associated with default labels A, B, C and E. The order frequencies of this user for default labels A, B, C, D and E are then 3, 2, 2, 1 and 1, respectively.
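The worked example above can be reproduced with a short counting routine. Representing each historical order record as a set of default labels is an assumption of this sketch, as is the function name `order_frequencies`.

```python
from collections import Counter


def order_frequencies(orders):
    """Count, per default label, how many order records are associated with it.

    orders: iterable of order records, each given as a collection of labels.
    """
    counts = Counter()
    for labels in orders:
        # set() guards against a label being listed twice on one record.
        counts.update(set(labels))
    return counts


# The three orders from the example: {A, B}, {A, C, D}, {A, B, C, E}.
example_orders = [{"A", "B"}, {"A", "C", "D"}, {"A", "B", "C", "E"}]
example_freq = order_frequencies(example_orders)
print(sorted(example_freq.items()))
# [('A', 3), ('B', 2), ('C', 2), ('D', 1), ('E', 1)]
```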
In some optional implementations of this embodiment, the change of a user's interest over time can also be taken into account when counting order frequencies. For example, when the ordering time of one order record is far from the current time, that record matters less for assessing the user's tag attributes feature than another record whose ordering time is closer to the current time.
Specifically, in some optional implementations, the influence of a time decay factor on the order frequency statistics can be determined from the distance between the generation time of each of the user's historical order records for a label and the current time. For example, with a preset time decay factor α (0 < α < 1, such as α = 0.95), if the order of a historical order record was generated t days before the current time, the influence of the time decay factor on the statistics is α^t, and the equivalent number of orders of that record is α^t. Adding up the equivalent numbers of orders of all historical order records associated with the same default label gives the order frequency statistic of the user for that label.
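The time-decayed count can be sketched as below, where each order contributes α^t to every default label it is associated with. Representing an order as an `(age_in_days, labels)` pair is an assumption of this sketch; the default α = 0.95 is the example value from the text.

```python
from collections import defaultdict


def decayed_frequencies(orders, alpha=0.95):
    """Time-decayed order frequency per default label.

    orders: iterable of (age_in_days, labels) pairs, where age_in_days is
            the number of days between order generation and the current time.
    Each order adds alpha ** age_in_days to every label on it.
    """
    freq = defaultdict(float)
    for age_days, labels in orders:
        for label in set(labels):
            freq[label] += alpha ** age_days
    return dict(freq)
```

With α close to 1 the decay is gentle: at α = 0.95 an order placed 14 days ago still counts as roughly half an order.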
Step 304, calculate the effective order frequency of each user for each default label, based on the user's order frequency for the default label and the weight of that default label.
The effective order frequency can be calculated from the weight of each default label and the order frequency statistics. It is the weighted order frequency statistic, which represents each user's ordering for each default label more accurately. Here, the order frequency of each user for each default label can be multiplied by the weight of that default label, and the product taken as the user's effective order frequency for the label.
Step 305, generate the label feature vector of each user based on the effective order frequencies of the default labels, as the tag attributes feature of the user.
In this embodiment, the tag attributes feature can be represented in the form of a label feature vector. Each element of the label feature vector corresponds to one default label, and the effective order frequency of each default label can be used as the feature value of the corresponding element.
In some optional implementations, the order frequency statistics of a user for the labels in the default label set can also be normalized, and the normalized statistics used as the feature values of the corresponding elements of the label feature vector. Specifically, the label feature vector v_i of user i can be expressed as:
v_i = [v_i1, v_i2, …, v_in]  (2)
Wherein,
count(tag_j) is the order frequency statistic of the j-th default label, n is the total number of default labels, weight(tag_j) is the weight of the j-th default label, and k = 1, 2, 3, …, n.
In this way, the tag attributes feature of each user can be represented by a one-dimensional label feature vector, which serves as the quantified user portrait for subsequent clustering and similarity calculation.
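The construction of a normalized label feature vector can be sketched as follows. The exact normalization is not reproduced in this text; the sketch assumes that each effective order frequency weight(tag_j) × count(tag_j) is divided by the sum of the effective order frequencies over all n default labels. That choice, and the name `label_feature_vector`, are assumptions.

```python
def label_feature_vector(counts, weights, labels):
    """Normalized label feature vector for one user.

    counts:  {label: order frequency statistic of the user for that label}
    weights: {label: weight of the default label}
    labels:  ordered list of all n default labels (fixes element positions)
    Assumed normalization: divide by the sum of all effective frequencies.
    """
    effective = [weights[label] * counts.get(label, 0) for label in labels]
    total = sum(effective)
    if total == 0:
        # User has no orders for any default label.
        return [0.0] * len(labels)
    return [value / total for value in effective]
```

Under this assumption the elements of each vector sum to 1, so users with very different overall order volumes remain comparable.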
Step 306, sort the elements of each user's label feature vector by feature value in descending order, and select the default labels corresponding to the top preset number of elements as the labels to be matched.
After the label feature vector of a user has been generated, its elements can be sorted in descending order of effective order frequency. Suppose the top m elements of the label feature vector v_i of user i after sorting are the t1-th, t2-th, …, tm-th elements, with feature values v_it1, v_it2, …, v_itm; the default labels corresponding to the t1-th, t2-th, …, tm-th elements of v_i can then be used as the labels to be matched. In a practical scenario, m can be set to a small integer, for example 3. This ensures that different user clusters in the clustering result differ substantially, and because the number of labels to be matched is small, the computational complexity is further reduced.
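Selecting the labels to be matched amounts to a descending sort followed by a cut at m. In the sketch below, `labels` and `vector` are assumed to be parallel lists (element j of the vector belongs to label j); the function name is hypothetical and m = 3 is the example value from the text.

```python
def labels_to_match(labels, vector, m=3):
    """Return the default labels of the m largest feature values.

    labels: ordered list of default labels.
    vector: feature values in the same order as labels.
    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(zip(labels, vector), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:m]]
```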
Step 307, take the geographical location information and the labels to be matched as the feature information of each user, and cluster the users in the pending user set into multiple user clusters based on the feature information.
In this embodiment, the above electronic device can cluster the users using the geographical location information and the labels to be matched determined in step 306 as the feature information of each user. A preset clustering rule related to the geographical location information and the labels to be matched can be used to divide the users into multiple user clusters, such that the feature information of users within each user cluster is highly similar.
In some optional implementations, users whose geographical location information is identical and who share at least one label to be matched can be gathered into the same user cluster.
In a further implementation, to classify users more accurately, two users can be assigned to the same user cluster when their geographical location information is identical and every label to be matched also matches. Optionally, if the geographical location information of two users is identical and their labels to be matched match, it can further be judged whether the feature values corresponding to the labels to be matched in the two users' label feature vectors are the same; if so, the two users are assigned to the same user cluster, and otherwise to different user clusters.
Clustering users with a preset rule in this way simplifies the clustering algorithm while maintaining clustering accuracy, and reduces the computational complexity of clustering, which further improves the efficiency of locating similar users.
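One of the rules above, identical geographical location information together with a full match of the labels to be matched, can be sketched as a single grouping pass. Treating the labels to be matched as an unordered set is an assumption of this sketch; the looser "at least one shared label" rule is not transitive and would need a different procedure.

```python
from collections import defaultdict


def cluster_users(users):
    """Group users by (geographical location, set of labels to be matched).

    users: {user_id: (geo, labels_to_match)}
    Returns a list of clusters, each a list of user ids.
    """
    clusters = defaultdict(list)
    for user_id, (geo, labels) in users.items():
        # frozenset makes the label group hashable and order-independent.
        clusters[(geo, frozenset(labels))].append(user_id)
    return list(clusters.values())
```

The grouping needs only one pass over the users, in contrast to iterative algorithms such as K-means.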
Step 308, calculate, based on the tag attributes features, the similarity between the target user and the other users in the same user cluster, and determine the similar users of the target user.
After the users in the pending user set have been clustered into multiple user clusters, the similarity between users can be calculated within each user cluster. The target user is a user in the pending user set and can be a predetermined user. In a practical scenario, the target user can be a user with specific characteristics, for example a user on the platform who orders frequently from a certain shop or who has rated the shop highly.
In this embodiment, the similarity between the label feature vector of the target user and the label feature vector of each other user in the same user cluster can be calculated, and the similar users of the target user screened out based on these similarities. Optional similarity calculation methods include, but are not limited to, cosine similarity, the Pearson correlation coefficient, and Euclidean distance.
When screening the similar users of the target user, it can be judged whether the similarity exceeds a set threshold, and users whose similarity exceeds the threshold are taken as the similar users of the target user; alternatively, the users can be sorted by similarity in descending order and the top N (N being a preset positive integer) selected as the similar users of the target user.
Steps 301 and 302 of the above flow are the same as steps 201 and 202 of the previous embodiment, respectively, and are not described again here.
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the method for determining similar users in this embodiment quantifies the tag attributes feature of each user as a label feature vector, extracts the default labels with the larger feature values from the vector as labels to be matched, and clusters on the basis of the labels to be matched and the geographical location information. Users can thus be clustered into different user clusters quickly and efficiently, which further reduces the complexity of similar-user calculation and improves the efficiency of locating similar users.
Fig. 4 shows a schematic diagram of the effect of one application scenario of the methods for determining similar users shown in Fig. 2 and Fig. 3. As shown in Fig. 4, the back-end server of the platform where the merchant "* * chop house" is located can provide a list of the customers who have ordered most often from this merchant, including users "AAAA" and "BBBB". If the merchant selects the new-customer acquisition service, the back-end server can take users "AAAA" and "BBBB" as target users and search for their similar users. Specifically, it can obtain the historical order information of all users on the platform, count the number of users corresponding to each dish label on the platform to determine the weight of each dish label, and determine the tag attributes feature of each user by combining these weights with the labels of the dishes the user ordered in each order of the historical order information. Users located in the same commercial circle as target user "AAAA" or "BBBB" are then screened out and clustered, and finally the similarity between target users "AAAA" and "BBBB" and the other users of the same class is calculated, with the users of higher similarity selected as similar users. In Fig. 4, for example, the similar users located for user "BBBB" are user "XX" and user "YYY". The positioning result can be pushed to and presented in the client of merchant "* * chop house", and information such as the order records of the similar users can also be provided, so that the merchant can make targeted commodity recommendations or push activity information to the similar users.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the application provides an embodiment of a device for determining similar users. The device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic equipment.
As shown in Fig. 5, the device 500 for determining similar users of this embodiment includes an acquiring unit 501, a statistic unit 502, a generation unit 503, a clustering unit 504, and a determining unit 505. The acquiring unit 501 is configured to obtain the user information of each user in the pending user set, where the user information includes geographical location information and historical order information associated with at least one default label. The statistic unit 502 is configured to count the number of users corresponding to the historical order information associated with each default label, to determine the weight of each default label. The generation unit 503 is configured to generate the tag attributes feature of each user based on the historical order information of each user in the pending user set and the weight of each default label. The clustering unit 504 is configured to cluster the users in the pending user set into multiple user clusters according to the tag attributes features and the geographical location information. The determining unit 505 is configured to calculate, based on the tag attributes features, the similarity between the target user and the other users in the same user cluster and to determine the similar users of the target user.
In this embodiment, the acquiring unit 501 can retrieve the user information of each user on the platform from local storage, or receive it from other servers through a wired or wireless connection. The user information here can also include basic attribute information of the user such as age, hobbies, and occupation.
Based on the user information obtained by the acquiring unit 501, the statistic unit 502 can count the number of users corresponding to the historical order records associated with each default label, and set the weight of each default label according to the counted number of users.
The generation unit 503 can generate the tag attributes feature of each user from the weights of the default labels obtained by the statistic unit 502 and the number of the user's historical order records corresponding to each default label. Optionally, the number of a user's historical order records corresponding to each default label is multiplied by the weight of that default label to obtain the user's feature value for the label, and the feature values for all default labels are then combined into the tag attributes feature of the user.
The clustering unit 504 can cluster the users in the pending user set according to the geographical location information obtained by the acquiring unit 501 and the tag attributes features generated by the generation unit 503. That is, the geographical location information and the tag attributes feature serve as the features on which clustering is based, and the users are divided into different user clusters. The features of users within one user cluster are highly similar, while those of users in different clusters are less similar. Clustering thus preliminarily excludes users with low similarity to the target user and speeds up the subsequent positioning of similar users.
The determining unit 505 can calculate the similarity between the target user and the other users in the same user cluster and determine the similar users of the target user. Specifically, the similarity between the tag attributes features of the target user and another user can be taken as the similarity between the two. Users whose similarity exceeds a set threshold, or the top N users by similarity, can then be selected as similar users.
In some embodiments, the statistic unit 502 may be configured to determine the weight of each default label as follows: for each default label, screen out the historical order information associated with the default label; count the number of users corresponding to the screened-out historical order information; and take the reciprocal of the logarithm of the counted number of users as the weight of the default label.
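The weighting rule above can be sketched as follows. The base of the logarithm is not stated, so the natural logarithm is assumed; the handling of a label ordered by only one user (where the logarithm is zero) is also not stated, and the cap `max_weight` used for that case is purely hypothetical, as is the function name.

```python
import math
from collections import defaultdict


def default_label_weights(user_orders, max_weight=10.0):
    """Weight of each default label = 1 / log(number of users who ordered it).

    user_orders: {user_id: list of orders, each a collection of labels}
    max_weight:  hypothetical value used when only one user ordered a label,
                 since log(1) == 0 would make the reciprocal undefined.
    """
    users_per_label = defaultdict(set)
    for user_id, orders in user_orders.items():
        for labels in orders:
            for label in labels:
                users_per_label[label].add(user_id)

    weights = {}
    for label, users in users_per_label.items():
        n_users = len(users)
        weights[label] = 1.0 / math.log(n_users) if n_users > 1 else max_weight
    return weights
```

The resulting weight falls as more users order a label, which matches the negative correlation described for step 302: a label that nearly everyone orders says little about any one user.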
In some embodiments, the generation unit 503 may be configured to generate the tag attributes feature of each user as follows: determine, according to the historical order information, the order frequency of each user for each default label; calculate the effective order frequency of each user for each default label based on the user's order frequency for the default label and the weight of that default label; and generate the label feature vector of each user based on the effective order frequencies of the default labels, as the tag attributes feature of the user.
In a further embodiment, the generation unit 503 can be further configured to multiply the order frequency of each user for each default label by the weight of that default label and take the product as the user's effective order frequency for the label, and to use the effective order frequency of each default label as the feature value corresponding to that default label in the label feature vector.
In a further embodiment, the clustering unit 504 can be further configured to cluster the users in the pending user set into multiple user clusters as follows: sort the elements of each user's label feature vector by feature value in descending order and select the default labels corresponding to the top preset number of elements as the labels to be matched; then take the geographical location information and the labels to be matched as the feature information of each user and cluster the users in the pending user set into multiple user clusters based on the feature information.
Further, the clustering unit 504 may be configured to gather users whose geographical location information is identical and who share at least one label to be matched into the same user cluster.
In some embodiments, the determining unit 505 is further configured to: calculate the similarity between the label feature vector of the target user and the label feature vector of each other user in the same user cluster; and screen out the similar users of the target user based on these similarities.
It should be understood that the units described in device 500 correspond to the steps of the methods described with reference to Fig. 2 and Fig. 3. The operations and features described above for the methods therefore apply equally to device 500 and the units it contains, and are not repeated here.
In the device 500 for determining similar users provided by the embodiment of the application, the statistic unit counts the number of users corresponding to the historical order information associated with each default label, so that the weights of the different default labels are determined reasonably and effectively; the generation unit uses these weights to describe the attribute features of users accurately, which improves the accuracy of similarity calculation between users; and the clustering unit clusters the users so that the determining unit searches for similar users within the same user cluster, which effectively reduces computational complexity and improves the efficiency of locating similar users.
Referring now to Fig. 6, it shows a structural schematic diagram of a computer system 600 suitable for implementing the server of the embodiments of the application. The server shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 602 or a program loaded from a storage part 608 into random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, ROM 602 and RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input part 606 including a keyboard, a mouse, and the like; an output part 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage part 608 including a hard disk and the like; and a communication part 609 including a network interface card such as a LAN card or a modem. The communication part 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read from it can be installed into the storage part 608 as needed.
In particular, according to embodiments of the disclosure, the process described above with reference to the flow charts may be implemented as a computer software program. For example, embodiments of the disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in Fig. 2 or Fig. 3. In such embodiments, the computer program can be downloaded and installed from a network through the communication part 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the application are performed. It should be noted that the computer-readable medium described in the application can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the application, a computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the application, can include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted with any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor comprising an acquisition unit, a statistics unit, a screening unit, a calculation unit, and a determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit for obtaining the user information of each user in the pending user set".
In another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain user information of each user in a pending user set, the user information including geographical location information and historical order information associated with at least one default label; count, for each default label, the number of users corresponding to the historical order information associated with that default label, so as to determine the weight of each default label; generate a label attribute feature of each user based on the historical order information of each user in the pending user set and the weight of each default label; cluster the users in the pending user set into a plurality of user clusters according to the label attribute features and the geographical location information; and calculate, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and determine the similar users of the target user.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (10)

1. A method for determining similar users, characterized in that the method comprises:
obtaining user information of each user in a pending user set, the user information comprising geographical location information and historical order information associated with at least one default label;
counting, for each default label, the number of users corresponding to the historical order information associated with that default label, so as to determine a weight of each default label;
generating a label attribute feature of each user based on the historical order information of each user in the pending user set and the weight of each default label;
clustering the users in the pending user set into a plurality of user clusters according to the label attribute features and the geographical location information;
calculating, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and determining similar users of the target user.
2. The method according to claim 1, characterized in that the counting of the number of users corresponding to the historical order information associated with each default label, so as to determine the weight of each default label, comprises:
for each default label, filtering out the historical order information associated with the default label;
counting the number of users corresponding to the filtered historical order information;
taking the logarithm of the counted number of users and then its reciprocal, and using the result as the weight of the default label.
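As an illustration only, the weighting in claim 2 can be sketched in Python as follows. The function name `label_weights`, the `(user_id, label)` input format, the choice of the natural logarithm, and the skipping of labels ordered by a single user (where the logarithm is zero and the reciprocal is undefined) are assumptions that the claims do not state.

```python
import math
from collections import defaultdict


def label_weights(orders):
    """Return {label: 1 / ln(number of distinct users who ordered under it)}.

    `orders` is an iterable of (user_id, label) pairs, one per historical
    order. This input format is hypothetical.
    """
    users_by_label = defaultdict(set)
    for user_id, label in orders:
        # Filter the order history by label and collect distinct users.
        users_by_label[label].add(user_id)

    weights = {}
    for label, users in users_by_label.items():
        n = len(users)
        # Assumption: ln(1) == 0 makes the reciprocal undefined, so labels
        # with a single user are left without a weight.
        if n > 1:
            weights[label] = 1.0 / math.log(n)
    return weights
```

Under this weighting, a label ordered by many users receives a smaller weight than a label ordered by few users, much like an inverse-document-frequency term.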
3. The method according to claim 1, characterized in that the generating of the label attribute feature of each user based on the historical order information of each user in the pending user set and the weight of each default label comprises:
determining, according to the historical order information, the order frequency of each user for each default label;
calculating the effective order frequency of each user for each default label based on the order frequency of the user for the default label and the weight of the corresponding default label;
generating a label feature vector of the user based on the effective order frequency of each default label, and using it as the label attribute feature of the user.
4. The method according to claim 3, characterized in that the calculating of the effective order frequency of each user for each default label, based on the order frequency of the user for the default label and the weight of the corresponding default label, comprises:
multiplying the order frequency of each user for each default label by the weight of the corresponding default label, and using the product as the effective order frequency of the user for that default label;
and the generating of the label feature vector of each user based on the effective order frequency of each default label comprises:
using the effective order frequency of each default label as the feature value corresponding to that default label in the label feature vector.
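A minimal sketch of the feature construction in claims 3 and 4 follows, assuming each user's history is given as a list of labels (one entry per order) and that the default labels have a fixed ordering. The function name `label_feature_vector` and both input formats are hypothetical.

```python
from collections import Counter


def label_feature_vector(user_orders, weights, labels):
    """Build one user's label feature vector.

    user_orders: list of labels, one per historical order of this user.
    weights:     {label: weight} for the default labels.
    labels:      fixed ordering of the default labels; it defines which
                 vector position belongs to which label.
    """
    frequency = Counter(user_orders)  # order frequency per label
    # Effective order frequency = order frequency * label weight.
    # A label with no known weight contributes 0.0 (an assumption).
    return [frequency[label] * weights.get(label, 0.0) for label in labels]
```

For example, with weights `{"a": 0.5, "b": 2.0}` and orders `["a", "a", "b"]`, the vector over labels `["a", "b", "c"]` is `[1.0, 2.0, 0.0]`.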
5. The method according to claim 3, characterized in that the clustering of the users in the pending user set into a plurality of user clusters according to the label attribute features and the geographical location information comprises:
sorting the elements of the label feature vector of each user in descending order of feature value, and selecting the default labels corresponding to the elements ranked in the top preset number of positions as labels to be matched;
using the geographical location information and the labels to be matched as the characteristic information of each user, and clustering the users in the pending user set into a plurality of user clusters based on the characteristic information.
6. The method according to claim 5, characterized in that the using of the geographical location information and the labels to be matched as the characteristic information of each user, and the clustering of the users in the pending user set into a plurality of user clusters based on the characteristic information, comprises:
clustering users that have identical geographical location information and at least one identical label to be matched into the same user cluster.
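The selection and grouping in claims 5 and 6 can be sketched as below. This is one possible reading, not the only one: each cluster is keyed by a (location, label) pair, so a user with several labels to be matched may appear in several clusters. The names `top_k_labels` and `cluster_users`, the dictionary input format, and the treatment of the location as an opaque comparable value are all assumptions.

```python
from collections import defaultdict


def top_k_labels(vector, labels, k):
    """Labels whose feature values rank in the top k positions (descending)."""
    ranked = sorted(zip(vector, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:k]]


def cluster_users(users, labels, k):
    """Group users sharing a location and at least one label to be matched.

    users: {user_id: (location, feature_vector)}; hypothetical format.
    Returns {(location, label): set of user_ids}.
    """
    clusters = defaultdict(set)
    for user_id, (location, vector) in users.items():
        for label in top_k_labels(vector, labels, k):
            clusters[(location, label)].add(user_id)
    return dict(clusters)
```

An alternative reading would merge users transitively (for example with a union-find structure) so that every user lands in exactly one cluster; the claims do not say which is intended.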
7. The method according to claim 3, characterized in that the calculating, based on the label attribute features, of the similarity between the target user and the other users in the same user cluster, and the determining of the similar users of the target user, comprises:
calculating the similarity between the label feature vector of the target user and the label feature vector of each of the other users in the same user cluster;
filtering out the similar users of the target user based on the similarity between the label feature vector of the target user and the label feature vectors of the other users in the same user cluster.
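Claim 7 does not name a similarity measure or a screening rule. The sketch below assumes cosine similarity and a top-n selection with an optional minimum threshold; the function names and the parameters `top_n` and `min_similarity` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_users(target_id, cluster_vectors, top_n=1, min_similarity=0.0):
    """Rank the other users of one cluster by similarity to the target user.

    cluster_vectors: {user_id: label feature vector} for a single user cluster.
    """
    target = cluster_vectors[target_id]
    scored = [
        (cosine_similarity(target, vector), user_id)
        for user_id, vector in cluster_vectors.items()
        if user_id != target_id
    ]
    # Keep only users at or above the threshold, most similar first.
    scored = [pair for pair in scored if pair[0] >= min_similarity]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [user_id for _, user_id in scored[:top_n]]
```

Because the comparison is restricted to one cluster, the number of pairwise similarity computations is bounded by the cluster size rather than by the size of the whole pending user set.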
8. A device for determining similar users, characterized in that the device comprises:
an acquisition unit, configured to obtain user information of each user in a pending user set, the user information comprising geographical location information and historical order information associated with at least one default label;
a statistics unit, configured to count, for each default label, the number of users corresponding to the historical order information associated with that default label, so as to determine a weight of each default label;
a generation unit, configured to generate a label attribute feature of each user based on the historical order information of each user in the pending user set and the weight of each default label;
a clustering unit, configured to cluster the users in the pending user set into a plurality of user clusters according to the label attribute features and the geographical location information;
a determination unit, configured to calculate, based on the label attribute features, the similarity between a target user and the other users in the same user cluster, and to determine similar users of the target user.
9. A server, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201710451969.4A 2017-06-15 2017-06-15 Method, device and server for determining similar users Pending CN107247786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710451969.4A CN107247786A (en) 2017-06-15 2017-06-15 Method, device and server for determining similar users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710451969.4A CN107247786A (en) 2017-06-15 2017-06-15 Method, device and server for determining similar users

Publications (1)

Publication Number Publication Date
CN107247786A true CN107247786A (en) 2017-10-13

Family

ID=60019223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710451969.4A Pending CN107247786A (en) 2017-06-15 2017-06-15 Method, device and server for determining similar users

Country Status (1)

Country Link
CN (1) CN107247786A (en)

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107786376A (en) * 2017-10-20 2018-03-09 广州优视网络科技有限公司 Content delivery method, device and computer equipment
CN107837532A (en) * 2017-11-16 2018-03-27 腾讯科技(上海)有限公司 User matching method, device, server and storage medium
CN107886354A (en) * 2017-10-31 2018-04-06 广州云移信息科技有限公司 Method and system for determining marketing object group
CN107943943A (en) * 2017-11-23 2018-04-20 北京小度信息科技有限公司 Definite method, apparatus, electronic equipment and the storage medium of user's similarity
CN108446281A (en) * 2017-02-13 2018-08-24 北京嘀嘀无限科技发展有限公司 Determine the method, apparatus and storage medium of user's cohesion
CN108764371A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, computer readable storage medium and electronic equipment
CN109447748A (en) * 2018-10-23 2019-03-08 广州致轩服饰有限公司 A kind of commodity feedback method and device based on increment user
CN109509016A (en) * 2018-09-25 2019-03-22 中国平安人寿保险股份有限公司 Sale processing method, apparatus and computer readable storage medium
CN109597858A (en) * 2018-12-14 2019-04-09 拉扎斯网络科技(上海)有限公司 Merchant classification method and device and merchant recommendation method and device
CN109697637A (en) * 2018-12-27 2019-04-30 拉扎斯网络科技(上海)有限公司 Object type determination method and device, electronic equipment and computer storage medium
WO2019080404A1 (en) * 2017-10-25 2019-05-02 平安科技(深圳)有限公司 Cross-social networking platform user matching method, data processing device, and readable storage medium
CN109712001A (en) * 2018-11-29 2019-05-03 平安科技(深圳)有限公司 Information recommendation method, device, computer equipment and storage medium
CN109731341A (en) * 2018-12-28 2019-05-10 广州华多网络科技有限公司 A kind of method for splitting of interlock account, device and equipment
CN109753993A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109753994A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109754171A (en) * 2018-12-25 2019-05-14 北京三快在线科技有限公司 Task ranking method, device, electronic equipment and storage medium
CN109766913A (en) * 2018-12-11 2019-05-17 东软集团股份有限公司 Tenant group method, apparatus, computer readable storage medium and electronic equipment
CN109784367A (en) * 2018-12-11 2019-05-21 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109872242A (en) * 2019-01-30 2019-06-11 北京字节跳动网络技术有限公司 Information-pushing method and device
CN109995884A (en) * 2017-12-29 2019-07-09 北京京东尚科信息技术有限公司 The method and apparatus for determining accurate geographic position
CN110083634A (en) * 2019-03-19 2019-08-02 中国平安人寿保险股份有限公司 Order processing method, apparatus, equipment and storage medium based on data analysis
CN110297750A (en) * 2018-03-22 2019-10-01 北京京东尚科信息技术有限公司 The method and apparatus of program similitude detection
CN110400024A (en) * 2019-07-31 2019-11-01 京东城市(北京)数字科技有限公司 Method, apparatus, equipment and the computer readable storage medium of order forecasting
CN110414613A (en) * 2019-07-31 2019-11-05 京东城市(北京)数字科技有限公司 Method, apparatus, equipment and the computer readable storage medium of region clustering
CN110503353A (en) * 2018-05-16 2019-11-26 北京三快在线科技有限公司 A kind of dispatching Zonal expression method and device
CN110517099A (en) * 2018-05-22 2019-11-29 北京京东尚科信息技术有限公司 Method and apparatus for determining joint supply side
CN110727857A (en) * 2019-09-04 2020-01-24 口碑(上海)信息技术有限公司 Method and device for identifying key features of potential users aiming at business objects
CN110827101A (en) * 2018-08-07 2020-02-21 北京京东尚科信息技术有限公司 Shop recommendation method and device
CN110858313A (en) * 2018-08-24 2020-03-03 国信优易数据有限公司 Crowd classification method and crowd classification system
CN111062419A (en) * 2019-11-26 2020-04-24 复旦大学 Compression and recovery method of deep learning data set
CN111178949A (en) * 2019-12-18 2020-05-19 北京文思海辉金信软件有限公司 Service resource matching reference data determination method, device, equipment and storage medium
CN111178975A (en) * 2019-12-31 2020-05-19 北京顺达同行科技有限公司 Business circle dividing method and device, electronic equipment and storage medium
CN111784356A (en) * 2020-07-22 2020-10-16 支付宝(杭州)信息技术有限公司 Payment verification method, device, equipment and storage medium
CN111831894A (en) * 2019-04-23 2020-10-27 北京嘀嘀无限科技发展有限公司 Information matching method and device
CN111861526A (en) * 2019-04-30 2020-10-30 京东城市(南京)科技有限公司 Method and device for analyzing object source
CN112131484A (en) * 2019-06-25 2020-12-25 北京京东尚科信息技术有限公司 Multi-person session establishing method, device, equipment and storage medium
CN112434140A (en) * 2020-11-10 2021-03-02 杭州博联智能科技股份有限公司 Reply information processing method and system
CN112529671A (en) * 2021-02-08 2021-03-19 杭州拼便宜网络科技有限公司 Commodity recommendation method and device, electronic equipment and storage medium
CN112767031A (en) * 2021-01-22 2021-05-07 北京明略昭辉科技有限公司 Takeaway advertisement putting method, system and equipment based on user connection strength
CN113743536A (en) * 2021-09-23 2021-12-03 安徽淘云科技股份有限公司 Self-adaptive height adjusting method of intelligent desk and intelligent desk
CN114581204A (en) * 2022-04-28 2022-06-03 深圳市同征电子商务有限公司 Commodity sales management method and system based on electronic commerce platform

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095256A (en) * 2014-05-07 2015-11-25 阿里巴巴集团控股有限公司 Information push method and apparatus based on similarity degree between users
CN106355449A (en) * 2016-08-31 2017-01-25 腾讯科技(深圳)有限公司 User selecting method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王娜 等: "融合标签权值的用户模糊聚类方法研究", 《情报理论与实践》 *

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446281A (en) * 2017-02-13 2018-08-24 北京嘀嘀无限科技发展有限公司 Determine the method, apparatus and storage medium of user's cohesion
CN108446281B (en) * 2017-02-13 2021-03-12 北京嘀嘀无限科技发展有限公司 Method, device and storage medium for determining user intimacy
CN107786376A (en) * 2017-10-20 2018-03-09 广州优视网络科技有限公司 Content delivery method, device and computer equipment
WO2019080404A1 (en) * 2017-10-25 2019-05-02 平安科技(深圳)有限公司 Cross-social networking platform user matching method, data processing device, and readable storage medium
CN107886354A (en) * 2017-10-31 2018-04-06 广州云移信息科技有限公司 Method and system for determining marketing object group
CN107837532A (en) * 2017-11-16 2018-03-27 腾讯科技(上海)有限公司 User matching method, device, server and storage medium
CN107943943B (en) * 2017-11-23 2020-11-03 北京小度信息科技有限公司 User similarity determination method and device, electronic equipment and storage medium
CN107943943A (en) * 2017-11-23 2018-04-20 北京小度信息科技有限公司 Definite method, apparatus, electronic equipment and the storage medium of user's similarity
CN109995884A (en) * 2017-12-29 2019-07-09 北京京东尚科信息技术有限公司 The method and apparatus for determining accurate geographic position
CN110297750A (en) * 2018-03-22 2019-10-01 北京京东尚科信息技术有限公司 The method and apparatus of program similitude detection
CN110503353A (en) * 2018-05-16 2019-11-26 北京三快在线科技有限公司 A kind of dispatching Zonal expression method and device
CN110503353B (en) * 2018-05-16 2022-04-01 北京三快在线科技有限公司 Distribution area expression method and device
CN110517099A (en) * 2018-05-22 2019-11-29 北京京东尚科信息技术有限公司 Method and apparatus for determining joint supply side
CN108764371A (en) * 2018-06-08 2018-11-06 Oppo广东移动通信有限公司 Image processing method, device, computer readable storage medium and electronic equipment
CN110827101A (en) * 2018-08-07 2020-02-21 北京京东尚科信息技术有限公司 Shop recommendation method and device
CN110827101B (en) * 2018-08-07 2024-05-24 北京京东尚科信息技术有限公司 Shop recommending method and device
CN110858313A (en) * 2018-08-24 2020-03-03 国信优易数据有限公司 Crowd classification method and crowd classification system
CN110858313B (en) * 2018-08-24 2023-01-31 国信优易数据股份有限公司 Crowd classification method and crowd classification system
CN109509016A (en) * 2018-09-25 2019-03-22 中国平安人寿保险股份有限公司 Sale processing method, apparatus and computer readable storage medium
CN109447748A (en) * 2018-10-23 2019-03-08 广州致轩服饰有限公司 A kind of commodity feedback method and device based on increment user
CN109712001A (en) * 2018-11-29 2019-05-03 平安科技(深圳)有限公司 Information recommendation method, device, computer equipment and storage medium
CN109753994A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109784367A (en) * 2018-12-11 2019-05-21 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109753993A (en) * 2018-12-11 2019-05-14 东软集团股份有限公司 User's portrait method, apparatus, computer readable storage medium and electronic equipment
CN109753994B (en) * 2018-12-11 2024-05-14 东软集团股份有限公司 User image drawing method, device, computer readable storage medium and electronic equipment
CN109766913A (en) * 2018-12-11 2019-05-17 东软集团股份有限公司 Tenant group method, apparatus, computer readable storage medium and electronic equipment
CN109597858A (en) * 2018-12-14 2019-04-09 拉扎斯网络科技(上海)有限公司 Merchant classification method and device and merchant recommendation method and device
CN109597858B (en) * 2018-12-14 2021-09-14 拉扎斯网络科技(上海)有限公司 Merchant classification method and device and merchant recommendation method and device
CN109754171A (en) * 2018-12-25 2019-05-14 北京三快在线科技有限公司 Task ranking method, device, electronic equipment and storage medium
CN109697637A (en) * 2018-12-27 2019-04-30 拉扎斯网络科技(上海)有限公司 Object type determination method and device, electronic equipment and computer storage medium
CN109731341B (en) * 2018-12-28 2022-07-22 广州方硅信息技术有限公司 Splitting method, device and equipment for associated account
CN109731341A (en) * 2018-12-28 2019-05-10 广州华多网络科技有限公司 A kind of method for splitting of interlock account, device and equipment
CN109872242A (en) * 2019-01-30 2019-06-11 北京字节跳动网络技术有限公司 Information-pushing method and device
CN110083634B (en) * 2019-03-19 2024-02-06 中国平安人寿保险股份有限公司 Order processing method, device, equipment and storage medium based on data analysis
CN110083634A (en) * 2019-03-19 2019-08-02 中国平安人寿保险股份有限公司 Order processing method, apparatus, equipment and storage medium based on data analysis
CN111831894A (en) * 2019-04-23 2020-10-27 北京嘀嘀无限科技发展有限公司 Information matching method and device
CN111861526B (en) * 2019-04-30 2024-05-21 京东城市(南京)科技有限公司 Method and device for analyzing object source
CN111861526A (en) * 2019-04-30 2020-10-30 京东城市(南京)科技有限公司 Method and device for analyzing object source
CN112131484A (en) * 2019-06-25 2020-12-25 北京京东尚科信息技术有限公司 Multi-person session establishing method, device, equipment and storage medium
CN110414613B (en) * 2019-07-31 2021-03-02 京东城市(北京)数字科技有限公司 Method, device and equipment for clustering regions and computer readable storage medium
CN110400024A (en) * 2019-07-31 2019-11-01 京东城市(北京)数字科技有限公司 Method, apparatus, equipment and the computer readable storage medium of order forecasting
CN110414613A (en) * 2019-07-31 2019-11-05 京东城市(北京)数字科技有限公司 Method, apparatus, equipment and the computer readable storage medium of region clustering
CN110727857A (en) * 2019-09-04 2020-01-24 口碑(上海)信息技术有限公司 Method and device for identifying key features of potential users aiming at business objects
CN111062419B (en) * 2019-11-26 2023-06-02 复旦大学 Compression and recovery method for deep learning data set
CN111062419A (en) * 2019-11-26 2020-04-24 复旦大学 Compression and recovery method of deep learning data set
CN111178949A (en) * 2019-12-18 2020-05-19 北京文思海辉金信软件有限公司 Service resource matching reference data determination method, device, equipment and storage medium
CN111178975A (en) * 2019-12-31 2020-05-19 北京顺达同行科技有限公司 Business circle dividing method and device, electronic equipment and storage medium
CN111784356B (en) * 2020-07-22 2023-11-28 支付宝(杭州)信息技术有限公司 Payment verification method, device, equipment and storage medium
CN111784356A (en) * 2020-07-22 2020-10-16 支付宝(杭州)信息技术有限公司 Payment verification method, device, equipment and storage medium
CN112434140B (en) * 2020-11-10 2024-02-09 杭州博联智能科技股份有限公司 Reply information processing method and system
CN112434140A (en) * 2020-11-10 2021-03-02 杭州博联智能科技股份有限公司 Reply information processing method and system
CN112767031A (en) * 2021-01-22 2021-05-07 北京明略昭辉科技有限公司 Takeaway advertisement putting method, system and equipment based on user connection strength
CN112529671A (en) * 2021-02-08 2021-03-19 杭州拼便宜网络科技有限公司 Commodity recommendation method and device, electronic equipment and storage medium
CN113743536A (en) * 2021-09-23 2021-12-03 安徽淘云科技股份有限公司 Self-adaptive height adjusting method of intelligent desk and intelligent desk
CN114581204A (en) * 2022-04-28 2022-06-03 深圳市同征电子商务有限公司 Commodity sales management method and system based on electronic commerce platform

Similar Documents

Publication Publication Date Title
CN107247786A (en) Method, device and server for determining similar users
CN107220852A (en) Method, device and server for determining target recommended user
CN109492772B (en) Method and device for generating information
CN108090162A (en) Information-pushing method and device based on artificial intelligence
CN107105031A (en) Information-pushing method and device
CN106649890A (en) Data storage method and device
CN107944481A (en) Method and apparatus for generating information
CN107332910A (en) Information-pushing method and device
CN107424007A (en) A kind of method and apparatus for building electronic ticket susceptibility identification model
CN107301592A (en) The method and device excavated for commodity substitute
CN107426328A (en) Information-pushing method and device
CN110852785B (en) User grading method, device and computer readable storage medium
CN107346344A (en) The method and apparatus of text matches
CN107911449A (en) Method and apparatus for pushed information
CN109785000A (en) Customer resources distribution method, device, storage medium and terminal
CN107451785A (en) Method and apparatus for output information
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN113298568B (en) Method and device for advertising
CN109685537A (en) Analysis method, device, medium and the electronic equipment of user behavior
CN107885784A (en) The method and apparatus for extracting user characteristic data
CN107741967A (en) Method, apparatus and electronic equipment for behavioral data processing
CN108595448A (en) Information-pushing method and device
CN109711733A (en) For generating method, electronic equipment and the computer-readable medium of Clustering Model
CN111626767B (en) Resource data issuing method, device and equipment
CN107977678A (en) Method and apparatus for output information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171013