WO2020258101A1 - User similarity calculation method, device, server and storage medium - Google Patents
User similarity calculation method, device, server and storage medium
- Publication number
- WO2020258101A1 (PCT/CN2019/093109)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- similarity
- ids
- target
- user ids
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
Definitions
- This application relates to the field of communication technology, and in particular to a method, device, server and storage medium for calculating user similarity.
- The embodiments of the present application provide a user similarity calculation method, device, server, and storage medium that can reduce the complexity of user similarity calculation.
- An embodiment of the present application provides a method for calculating user similarity, including:
- extracting at least two user features of a first user ID to obtain a feature set of the first user ID, where the feature set includes at least the at least two user features, the first user ID is any one of N user IDs, and N is an integer greater than or equal to 2;
- an embodiment of the present application provides a user similarity calculation device.
- The user similarity calculation device includes an extraction unit, a selection unit, a calculation unit, and a classification unit, wherein:
- the extraction unit is configured to extract at least two user features of a first user ID to obtain a feature set of the first user ID, where the feature set includes at least the at least two user features, the first user ID is any one of N user IDs, and N is an integer greater than or equal to 2;
- the selection unit is configured to select a target hash function;
- the calculation unit is configured to calculate the similarity between the target user characteristics of the N user IDs by using the target hash function to obtain the initial similarity between the N user IDs;
- the classification unit is configured to divide the N user IDs into M hash buckets according to the initial similarity between the N user IDs, where M is an integer greater than or equal to 2;
- the calculation unit is further configured to calculate the similarity between any two user IDs in the first hash bucket, where the first hash bucket is any one of the M hash buckets.
- an embodiment of the present application provides a server, including a processor and a memory, the memory is used to store one or more programs, and the one or more programs are configured to be executed by the processor.
- The program includes instructions for executing the steps in the first aspect of the embodiments of the present application.
- An embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps described in the first aspect.
- An embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect.
- the computer program product may be a software installation package.
- The user similarity calculation method described in the embodiments of this application includes the following steps: extracting at least two user features of a first user ID to obtain a feature set of the first user ID, where the feature set includes at least the at least two user features, the first user ID is any one of N user IDs, and N is an integer greater than or equal to 2; selecting a target hash function and using it to calculate the similarity between the target user features of the N user IDs, to obtain the initial similarity between the N user IDs; dividing the N user IDs into M hash buckets according to the initial similarity between the N user IDs, where M is an integer greater than or equal to 2; and calculating the similarity between any two user IDs in a first hash bucket, where the first hash bucket is any one of the M hash buckets.
- In the embodiments of this application, an appropriate hash function can be selected to divide the N user IDs into M hash buckets according to the initial similarity, and similarity is calculated only between user IDs within the same hash bucket. This avoids calculating the similarity between each user ID and all other user IDs, which reduces the amount of computation, lowers the complexity of user similarity calculation, and speeds it up.
- FIG. 1 is a schematic flowchart of a method for calculating user similarity disclosed in an embodiment of the present application
- FIG. 2 is a schematic flowchart of another user similarity calculation method disclosed in an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of a user similarity calculation device disclosed in an embodiment of the present application.
- Fig. 4 is a schematic structural diagram of a server disclosed in an embodiment of the present application.
- the mobile terminals involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (User Equipment, UE), mobile station (Mobile Station, MS), terminal device (terminal device), etc.
- FIG. 1 is a schematic flowchart of a user similarity calculation method disclosed in an embodiment of the present application. As shown in FIG. 1, the user similarity calculation method includes the following steps.
- The server extracts at least two user features of the first user ID to obtain a feature set of the first user ID.
- The feature set includes at least the at least two user features.
- The first user ID is any one of N user IDs, and N is an integer greater than or equal to 2.
- the server serves the client, and the content of the service includes providing resources to the client and storing client data.
- The server here refers to a service program; the device that runs this program can also be called a server.
- the server can establish connections with multiple clients at the same time, and can provide services to multiple clients at the same time.
- the service provided by the server for the client in the embodiment of the present application may include a content push service.
- the content push service may include: browser content push service, application download push service, game content push service, etc.
- the server can include application server, browser server, game server, etc.
- A user ID can include any one or more of the following types: single sign-on identity (SSOID), OpenID, integrated circuit card identity (ICCID), international mobile equipment identity (IMEI), telephone number (TEL), globally unique identifier (GUID), etc.
- With single sign-on (SSO), a user needs to log in only once to access all mutually trusted application systems.
- the server extracts at least two user characteristics of the first user ID based on the historical user behavior data of the first user ID.
- User behavior data may include: device characteristics, positioning characteristics, and application (Application, APP) characteristics.
- Device features can include the device model, the device identifier, the media access control (MAC) address of the device, and device usage habits (for example, backlight brightness, volume, holding posture, average usage time, power-on time, and shutdown time).
- Positioning features can include Global Positioning System (GPS) positioning information (for example, latitude and longitude), location-based service (LBS) location trajectories, etc.
- Application features can include application setting parameters (for example, application brightness, volume, and refresh frequency), application opening time, application closing time, application function usage, continuous application running time, cumulative application running time, application installation data, application uninstallation data, etc.
- the server can extract device features, location features, and APP features from the user behavior data of the first user ID, and compose the device features, location features, and APP features of the first user ID into a feature set of the first user ID.
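- As a rough illustration of how device, location, and APP features could be grouped into one feature set per user ID, here is a minimal Python sketch. The names `FeatureSet` and `build_feature_set`, and the assumption that raw behavior data arrives as a dictionary keyed by category, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSet:
    """Hypothetical container for the features extracted for one user ID."""
    user_id: str
    device: dict = field(default_factory=dict)    # e.g. model, MAC address
    location: dict = field(default_factory=dict)  # e.g. latitude, longitude
    app: dict = field(default_factory=dict)       # e.g. install data, run time

    def feature_count(self) -> int:
        return len(self.device) + len(self.location) + len(self.app)


def build_feature_set(user_id: str, behavior: dict) -> FeatureSet:
    """Group raw behavior data (assumed to be keyed by category) into a feature set."""
    fs = FeatureSet(
        user_id=user_id,
        device=dict(behavior.get("device", {})),
        location=dict(behavior.get("location", {})),
        app=dict(behavior.get("app", {})),
    )
    # The method requires at least two user features per user ID.
    if fs.feature_count() < 2:
        raise ValueError("a feature set needs at least two user features")
    return fs


example = build_feature_set(
    "user-1",
    {"device": {"mac": "aa:bb:cc:dd:ee:ff"}, "location": {"lat": 22.54, "lon": 114.05}},
)
```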
- the server selects a target hash function, uses the target hash function to calculate the similarity between the target user characteristics of the N user IDs, and obtains the initial similarity between the N user IDs.
- the server may select the target hash function according to at least two user characteristics included in the characteristic set of the first user ID.
- the server selects the target hash function, specifically:
- the server determines the types of the at least two user characteristics
- the server determines the target hash function corresponding to the at least two user characteristics according to the correspondence between the type and the hash function.
- the types of at least two user characteristics may include a first type and a second type.
- the first type may include a large data volume feature type
- the second type may further include a small data volume feature type.
- the large data volume feature type refers to the type with a large amount of user feature data
- the small data volume feature type refers to the type with a small amount of user feature data.
- the server can determine the type of user feature according to the number of bytes of data contained in the user feature.
- The server can classify a user feature whose data contains more bytes than a preset number of bytes as the large data volume feature type, and a user feature whose data contains a number of bytes less than or equal to the preset number as the small data volume feature type.
- For example, the wireless MAC address in the user features is of the large data volume feature type, and the latitude and longitude information in the user features is of the small data volume feature type.
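- The type check and the type-to-function lookup described above could be sketched as follows. The threshold value `PRESET_BYTES = 8`, the byte encodings of the two example features, and the names `feature_type` and `select_distance` are assumptions made for illustration; the application only speaks of a "preset number of bytes" and a correspondence between type and hash function.

```python
import struct

PRESET_BYTES = 8  # assumed value; the source only refers to a preset number of bytes

# Assumed correspondence: first (large) type -> Hamming, second (small) type -> Euclidean.
DISTANCE_BY_TYPE = {"large": "hamming", "small": "euclidean"}


def feature_type(data: bytes, preset_bytes: int = PRESET_BYTES) -> str:
    """Classify a feature by the number of bytes its data contains."""
    return "large" if len(data) > preset_bytes else "small"


def select_distance(data: bytes, preset_bytes: int = PRESET_BYTES) -> str:
    """Look up the distance function name that corresponds to the feature type."""
    return DISTANCE_BY_TYPE[feature_type(data, preset_bytes)]


mac_feature = b"aa:bb:cc:dd:ee:ff"                   # 17 bytes as text
latlon_feature = struct.pack("<ff", 22.54, 114.05)   # two 4-byte floats, 8 bytes
```

- Under these assumptions the MAC address is routed to the Hamming distance and the latitude and longitude pair to the Euclidean distance, matching the example in the text.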
- The server uses the target hash function to calculate the similarity between the target user features of the N user IDs, which may specifically be:
- if the at least two user features are of the first type, the server uses the Hamming distance formula to calculate the similarity between the target user features of the N user IDs.
- When calculating the similarity between the target user feature of the first user ID and that of the second user ID, the server obtains a first target vector for the target user feature of the first user ID and a second target vector for the target user feature of the second user ID, and compares the two vectors bit by bit. If the bits at a position are the same, the Hamming distance for that position is 0; if they differ, it is 1. The per-bit distances are summed to obtain the final Hamming distance between the first target vector and the second target vector.
- the first user ID and the second user ID are two different user IDs among the N user IDs.
- For example, the target user feature of the first user ID is a first wireless MAC address, and the target user feature of the second user ID is a second wireless MAC address.
- Suppose the first target vector and the second target vector are both 10 bits long. Each bit of the first target vector is compared with the corresponding bit of the second target vector: if they are the same, the Hamming distance for that bit is 0, and if they differ, it is 1.
- The per-bit distances are summed to obtain the final Hamming distance between the first target vector and the second target vector, which lies between 0 and 10.
- If the Hamming distance is 0 to 3, the similarity between the target user feature of the first user ID and that of the second user ID is considered greater than the first preset similarity threshold, and the first user ID and the second user ID are put into the same hash bucket. If the Hamming distance is 4 to 10, the similarity is considered less than or equal to the first preset similarity threshold, and the first user ID and the second user ID are determined not to belong to the same hash bucket.
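- The 10-bit example can be reproduced with a short Python sketch. The bit strings below are made-up sample values, and the function names are hypothetical; only the per-bit comparison and the 0 to 3 cut-off come from the text.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def same_bucket_by_hamming(a: str, b: str, max_distance: int = 3) -> bool:
    """Distances 0..max_distance count as 'similar enough' for the same bucket."""
    return hamming_distance(a, b) <= max_distance


first_vector = "1011001110"   # sample 10-bit vector for the first user ID
second_vector = "1001001011"  # differs from the first at 3 positions
third_vector = "0100110001"   # differs from the first at every position
```

- With these sample vectors, the first and second user IDs would share a hash bucket, while the first and third would not.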
- The server uses the target hash function to calculate the similarity between the target user features of the N user IDs, which may alternatively be:
- if the at least two user features are of the second type, the server uses the Euclidean distance formula to calculate the similarity between the target user features of the N user IDs.
- When calculating the similarity between the target user feature of the first user ID and that of the second user ID, the server obtains the longitude and latitude of the first user ID (longitude $x_1$, latitude $y_1$) and of the second user ID (longitude $x_2$, latitude $y_2$), where the first user ID and the second user ID are two different user IDs among the N user IDs. The server can then use the following Euclidean distance formula to calculate the similarity between the two target user features:
- $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
- If d is less than or equal to the preset threshold, the similarity between the target user feature of the first user ID and that of the second user ID is greater than the first preset similarity threshold, and the first user ID and the second user ID are put into the same hash bucket. If d is greater than the preset threshold, the similarity is less than or equal to the first preset similarity threshold, and the first user ID and the second user ID are determined not to belong to the same hash bucket.
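- A minimal sketch of the Euclidean distance check on longitude and latitude pairs follows. The threshold `0.01` and the coordinates are illustrative assumptions, as are the function names; the application does not give a value for the preset threshold.

```python
import math


def euclidean_distance(p: tuple, q: tuple) -> float:
    """Straight-line distance between two (longitude, latitude) pairs."""
    (x1, y1), (x2, y2) = p, q
    return math.hypot(x1 - x2, y1 - y2)


def same_bucket_by_location(p: tuple, q: tuple, max_distance: float = 0.01) -> bool:
    """d <= threshold means the two user IDs go into the same hash bucket."""
    return euclidean_distance(p, q) <= max_distance


first_location = (114.050, 22.540)   # sample (longitude, latitude)
second_location = (114.051, 22.541)  # very close to the first
third_location = (116.400, 39.900)   # far from the first
```

- Note that this treats degrees as plane coordinates, as the formula in the text does; it is only a rough proxy for ground distance.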
- the server divides the N user IDs into M hash buckets according to the initial similarity between the N user IDs, where M is an integer greater than or equal to 2.
- The greater the initial similarity between any two user IDs, the greater the probability that they are placed in the same hash bucket; the smaller the initial similarity, the smaller that probability.
- step 103 may specifically include the following steps:
- the server determines whether, among the N user IDs, there is a user ID whose initial similarity with the first user ID is greater than a first preset similarity threshold;
- if so, the server places the first user ID, together with the user IDs among the N user IDs whose initial similarity with the first user ID is greater than the first preset similarity threshold, into the same hash bucket.
- For example, the server can randomly select a user ID from the N user IDs, such as the first user ID, and compare its initial similarity with each of the other user IDs. The server determines whether any of the N user IDs has an initial similarity with the first user ID greater than the first preset similarity threshold; if so, the server places the first user ID and those user IDs into the same hash bucket (for example, the first hash bucket).
- The server then randomly selects a user ID, such as the second user ID, from the user IDs still to be allocated (the N user IDs other than those already in the first hash bucket), and compares its initial similarity with each of the other user IDs to be allocated. The server determines whether any of the user IDs to be allocated has an initial similarity with the second user ID greater than the first preset similarity threshold; if so, the server places the second user ID and those user IDs into the same hash bucket (for example, the second hash bucket). This continues until the N user IDs have been divided into M hash buckets.
- A user ID can belong to only one of the M hash buckets, and the total number of user IDs across the M hash buckets equals N.
- the number of user IDs in each hash bucket can be the same or different.
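- The bucketing procedure described above (pick a random user ID, gather every remaining user ID whose initial similarity exceeds the threshold, repeat on what is left) could be sketched as below. The function name, the `initial_similarity` callback, and the choice to give a user ID with no similar peers a bucket of its own are assumptions; the text does not say what happens in that last case.

```python
import random


def bucket_user_ids(user_ids, initial_similarity, threshold, rng=None):
    """Greedily partition user IDs into buckets around randomly chosen seeds."""
    rng = rng or random.Random()
    remaining = list(user_ids)
    buckets = []
    while remaining:
        # Randomly pick a seed among the user IDs still to be allocated.
        seed = remaining.pop(rng.randrange(len(remaining)))
        bucket, rest = [seed], []
        for uid in remaining:
            if initial_similarity(seed, uid) > threshold:
                bucket.append(uid)
            else:
                rest.append(uid)
        remaining = rest
        buckets.append(bucket)  # assumption: a lone seed forms its own bucket
    return buckets


# Toy similarity: IDs sharing a first letter are treated as similar.
toy_similarity = lambda u, v: 1.0 if u[0] == v[0] else 0.0
groups = bucket_user_ids(["a1", "a2", "b1", "b2", "c1"], toy_similarity, 0.5,
                         random.Random(0))
```

- Each user ID lands in exactly one bucket, so the bucket sizes always sum to N, and the sizes may differ from bucket to bucket.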
- the server calculates the similarity between any two user IDs in the first hash bucket, where the first hash bucket is any one of the M hash buckets.
- the server can obtain any two user IDs in the first hash bucket, such as the first user ID and the second user ID.
- The server obtains at least two user features of the first user ID and at least two user features of the second user ID, and calculates the similarity between the first user ID and the second user ID based on them.
- In the embodiments of this application, similarity only needs to be calculated between user IDs in the same hash bucket.
- Compared with calculating the similarity between every pair of the N user IDs, calculating it only between user IDs within each hash bucket requires far less computation, which reduces the complexity of user similarity calculation and improves its speed.
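- To make the saving concrete, the sketch below counts pairwise comparisons with and without buckets. The figures (1000 user IDs, 10 equal buckets) are illustrative assumptions, and the count deliberately ignores the cost of the initial bucketing pass.

```python
from math import comb


def pair_count_all(n: int) -> int:
    """Number of pairs when every user ID is compared with every other."""
    return comb(n, 2)


def pair_count_bucketed(bucket_sizes) -> int:
    """Number of pairs when comparisons happen only inside each bucket."""
    return sum(comb(size, 2) for size in bucket_sizes)


all_pairs = pair_count_all(1000)                  # 499500
bucketed_pairs = pair_count_bucketed([100] * 10)  # 10 * 4950 = 49500
```

- In this example the in-bucket comparisons are about one tenth of the all-pairs total; the real ratio depends on how evenly the user IDs spread across buckets.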
- step 104 may also include the following steps:
- the server obtains the feature set of each user ID in the first hash bucket
- the server determines the feature vector of each user ID in the first hash bucket based on the feature set of each user ID in the first hash bucket;
- the server uses the Hamming distance calculation formula to calculate the distance between the feature vectors of any two user IDs in the first hash bucket based on the feature vector of each user ID in the first hash bucket;
- the server determines the similarity between any two user IDs in the first hash bucket according to the distance between their feature vectors.
- Using the Hamming distance formula keeps the similarity calculation simple, since it only requires comparing corresponding bits and counting the differences, which increases the speed of similarity calculation.
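- The in-bucket steps (feature vector per user ID, Hamming distance per pair, similarity from distance) might look like the following. The mapping `similarity = 1 - distance / length` is an assumption, since the text does not state how distance is converted to similarity, and the binary feature vectors are made-up values.

```python
from itertools import combinations


def hamming(u, v) -> int:
    """Hamming distance between two equal-length binary feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def bucket_similarities(feature_vectors: dict) -> dict:
    """Similarity for every pair of user IDs in one hash bucket."""
    result = {}
    for (id_a, vec_a), (id_b, vec_b) in combinations(feature_vectors.items(), 2):
        distance = hamming(vec_a, vec_b)
        # Assumed conversion: identical vectors -> 1.0, fully different -> 0.0.
        result[(id_a, id_b)] = 1.0 - distance / len(vec_a)
    return result


first_bucket = {"u1": [1, 0, 1, 1], "u2": [1, 0, 0, 1], "u3": [0, 1, 0, 0]}
similarities = bucket_similarities(first_bucket)
```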
- In the embodiments of this application, an appropriate hash function can be selected to divide the N user IDs into M hash buckets according to the initial similarity, and similarity is calculated only between user IDs within the same hash bucket. This avoids calculating the similarity between each user ID and all other user IDs, which reduces the amount of computation, lowers the complexity of user similarity calculation, and speeds it up.
- FIG. 2 is a flowchart of another user similarity calculation method disclosed in an embodiment of the present application.
- FIG. 2 is further optimized on the basis of FIG. 1. As shown in FIG. 2, the user similarity calculation method includes the following steps.
- The server extracts at least two user features of the first user ID to obtain a feature set of the first user ID.
- The feature set includes at least the at least two user features.
- The first user ID is any one of N user IDs, and N is an integer greater than or equal to 2.
- the server selects a target hash function, uses the target hash function to calculate the similarity between the target user characteristics of the N user IDs, and obtains the initial similarity between the N user IDs.
- the server divides the N user IDs into M hash buckets according to the initial similarity between the N user IDs, where M is an integer greater than or equal to 2.
- the server calculates the similarity between any two user IDs in the first hash bucket, where the first hash bucket is any one of the M hash buckets.
- For steps 201 to 204 in this embodiment, refer to steps 101 to 104 shown in FIG. 1; they are not repeated here.
- the server determines whether there are P user IDs with mutual similarity greater than a second preset similarity threshold in the first hash bucket, and P is an integer greater than or equal to 2.
- the second preset similarity threshold may be preset and stored in a memory (for example, a non-volatile memory) of the server.
- If so, the server establishes a correspondence between the P user IDs and a target natural person ID.
- By establishing the correspondence between user IDs and the target natural person ID, the server can associate multiple user IDs with one natural person.
- the natural person ID in the embodiment of this application corresponds to a natural person.
- This natural person may correspond to a mobile terminal (for example, a mobile phone), at least one phone number, at least one application account, at least one OpenID, one SSOID, at least one ICCID, and at least one IMEI.
- For example, the IMEI, phone number, and 5 application accounts of a mobile phone are all labeled with one natural person ID.
- The user behavior data corresponding to these 5 application accounts all belongs to the user behavior data of this natural person ID.
- A real natural person can have many user IDs (for example, the IMEI of a mobile phone, a phone number, and 5 application accounts), but corresponds to only one unique natural person ID.
- the specific presentation form of the natural person ID can be a string of characters.
- the natural person ID may correspond to an identification of a mobile terminal.
- the server can establish a correspondence table of the user ID and the natural person ID after constructing the correspondence between the user ID and the target natural person ID.
- one natural person ID can correspond to multiple user IDs.
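- One way to picture the correspondence table is the sketch below, which groups user IDs in a bucket whose mutual similarity exceeds the second preset threshold and labels each group with a natural person ID. The greedy grouping, the use of a random UUID string as the natural person ID, and all names are assumptions for illustration; the application only says the natural person ID can be a string of characters.

```python
import uuid


def link_natural_persons(bucket, similarity, threshold):
    """Return {user_id: natural_person_id} for groups of P >= 2 mutually similar IDs."""
    table = {}
    unassigned = list(bucket)
    while unassigned:
        group = [unassigned.pop(0)]
        for uid in list(unassigned):
            # "Mutual" similarity: the candidate must be similar to every member.
            if all(similarity(uid, member) > threshold for member in group):
                group.append(uid)
                unassigned.remove(uid)
        if len(group) >= 2:
            person_id = uuid.uuid4().hex  # hypothetical natural person ID format
            for uid in group:
                table[uid] = person_id
    return table


close = {"imei-1", "tel-1", "app-1"}
toy_similarity = lambda a, b: 0.9 if a in close and b in close else 0.1
table = link_natural_persons(["imei-1", "tel-1", "app-1", "app-9"], toy_similarity, 0.8)
```

- Here three user IDs end up under one natural person ID, while `app-9` stays unlinked because it has no sufficiently similar peer.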
- the server can push content (for example, various types of push messages) to the natural person ID.
- the server can push the content to the mobile terminal corresponding to the natural person ID without sending the pushed content to the application account separately, thereby improving the push efficiency.
- After the server establishes the correspondence between the user ID and the target natural person ID, the correspondence can be stored in the server's database.
- The server can then analyze the user behavior data of a newly registered user ID against the user behavior data of all stored natural person IDs.
- If the similarity between the newly registered user ID and the most similar natural person ID among all stored natural person IDs is greater than a preset similarity threshold, the server establishes a correspondence between that natural person ID and the newly registered user ID.
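- Matching a newly registered user ID against stored natural person IDs could be sketched as follows. Representing each natural person by a single binary feature vector, the bit-agreement similarity, the threshold `0.7`, and the function names are all assumptions made for this example.

```python
def bit_similarity(u, v) -> float:
    """Fraction of positions at which two equal-length binary vectors agree."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


def assign_new_user(new_vector, person_vectors, threshold):
    """Return the most similar natural person ID if it clears the threshold, else None."""
    if not person_vectors:
        return None
    best_id = max(person_vectors,
                  key=lambda pid: bit_similarity(new_vector, person_vectors[pid]))
    if bit_similarity(new_vector, person_vectors[best_id]) > threshold:
        return best_id
    return None


stored = {"person-1": [1, 1, 0, 0], "person-2": [0, 0, 1, 1]}
match = assign_new_user([1, 1, 0, 1], stored, 0.7)      # closest to person-1
no_match = assign_new_user([1, 0, 1, 0], stored, 0.7)   # not close enough to either
```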
- the server includes hardware structures and/or software modules corresponding to each function.
- The present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
- the embodiment of the present application may divide the server side into functional units according to the foregoing method examples.
- each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- FIG. 3 is a schematic structural diagram of a user similarity calculation device disclosed in an embodiment of the present application.
- the user similarity calculation device 300 includes an extraction unit 301, a selection unit 302, a calculation unit 303, and a classification unit 304, wherein:
- the extraction unit 301 is configured to extract at least two user characteristics of a first user ID to obtain a characteristic set of the first user ID, and the characteristic set includes at least the at least two user characteristics, and the first user ID is any one of N user IDs, and N is an integer greater than or equal to 2;
- the selecting unit 302 is used to select a target hash function
- the calculation unit 303 is configured to calculate the similarity between the target user characteristics of the N user IDs by using the target hash function to obtain the initial similarity between the N user IDs;
- the classification unit 304 is configured to divide the N user IDs into M hash buckets according to the initial similarity between the N user IDs, where M is an integer greater than or equal to 2;
- the calculation unit 303 is further configured to calculate the similarity between any two user IDs in the first hash bucket, where the first hash bucket is any one of the M hash buckets.
- The selection unit 302 selects the target hash function specifically by: determining the types of the at least two user features; and determining the target hash function corresponding to the at least two user features according to the correspondence between types and hash functions.
- The calculation unit 303 uses the target hash function to calculate the similarity between the target user features of the N user IDs specifically by: if the at least two user features are of the first type, calculating the similarity between the target user features of the N user IDs using the Hamming distance formula.
- The calculation unit 303 uses the target hash function to calculate the similarity between the target user features of the N user IDs specifically by: if the at least two user features are of the second type, calculating the similarity between the target user features of the N user IDs using the Euclidean distance formula.
- The classification unit 304 divides the N user IDs into M hash buckets according to the initial similarity between the N user IDs specifically by: determining whether, among the N user IDs, there is a user ID whose initial similarity with the first user ID is greater than a first preset similarity threshold; and if so, placing the first user ID and the user IDs among the N user IDs whose initial similarity with the first user ID is greater than the first preset similarity threshold into the same hash bucket.
- The calculation unit 303 calculates the similarity between any two user IDs in the first hash bucket specifically by: obtaining the feature set of each user ID in the first hash bucket; determining the feature vector of each user ID in the first hash bucket based on its feature set; calculating, with the Hamming distance formula, the distance between the feature vectors of any two user IDs in the first hash bucket; and determining the similarity between any two user IDs in the first hash bucket according to the distance between their feature vectors.
- The user similarity calculation device 300 may further include a determining unit 305 and an establishing unit 306.
- The determining unit 305 is configured to determine, after the calculation unit 303 calculates the similarity between any two user IDs in the first hash bucket, whether there are P user IDs in the first hash bucket whose mutual similarity is greater than a second preset similarity threshold, where P is an integer greater than or equal to 2;
- the establishing unit 306 is configured to establish a correspondence between the P user IDs and a target natural person ID when the determining unit 305 determines that the first hash bucket contains P user IDs whose mutual similarity is greater than the second preset similarity threshold.
- the extraction unit 301, the selection unit 302, the calculation unit 303, the classification unit 304, the determination unit 305, and the establishment unit 306 in FIG. 3 may be processors.
- In the embodiments of this application, an appropriate hash function can be selected to divide the N user IDs into M hash buckets according to the initial similarity, and similarity is calculated only between user IDs within the same hash bucket. This avoids calculating the similarity between each user ID and all other user IDs, which reduces the amount of computation, lowers the complexity of user similarity calculation, and speeds it up.
- FIG. 4 is a schematic structural diagram of a server disclosed in an embodiment of the present application.
- the server 400 includes a processor 401 and a memory 402.
- the server 400 may also include a bus 403.
- the processor 401 and the memory 402 may be connected to each other through the bus 403.
- The bus 403 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the bus 403 can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used to represent the bus in the figure.
- the server 400 may further include a communication interface 404, and the server 400 may communicate with the client through the communication interface 404.
- the memory 402 is used to store one or more programs containing instructions; the processor 401 is used to call the instructions stored in the memory 402 to execute some or all of the method steps in FIGS. 1 to 2.
- a suitable hash function can be selected to divide the N user IDs into M hash buckets according to the initial similarity, and similarity is then calculated only among the user IDs within each hash bucket. This avoids calculating the similarity between each user ID and all other user IDs, which reduces the amount of computation, lowers the complexity of the user similarity calculation, and improves its speed.
- An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps of any of the user similarity calculation methods described in the above method embodiments.
- the embodiments of the present application also provide a computer program product.
- the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
- the computer program is operable to cause a computer to execute part or all of the steps of any of the user similarity calculation methods described in the foregoing method embodiments.
- the disclosed device may be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable memory.
- the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present invention.
- the aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
- the program can be stored in a computer-readable memory, and the memory can include a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, etc.
Claims (10)
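- A user similarity calculation method, characterized by comprising: extracting at least two user features of a first user ID to obtain a feature set of the first user ID, the feature set including at least the at least two user features, the first user ID being any one of N user IDs, and N being an integer greater than or equal to 2; selecting a target hash function, and using the target hash function to calculate the similarity between target user features of the N user IDs to obtain initial similarities between the N user IDs; dividing the N user IDs into M hash buckets according to the magnitude of the initial similarities between the N user IDs, M being an integer greater than or equal to 2; and calculating the similarity between any two user IDs in a first hash bucket, the first hash bucket being any one of the M hash buckets.
- The method according to claim 1, characterized in that selecting the target hash function comprises: determining the type of the at least two user features; and determining, according to a correspondence between types and hash functions, the target hash function corresponding to the at least two user features.
- The method according to claim 2, characterized in that using the target hash function to calculate the similarity between the target user features of the N user IDs comprises: if the type of the at least two user features is a first type, calculating the similarity between the target user features of the N user IDs using a Hamming distance formula.
- The method according to claim 2, characterized in that using the target hash function to calculate the similarity between the target user features of the N user IDs comprises: if the type of the at least two user features is a second type, calculating the similarity between the target user features of the N user IDs using a Euclidean distance formula.
- The method according to claim 1, characterized in that dividing the N user IDs into M hash buckets according to the magnitude of the initial similarities between the N user IDs comprises: determining whether the N user IDs include a user ID whose initial similarity to the first user ID is greater than a first preset similarity threshold; and if so, dividing the first user ID and the user IDs among the N user IDs whose initial similarity to the first user ID is greater than the first preset similarity threshold into the same hash bucket.
- The method according to claim 1, characterized in that calculating the similarity between any two user IDs in the first hash bucket comprises: obtaining the feature set of each user ID in the first hash bucket; determining a feature vector of each user ID in the first hash bucket based on the feature set of that user ID; calculating, based on the feature vector of each user ID in the first hash bucket, the distance between the feature vectors of any two user IDs in the first hash bucket using a Hamming distance formula; and determining the similarity between any two user IDs in the first hash bucket according to the distance between their feature vectors.
- The method according to any one of claims 1 to 6, characterized in that, after calculating the similarity between any two user IDs in the first hash bucket, the first hash bucket being any one of the M hash buckets, the method further comprises: determining whether the first hash bucket contains P user IDs whose mutual similarity is greater than a second preset similarity threshold, P being an integer greater than or equal to 2; and if so, establishing a correspondence between the P user IDs and a target natural person ID.
- A user similarity calculation device, characterized in that the user similarity calculation device comprises an extraction unit, a selection unit, a calculation unit and a classification unit, wherein: the extraction unit is configured to extract at least two user features of a first user ID to obtain a feature set of the first user ID, the feature set including at least the at least two user features, the first user ID being any one of N user IDs, and N being an integer greater than or equal to 2; the selection unit is configured to select a target hash function; the calculation unit is configured to use the target hash function to calculate the similarity between target user features of the N user IDs to obtain initial similarities between the N user IDs; the classification unit is configured to divide the N user IDs into M hash buckets according to the magnitude of the initial similarities between the N user IDs, M being an integer greater than or equal to 2; and the calculation unit is further configured to calculate the similarity between any two user IDs in a first hash bucket, the first hash bucket being any one of the M hash buckets.
- A server, characterized by comprising a processor and a memory, the memory being configured to store one or more programs, the one or more programs being configured to be executed by the processor, and the programs including instructions for performing the method according to any one of claims 1 to 7.
- A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1 to 7.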
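The method claims can be read as a small pipeline: pick a distance measure from the feature type, turn distances into similarities, and link the user IDs in a bucket whose mutual similarity exceeds the second threshold to one natural person ID. The following is a minimal Python sketch under stated assumptions. The type labels `"binary"` and `"numeric"`, the conversion from Hamming distance to similarity, the threshold value, the seed-based grouping rule and the ID `"person-1"` are all illustrative choices that the claims do not specify.

```python
import math


def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Assumed mapping from feature type to distance measure; the claims only
# state that a correspondence between types and hash functions exists.
DISTANCE_BY_TYPE = {"binary": hamming_distance, "numeric": euclidean_distance}


def hamming_similarity(a, b):
    """Assumed conversion: share of matching positions, between 0 and 1."""
    return 1.0 - hamming_distance(a, b) / len(a)


def link_natural_person(bucket, threshold):
    """Find user IDs in one bucket whose mutual similarity exceeds `threshold`.

    `bucket` maps user ID -> binary feature vector. Each ID is tried as a
    seed and grown greedily; the largest group found is returned (sorted),
    or an empty list if no group has at least two members.
    """
    ids = sorted(bucket)
    best = []
    for seed in ids:
        group = [seed]
        for other in ids:
            if other != seed and all(
                hamming_similarity(bucket[other], bucket[m]) > threshold
                for m in group
            ):
                group.append(other)
        if len(group) > len(best):
            best = group
    return sorted(best) if len(best) >= 2 else []


# Made-up feature vectors for the user IDs of one hash bucket.
bucket = {
    "u1": (1, 0, 1, 1, 0, 1, 0, 1),
    "u2": (1, 0, 1, 1, 0, 1, 0, 0),
    "u3": (1, 0, 1, 1, 0, 1, 1, 1),
    "u4": (0, 1, 0, 0, 1, 0, 1, 0),
}
group = link_natural_person(bucket, threshold=0.7)
person_of = {user_id: "person-1" for user_id in group}  # hypothetical ID
print(group)
```

In this example `u1`, `u2` and `u3` differ from each other in at most two of eight positions, so their mutual similarity stays above 0.7 and they are mapped to the same hypothetical natural person ID, while `u4` is left out. The greedy grouping is a simplification and is not guaranteed to find the largest possible group in every bucket.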
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/093109 WO2020258101A1 (zh) | 2019-06-26 | 2019-06-26 | User similarity calculation method, device, server and storage medium |
CN201980091291.0A CN113383314B (zh) | 2019-06-26 | 2019-06-26 | User similarity calculation method, device, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/093109 WO2020258101A1 (zh) | 2019-06-26 | 2019-06-26 | User similarity calculation method, device, server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020258101A1 (zh) | 2020-12-30 |
Family
ID=74061169
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/093109 WO2020258101A1 (zh) | 2019-06-26 | 2019-06-26 | User similarity calculation method, device, server and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113383314B (zh) |
WO (1) | WO2020258101A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117061254A (zh) * | 2023-10-12 | 2023-11-14 | 之江实验室 | Abnormal traffic detection method, apparatus and computer device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8515964B2 (en) * | 2011-07-25 | 2013-08-20 | Yahoo! Inc. | Method and system for fast similarity computation in high dimensional space |
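CN105608219A (zh) * | 2016-01-07 | 2016-05-25 | 上海通创信息技术有限公司 | Clustering-based streaming recommendation engine, recommendation system and recommendation method |
CN109255640A (zh) * | 2017-07-13 | 2019-01-22 | 阿里健康信息技术有限公司 | Method, apparatus and system for determining user groups |
CN109815406A (zh) * | 2019-01-31 | 2019-05-28 | 腾讯科技(深圳)有限公司 | Data processing and information recommendation method and apparatus |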
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102622366B (zh) * | 2011-01-28 | 2014-07-30 | 阿里巴巴集团控股有限公司 | Method and apparatus for identifying similar images |
WO2013178286A1 (en) * | 2012-06-01 | 2013-12-05 | Qatar Foundation | A method for processing a large-scale data set, and associated apparatus |
CN106570141B (zh) * | 2016-11-04 | 2020-05-19 | 中国科学院自动化研究所 | Near-duplicate image detection method |
CN109697641A (zh) * | 2017-10-20 | 2019-04-30 | 北京京东尚科信息技术有限公司 | Method and apparatus for calculating commodity similarity |
CN109800325B (zh) * | 2018-12-26 | 2021-10-26 | 北京达佳互联信息技术有限公司 | Video recommendation method, apparatus and computer-readable storage medium |
CN109558512B (zh) * | 2019-01-24 | 2020-07-14 | 广州荔支网络技术有限公司 | Audio-based personalized recommendation method, apparatus and mobile terminal |
2019
- 2019-06-26 WO PCT/CN2019/093109 (patent WO2020258101A1, zh), status: Application Filing
- 2019-06-26 CN CN201980091291.0A (patent CN113383314B, zh), status: Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117061254A (zh) * | 2023-10-12 | 2023-11-14 | 之江实验室 | Abnormal traffic detection method, apparatus and computer device |
CN117061254B (zh) * | 2023-10-12 | 2024-01-23 | 之江实验室 | Abnormal traffic detection method, apparatus and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN113383314B (zh) | 2023-01-10 |
CN113383314A (zh) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
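US10897685B2 (en) | Matching users in a location-based service | |
WO2020257993A1 (zh) | Content push method, device, server and storage medium | |
TWI659300B (zh) | Method and apparatus for providing a device identifier | |
CN109213781B (zh) | Risk control data query method and apparatus | |
WO2019042180A1 (zh) | Resource allocation method and related products | |
WO2018149137A1 (zh) | Wireless fidelity (Wi-Fi) connection method and related products | |
WO2014180145A1 (en) | Methods and systems for connecting a mobile device to a network | |
WO2023020187A1 (zh) | Data acquisition method, apparatus, electronic device and storage medium | |
CN109858250A (zh) | Android malicious code detection model method based on cascaded classifiers | |
WO2020252639A1 (zh) | Content push method and related products | |
CN109121157B (zh) | Method for determining network rate limiting, terminal and storage medium | |
WO2020258101A1 (zh) | User similarity calculation method, device, server and storage medium | |
CN111405007B (zh) | TCP session management method, apparatus, storage medium and electronic device | |
US11323873B2 (en) | Method for wireless fidelity connection and related products | |
US9490914B2 (en) | Electronic device and its wireless network communication method | |
WO2020019524A1 (zh) | Data processing method and apparatus | |
CN111885664B (zh) | User equipment route selection method and related products | |
CN106612262B (zh) | Method, apparatus and system for establishing a PCC session | |
CN113383360B (zh) | Content push method, device, server and storage medium | |
CN109547317B (zh) | Method and apparatus for establishing a connection tunnel | |
CN108028854A (zh) | Data transmission method and host machine | |
CN114071455A (zh) | Password-free authentication method, server and system, and gateway device | |
WO2016058388A1 (zh) | Short message sending method, short message center and storage medium | |
CN117640363B (zh) | Microservice configuration and control method and system | |
CN114793234B (zh) | Message processing method, apparatus, device and storage medium |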
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19935166 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19935166 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.05.2022) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19935166 Country of ref document: EP Kind code of ref document: A1 |