WO2022041600A1 - Method and device for determining similarity between objects - Google Patents

Method and device for determining similarity between objects

Info

Publication number
WO2022041600A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
association
dimension
network
nodes
Prior art date
Application number
PCT/CN2020/139531
Other languages
English (en)
French (fr)
Inventor
刘红宝
郑建宾
高鹏飞
贡钟瑞
孙权
孙郯
王臻
陈玥如
陈滢
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Priority to KR1020237009620A (patent KR20230054438A, ko)
Publication of WO2022041600A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design

Definitions

  • the invention relates to the field of similarity analysis, in particular to a method and device for determining similarity between objects.
  • the present invention provides a method and device for determining the similarity between objects, which solves the problem that there is no method for evaluating the similarity of different objects in the prior art.
  • the present invention provides a method for determining similarity between objects, comprising: for at least one dimension attribute of any attribute group among a plurality of attribute groups, generating, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between the nodes in the attribute association network represents the degree of association between the evaluation objects under the at least one dimension attribute; for any two attribute association networks, fusing the two attribute association networks to obtain a fusion association network, wherein each evaluation object has a uniquely mapped node in the fusion association network, and the edge information between the nodes in the fusion association network represents the comprehensive degree of association between the evaluation objects; randomly traversing the nodes of each fusion association network to obtain a plurality of node sequences of the fusion association network; and determining, according to the plurality of node sequences, the similarity of any two of the plurality of evaluation objects under the fusion association network.
  • in the above method, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute is generated, in which the degree of association between the evaluation objects is represented; the attribute association networks are then fused pairwise to obtain a fusion association network, which fully characterizes the comprehensive degree of association between the evaluation objects with respect to regional features; the nodes of the fusion association network are then randomly traversed to obtain a plurality of node sequences of the fusion association network, from which the similarity of any two of the evaluation objects under the fusion association network is determined, thereby providing a method for determining the similarity between objects.
  • generating, for at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute includes: for any two of the plurality of evaluation objects, determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network; and generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, an attribute association network of the plurality of evaluation objects for the dimension attribute.
  • in the above manner, the attribute values of the two evaluation objects under the dimension attribute determine the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network, from which the attribute association network is generated; this provides a way of generating an attribute association network from attribute values under the same dimension attribute, and thereby increases the flexibility of generating the attribute association network of the plurality of evaluation objects for the dimension attribute.
  • the evaluation object is an area; the dimension attribute includes the time-series location attribute of users in the area; the attribute value under the time-series location attribute of users in the area includes a user identifier; determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network includes: determining the edge information of the two areas according to the user identifiers in the two areas.
  • in the above manner, the evaluation object is an area, and the time-series location attribute can represent how areas are related over time, so determining the edge information from the user identifiers in the two areas is a more accurate way of obtaining the edge information.
  • the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension; the first-type attribute dimension and the second-type attribute dimension are preset associated attribute dimensions; generating, for at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute includes: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determining the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, an attribute association network of the plurality of evaluation objects for the first-type attribute dimension and the second-type attribute dimension.
  • in the above manner, the attribute values of the first evaluation object and the second evaluation object in different types of attribute dimensions can be used to determine the edge information between the two corresponding nodes in the attribute association network, from which the attribute association network is generated, thereby providing a way of generating an attribute association network across different categories of attribute dimensions.
  • randomly traversing the nodes of each fusion association network to obtain a plurality of node sequences includes: determining, according to the edge information between the nodes in the fusion association network, the random walk probability between the nodes in the fusion association network; and randomly traversing the nodes of the fusion association network based on the random walk probability between the nodes to obtain the plurality of node sequences.
  • in the above manner, the random walk probability between the nodes is determined according to the edge information between the nodes in the fusion association network, so that the random traversal of the nodes takes these probabilities into account and the plurality of node sequences is obtained more accurately.
  • the edge information between the nodes in the attribute association network is the attribute association weight value between the nodes;
  • the edge information between the nodes in the fusion association network is the comprehensive association weight value between the nodes;
  • fusing the two attribute association networks to obtain a fusion association network includes: for any two nodes in the fusion association network, determining the comprehensive association weight value of the two nodes in the fusion association network according to the attribute association weight values of the two nodes in the two attribute association networks and the corresponding weighting coefficients; and generating the feature association network of the plurality of evaluation objects based on the comprehensive association weight values between the nodes in the fusion association network.
  • in the above manner, the edge information between the nodes in the attribute association network is the attribute association weight value between the nodes; the attribute association weight values of the two nodes in the two attribute association networks and the weighting coefficients are considered together, and the feature association network of the plurality of evaluation objects is generated from the resulting comprehensive association weight values, so that it is generated more accurately.
  • determining the similarity of any two of the plurality of evaluation objects according to the plurality of node sequences includes: inputting the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fusion association network; and determining the similarity of any two of the plurality of evaluation objects according to the embedding vectors of the fusion association network.
  • in the above manner, the similarity of any two of the plurality of evaluation objects is determined according to the embedding vectors of the fusion association network; because the embedding vectors characterize the fusion association network more fully and in more detail, the similarity can be determined more accurately.
  • the present invention provides an apparatus for determining similarity between objects, comprising: a generating module configured to, for at least one dimension attribute of any attribute group among a plurality of attribute groups, generate, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between the nodes in the attribute association network represents the degree of association between the evaluation objects under the at least one dimension attribute; a fusion module configured to, for any two attribute association networks, fuse the two attribute association networks to obtain a fusion association network, wherein each evaluation object has a uniquely mapped node in the fusion association network, and the edge information between the nodes in the fusion association network represents the comprehensive degree of association between the evaluation objects; and a processing module configured to randomly traverse the nodes of each fusion association network to obtain a plurality of node sequences of the fusion association network, and to determine, according to the plurality of node sequences, the similarity of any two of the plurality of evaluation objects under the fusion association network.
  • the generating module is specifically configured to: for any two of the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, an attribute association network of the plurality of evaluation objects for the dimension attribute.
  • the evaluation object is an area; the dimension attribute includes the time-series location attribute of users in the area; the attribute value under the time-series location attribute of users in the area includes a user identifier; the generating module is specifically configured to: determine the edge information of the two areas according to the user identifiers in the two areas.
  • the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension; the first-type attribute dimension and the second-type attribute dimension are preset associated attribute dimensions; the generating module is specifically configured to: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determine the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, an attribute association network of the plurality of evaluation objects for the first-type attribute dimension and the second-type attribute dimension.
  • the processing module is specifically configured to: determine the random walk probability between the nodes in the fusion association network according to the edge information between the nodes in the fusion association network; and randomly traverse the nodes of the fusion association network based on the random walk probability between the nodes to obtain the plurality of node sequences.
  • the edge information between the nodes in the attribute association network is the attribute association weight value between the nodes;
  • the edge information between the nodes in the fusion association network is the comprehensive association weight value between the nodes;
  • the fusion module is specifically configured to: for any two nodes in the fusion association network, determine the comprehensive association weight value of the two nodes in the fusion association network according to the attribute association weight values of the two nodes in the two attribute association networks and the weighting coefficients; and generate the feature association network of the plurality of evaluation objects based on the comprehensive association weight values between the nodes in the fusion association network.
  • the fusion module is specifically configured to: input the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fusion association network; and determine the similarity of any two of the plurality of evaluation objects according to the embedding vectors of the fusion association network.
  • the present invention provides a computer device, including a program or an instruction, which, when the program or instruction is executed, is used to execute the above-mentioned first aspect and each optional method of the first aspect.
  • the present invention provides a storage medium, including a program or an instruction, which, when the program or instruction is executed, is used to execute the above-mentioned first aspect and each optional method of the first aspect.
  • FIG. 1 is a schematic flowchart of steps of a method for determining similarity between objects provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of specific steps of a method for determining similarity between objects provided by an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an apparatus for determining similarity between objects according to an embodiment of the present application.
  • Step 101: For at least one dimension attribute of any attribute group among the plurality of attribute groups, generate, for the at least one dimension attribute of the evaluation objects, an attribute association network of the plurality of evaluation objects under the at least one dimension attribute.
  • Step 102: For any two attribute association networks, fuse the two attribute association networks to obtain a fusion association network.
  • Step 103: Randomly traverse the nodes of each fusion association network to obtain multiple node sequences of the fusion association network.
  • Step 104: Determine, according to the multiple node sequences, the similarity of any two of the multiple evaluation objects under the fusion association network.
  • each evaluation object has a uniquely mapped node in the attribute association network; the edge information between the nodes in the attribute association network represents the degree of association between the evaluation objects under the at least one dimension attribute.
  • each evaluation object has a uniquely mapped node in the fusion association network; the edge information between the nodes in the fusion association network represents the comprehensive association degree between the evaluation objects.
  • the attribute association networks may include attribute association networks based on user behavior and attribute association networks not based on user behavior. When both kinds are included, the edge information between the nodes in the fusion association network characterizes the comprehensive degree of association between the evaluation objects under the influence of user behavior.
  • step 101 may specifically be:
  • Step (1-1): For any two of the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network.
  • the dimension attribute may include one or more dimension attributes.
  • for example, for area A and area B, the attribute values may be those of the merchant-count dimension attribute and of the dimension attribute for the number of users of a set service, namely the number of merchants in area A and the number of users of the set service in area A, and the number of merchants in area B and the number of users of the set service in area B.
  • the specific expression form of the edge information of the two evaluation objects in step (1-2) may be an association weight value.
  • the evaluation object is an area; the dimension attribute includes the time-series location attribute of users in the area; the attribute value under the time-series location attribute of users in the area includes a user identifier; step (1-2) may specifically be:
  • the edge information of the two areas is determined according to the user identifiers in the two areas.
  • the edge information of the two areas may be determined according to the number of users with the same user identifier in the two areas.
  • the edge information of the two areas may also be determined according to the time difference between the stays, in the two areas, of users with the same user identifier.
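The user-count variant described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `shared_user_edges`, the input layout (a mapping from area to the set of user identifiers observed there), and the sample data are all assumptions made for the example.

```python
from itertools import combinations

def shared_user_edges(users_by_area):
    """Return {(area_a, area_b): number of user identifiers seen in both areas}.

    Hypothetical helper: the edge weight is simply the count of shared user IDs.
    """
    edges = {}
    for a, b in combinations(sorted(users_by_area), 2):
        shared = len(set(users_by_area[a]) & set(users_by_area[b]))
        if shared > 0:  # no shared users means no edge
            edges[(a, b)] = shared
    return edges

# Hypothetical data: user identifiers observed in each area.
users_by_area = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3", "u4"},
    "C": {"u5"},
}
area_edges = shared_user_edges(users_by_area)
```

Here only areas A and B share users, so a single edge is produced; a time-difference variant would need timestamps per stay, which this sketch does not model.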
  • according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, determine the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network, or, according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension, determine the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network.
  • for example, if the first-type attribute dimension in step (2-1) is the proportion of merchants of the set service in the area, and the second-type attribute dimension is the proportion of users of the set service in the area, then the edge information between the two nodes corresponding to the first area and the second area in the attribute association network can be determined according to the proportion of set-service merchants in the first area and the proportion of set-service users in the second area, or according to the proportion of set-service users in the first area and the proportion of set-service merchants in the second area.
  • the edge information between the nodes in the attribute association network is the attribute association weight value between the nodes; the edge information between the nodes in the fusion association network is the comprehensive association weight value between the nodes; step 102 may specifically be:
  • the two attribute association networks may be an attribute association network of any user behavior and an attribute association network of any non-user behavior.
  • the attribute association weight value of an attribute association network based on user behavior is the first weight value
  • the corresponding weighting coefficient is the first weighting coefficient.
  • the attribute association weight value of an attribute association network not based on user behavior is the second weight value
  • the corresponding weighting coefficient is the second weighting coefficient.
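  • $w_{\text{synthesis}} = w_{1\text{-}1} \times a_{1\text{-}1} + w_{1\text{-}2} \times a_{1\text{-}2} + w_{1\text{-}3} \times a_{1\text{-}3} + \dots + w_{2\text{-}1} \times a_{2\text{-}1} + w_{2\text{-}2} \times a_{2\text{-}2} + w_{2\text{-}3} \times a_{2\text{-}3} + \dots$;
  • $w_{1\text{-}x}$ represents the first weight value
  • $w_{2\text{-}x}$ represents the second weight value
  • $a_{1\text{-}x}$ represents the first weighting coefficient
  • $a_{2\text{-}x}$ represents the second weighting coefficient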
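As a rough sketch of this weighted fusion, the comprehensive weight of an edge can be accumulated as the sum of each network's attribute association weight multiplied by that network's weighting coefficient. The function name `fuse_networks`, the edge-dictionary representation, and the sample weights and coefficients are assumptions for illustration only.

```python
def fuse_networks(networks, coefficients):
    """Fuse attribute association networks into one weighted network.

    networks: list of dicts mapping (node_u, node_v) -> attribute association weight.
    coefficients: one weighting coefficient per network (hypothetical values).
    An edge missing from a network contributes nothing to the sum.
    """
    fused = {}
    for net, coeff in zip(networks, coefficients):
        for edge, weight in net.items():
            fused[edge] = fused.get(edge, 0.0) + coeff * weight
    return fused

# Hypothetical networks: one based on user behavior, one not.
behavior_net = {("A", "B"): 2.0, ("B", "C"): 1.0}
portrait_net = {("A", "B"): 0.5, ("A", "C"): 1.0}
fused = fuse_networks([behavior_net, portrait_net], [0.5, 2.0])
```

With these values the edge (A, B) receives 0.5 × 2.0 + 2.0 × 0.5 = 2.0, while edges present in only one network keep a single weighted term.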
  • step 103 may specifically be:
  • Step (3-1) According to the edge information between the nodes in the fusion association network, determine the random walk probability between the nodes in the fusion association network.
  • the edge between two nodes may be two directed edges, such as the edge from node A to node B, and the edge from node B to node A.
  • the edge information can be the weight value of two directed edges.
  • the random walk probability of one node walking to another node can be determined according to the proportion of the weight value. For example, if there are edges from node A to nodes B, C, and D with weights 3, 4, and 5 respectively, then the random walk probability from node A to node B is 3/12 = 1/4, from node A to node C is 4/12 = 1/3, and from node A to node D is 5/12.
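The proportional rule can be checked with a few lines of code. This is a small sketch using exact fractions; the function name `walk_probabilities` is hypothetical, and the weights 3, 4, and 5 are the ones from the example in the text.

```python
from fractions import Fraction

def walk_probabilities(out_weights):
    """Normalize outgoing edge weights of one node into transition probabilities."""
    total = sum(out_weights.values())
    return {node: Fraction(weight, total) for node, weight in out_weights.items()}

# Outgoing weights of node A toward B, C, and D, as in the example.
probs_from_a = walk_probabilities({"B": 3, "C": 4, "D": 5})
```

The three probabilities share the denominator 12 (the sum of the weights) and add up to 1.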
  • each fusion association network can be traversed multiple times to obtain multiple node sequences.
  • the node sequences of the fusion association network are ABCDE, ABCEF, ACEF, and ABEC.
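One way such node sequences might be produced is a weight-proportional random walk repeated from every node. The sketch below is an illustration under assumptions: the adjacency-dictionary layout, the walk length, the number of walks per node, and the fixed seed are all chosen for the example and are not specified by the source.

```python
import random

def random_walk(graph, start, length, rng):
    """Walk from `start`, choosing each next node in proportion to edge weight.

    graph: dict mapping node -> {neighbor: weight} for directed edges.
    The walk stops early at a node with no outgoing edges.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Hypothetical weighted directed fusion association network.
graph = {
    "A": {"B": 3, "C": 4, "D": 5},
    "B": {"A": 1, "C": 2},
    "C": {"A": 1},
    "D": {},
}
rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(graph, node, 5, rng) for node in graph for _ in range(2)]
```

Traversing the network several times from each node yields multiple sequences, which is what later steps consume.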
  • the similarity of two evaluation objects under the fusion association network can be determined by counting over the node sequences; for example, the higher the proportion of node sequences in which the two corresponding nodes appear consecutively, the higher the similarity.
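A counting-based reading of this rule can be sketched on the example sequences ABCDE, ABCEF, ACEF, and ABEC. The interpretation used here, the share of sequences in which the two nodes are adjacent in either order, is an assumption; the function name `adjacency_share` is hypothetical.

```python
def adjacency_share(sequences, a, b):
    """Fraction of sequences in which nodes a and b appear next to each other."""
    hits = sum(
        any({x, y} == {a, b} for x, y in zip(seq, seq[1:]))
        for seq in sequences
    )
    return hits / len(sequences)

# Node sequences taken from the example in the text.
sequences = ["ABCDE", "ABCEF", "ACEF", "ABEC"]
share_ab = adjacency_share(sequences, "A", "B")
share_af = adjacency_share(sequences, "A", "F")
```

Under this reading, A and B are adjacent in three of the four sequences, whereas A and F are never adjacent, so A and B would be judged more similar.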
  • step 104 may specifically be:
  • the geographic location is segmented into different regions through spatial data processing methods.
  • based on GPS (Global Positioning System) location data and regional feature attribute data, multiple attribute association networks are constructed, node sequences are generated by probabilistic random walk, and finally a word-vector model (such as Skip-Gram) is used to obtain embedding vectors of the region nodes and calculate the similarity between regions. In this way, other unexpanded areas similar to the currently expanded area are mined.
  • This method fully considers the closeness of association between regions and the similarity of regional attribute portrait features, etc., to mine regions that are more similar to the current region in terms of network structure and attribute information.
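Once each region node has an embedding vector, the similarity between two regions can be computed from the vectors. The source does not name the similarity measure; the sketch below assumes cosine similarity, and the embedding values are made up for illustration rather than produced by a trained Skip-Gram model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical region embeddings (not real model output).
embeddings = {
    "A": [3.0, 4.0],
    "B": [6.0, 8.0],
    "C": [-4.0, 3.0],
}
sim_ab = cosine_similarity(embeddings["A"], embeddings["B"])
sim_ac = cosine_similarity(embeddings["A"], embeddings["C"])
```

Regions whose vectors point in the same direction score close to 1, and orthogonal vectors score 0, so ranking regions by this score against an already expanded region would surface the most similar unexpanded ones.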
  • the specific steps can be:
  • step 101 to step 104 is as follows:
  • the user will generate some GPS location data.
  • the position of the user at a certain time $t_1$ is denoted as $G_{t_1}$
  • the position of the user at the next moment $t_2$ is denoted as $G_{t_2}$
  • the positions $G_{t_1}$ and $G_{t_2}$ constitute an edge between two points through the user's behavioral relationship.
  • a user behavior-based attribute association network can be constructed from multiple user location data between different regions. The strength of the connection between regions is determined by the number of users who generate the edge and the time difference. Therefore, the user behavior relation network in this area is a weighted directed association network.
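The construction of the user-behavior network from location data can be sketched as follows. This is a simplified illustration: positions are assumed to be already mapped to region labels, the edge weight is taken to be the number of distinct users who moved from one region to the next, and the time-difference component mentioned in the text is left out. The function name `behavior_edges` and the sample tracks are hypothetical.

```python
from collections import defaultdict

def behavior_edges(tracks):
    """Build directed edge weights from consecutive positions of each user.

    tracks: dict mapping user_id -> list of (timestamp, region) records.
    Returns {(src_region, dst_region): number of distinct users making that move}.
    """
    users_per_edge = defaultdict(set)
    for user, points in tracks.items():
        ordered = sorted(points)  # order each user's records by timestamp
        for (_, src), (_, dst) in zip(ordered, ordered[1:]):
            if src != dst:  # staying in the same region creates no edge
                users_per_edge[(src, dst)].add(user)
    return {edge: len(users) for edge, users in users_per_edge.items()}

# Hypothetical location records, already mapped from GPS points to regions.
tracks = {
    "u1": [(1, "A"), (2, "B"), (3, "A")],
    "u2": [(1, "A"), (5, "B")],
    "u3": [(2, "B"), (4, "C")],
}
behavior_network = behavior_edges(tracks)
```

Because the edge from A to B and the edge from B to A are counted separately, the result is a weighted directed network, matching the description above.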
  • an attribute association network that is not based on user behavior can be designed from regional portrait features.
  • the attribute association network not based on user behavior is built as follows. Regional feature portraits are classified into categories such as regional population density, regional merchant industry distribution, regional user age distribution, regional user consumption level, and regional user consumption preferences, and further regional portrait information can be added. The similarity of each feature between regions is calculated, and an edge is established between regions with similar features. For example, if the merchants in area A and area B are mostly catering merchants, then when the association network is generated from the merchant type distribution feature, an edge can be established between area A and area B. If the ages of users in area C and area D are concentrated between 25 and 40, then when the association network is generated from the user age distribution feature, an edge can be established between area C and area D, and likewise for other features.
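The catering example can be turned into a small sketch. The similarity criterion used here, linking two areas when the same merchant category has the largest share in both, is an assumption chosen to mirror the example; the source only says that regions with similar features get an edge. The function name and the sample distributions are hypothetical.

```python
from itertools import combinations

def dominant_category_edges(distributions):
    """Link two areas when the same category has the largest share in both.

    distributions: dict mapping area -> {category: share of merchants}.
    """
    dominant = {
        area: max(shares, key=shares.get)
        for area, shares in distributions.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(dominant), 2)
        if dominant[a] == dominant[b]
    ]

# Hypothetical merchant type distributions per area.
merchant_mix = {
    "A": {"catering": 0.8, "retail": 0.2},
    "B": {"catering": 0.7, "retail": 0.3},
    "C": {"catering": 0.1, "retail": 0.9},
}
portrait_edges = dominant_category_edges(merchant_mix)
```

Areas A and B are both dominated by catering merchants and are linked, while area C is not. A threshold on a distribution similarity score would be an alternative criterion for features such as age distribution.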
  • the fusion association network generates:
  • the present invention provides an apparatus for determining similarity between objects, including: a generating module 301, configured to, for at least one dimension attribute of any attribute group among a plurality of attribute groups, generate, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between the nodes in the attribute association network represents the degree of association between the evaluation objects under the at least one dimension attribute; a fusion module 302, configured to, for any two attribute association networks, fuse the two attribute association networks to obtain a fusion association network, wherein each evaluation object has a uniquely mapped node in the fusion association network, and the edge information between the nodes in the fusion association network represents the comprehensive degree of association between the evaluation objects; a processing module 303, configured to randomly traverse the nodes of each fusion association network to obtain multiple node sequences of the fusion association network; according to the multiple node sequences, it is determined that any two
  • the generating module 301 is specifically configured to: for any two of the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two nodes corresponding to the two evaluation objects in the attribute association network; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, an attribute association network of the plurality of evaluation objects for the dimension attribute.
  • the evaluation object is an area; the dimension attribute includes the time-series location attribute of users in the area; the attribute value under the time-series location attribute of users in the area includes a user identifier; the generating module 301 is specifically configured to: determine the edge information of the two areas according to the user identifiers in the two areas.
  • Optionally, the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension, which are attribute dimensions preset as associated; and the generating module 301 is specifically configured to: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determine the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type and second-type attribute dimensions.
  • Optionally, the processing module 303 is specifically configured to: determine the random walk probabilities between nodes in the fused association network according to the edge information between those nodes; and randomly traverse the nodes of the fused association network based on those random walk probabilities to obtain the plurality of node sequences.
  • Optionally, the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; and the fusion module 302 is specifically configured to: for any two nodes in the fused association network, determine the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients; and generate the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
  • Optionally, the fusion module 302 is specifically configured to: input the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network; and determine the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.
  • An embodiment of the present application provides a computer device, including a program or instructions which, when executed, perform the method for determining similarity between objects provided by the embodiments of the present application and any of its optional methods.
  • An embodiment of the present application provides a computer-readable storage medium, including a program or instructions which, when executed, perform the method for determining similarity between objects provided by the embodiments of the present application and any of its optional methods.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

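A method and apparatus for determining similarity between objects. The method comprises: for at least one dimension attribute of any attribute group among a plurality of attribute groups, generating, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute; for any two attribute association networks, fusing the two attribute association networks to obtain a fused association network; randomly traversing the nodes of each fused association network to obtain a plurality of node sequences of the fused association network; and determining, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.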

Description

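Method and apparatus for determining similarity between objects
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010896318.8, entitled "一种对象间相似性的确定方法及装置" (Method and apparatus for determining similarity between objects), filed with the China Patent Office on August 31, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of similarity analysis, and in particular to a method and apparatus for determining similarity between objects.
Background
In its day-to-day operation, an institution deals with many kinds of objects. Different objects may have different characteristics; for example, the types and modes of business that an institution can suitably conduct may differ from region to region. To make institutional decision-making more efficient, the characteristics of different objects need to be examined so that reasonable decisions can be made specifically for each object.
If the performance of one object under a decision could be used to infer the performance of another, similar object, this would clearly provide good guidance for decisions concerning that other evaluation object. How to judge the similarity between evaluation objects is therefore of considerable research value for institutional decision-making. At present, however, there is no method for evaluating the similarity of different objects, and this is a problem that urgently needs to be solved.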
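Summary of the invention
The present invention provides a method and apparatus for determining similarity between objects, which solves the problem that the prior art has no method for evaluating the similarity of different objects.
In a first aspect, the present invention provides a method for determining similarity between objects, comprising: for at least one dimension attribute of any attribute group among a plurality of attribute groups, generating, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute; for any two attribute association networks, fusing the two attribute association networks to obtain a fused association network, wherein each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects; randomly traversing the nodes of each fused association network to obtain a plurality of node sequences of the fused association network; and determining, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
In the above method, attribute association networks of the plurality of evaluation objects under the at least one dimension attribute are generated, each representing the degree of association among the evaluation objects. Two attribute association networks are then fused into a fused association network, which fully represents the comprehensive degree of association among the evaluation objects with respect to regional features. The nodes of the fused association network are further traversed at random to obtain a plurality of node sequences, from which the similarity of any two evaluation objects under the fused association network is determined. A method for determining similarity between objects is thereby provided.
Optionally, generating, for at least one dimension attribute of the evaluation objects, the attribute association network of the plurality of evaluation objects under the at least one dimension attribute comprises: for any two evaluation objects among the plurality of evaluation objects, determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network; and generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the dimension attribute.
In the above method, the edge information between the two corresponding nodes is determined from the attribute values of the two evaluation objects under the dimension attribute, and the attribute association network is generated from it. This provides a way of generating an attribute association network from attribute values under the same dimension attribute, and so adds flexibility in generating the attribute association network of the evaluation objects for that dimension attribute.
Optionally, the evaluation object is a region; the dimension attribute includes a time-series location attribute of users in the region; the attribute values under the time-series location attribute of users in the region include user identifiers; and determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network comprises: determining the edge information of the two regions according to the user identifiers in the two regions.
In the above method, when the evaluation object is a region, the time-series location attribute can represent how regions are associated over time, so the edge information of the two regions can be determined more accurately from the user identifiers in the two regions.
Optionally, the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension, which are attribute dimensions preset as associated; and generating, for at least one dimension attribute of the evaluation objects, the attribute association network of the plurality of evaluation objects under the at least one dimension attribute comprises: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determining the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type and second-type attribute dimensions.
In the above manner, because the first-type and second-type attribute dimensions are preset as associated, the edge information between the two corresponding nodes can be determined from attribute values of the first and second evaluation objects in attribute dimensions of different types, and the attribute association network generated from it. This provides a way of generating an attribute association network across attribute dimensions of different types.
Optionally, randomly traversing the nodes of each fused association network to obtain a plurality of node sequences comprises: determining the random walk probabilities between nodes in the fused association network according to the edge information between those nodes; and randomly traversing the nodes of the fused association network based on those random walk probabilities to obtain the plurality of node sequences.
In the above manner, the random walk probabilities between nodes are determined from the edge information between nodes in the fused association network, and the nodes are traversed at random with those probabilities taken into account, so that the plurality of node sequences is obtained more accurately.
Optionally, the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; and fusing the two attribute association networks to obtain the fused association network comprises: for any two nodes in the fused association network, determining the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients; and generating the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
In the above manner, the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients are considered together, and the feature association network of the plurality of evaluation objects is generated from the resulting comprehensive association weights, so that it is generated more accurately.
Optionally, determining the similarity of any two evaluation objects among the plurality of evaluation objects according to the plurality of node sequences comprises: inputting the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network; and determining the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.
In the above manner, once the embedding vectors of the fused association network have been generated, the similarity of any two evaluation objects can be determined from them. Because embedding vectors represent the fused association network more fully and at finer granularity, the similarity of any two evaluation objects can be determined more accurately.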
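In a second aspect, the present invention provides an apparatus for determining similarity between objects, comprising: a generating module, configured to generate, for at least one dimension attribute of any attribute group among a plurality of attribute groups, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute; a fusion module, configured to fuse any two attribute association networks to obtain a fused association network, wherein each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects; and a processing module, configured to randomly traverse the nodes of each fused association network to obtain a plurality of node sequences of the fused association network, and to determine, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
Optionally, the generating module is specifically configured to: for any two evaluation objects among the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the dimension attribute.
Optionally, the evaluation object is a region; the dimension attribute includes a time-series location attribute of users in the region; the attribute values under the time-series location attribute of users in the region include user identifiers; and the generating module is specifically configured to determine the edge information of the two regions according to the user identifiers in the two regions.
Optionally, the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension, which are attribute dimensions preset as associated; and the generating module is specifically configured to: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determine the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type and second-type attribute dimensions.
Optionally, the processing module is specifically configured to: determine the random walk probabilities between nodes in the fused association network according to the edge information between those nodes; and randomly traverse the nodes of the fused association network based on those random walk probabilities to obtain the plurality of node sequences.
Optionally, the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; and the fusion module is specifically configured to: for any two nodes in the fused association network, determine the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients; and generate the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
Optionally, the fusion module is specifically configured to: input the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network; and determine the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.
For the beneficial effects of the second aspect and its optional apparatuses, refer to the beneficial effects of the first aspect and its optional methods; they are not repeated here.
In a third aspect, the present invention provides a computer device, comprising a program or instructions which, when executed, perform the method of the first aspect and each optional method of the first aspect.
In a fourth aspect, the present invention provides a storage medium, comprising a program or instructions which, when executed, perform the method of the first aspect and each optional method of the first aspect.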
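Brief description of the drawings
Fig. 1 is a schematic flowchart of the steps of a method for determining similarity between objects provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of the specific steps of a method for determining similarity between objects provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an apparatus for determining similarity between objects provided by an embodiment of the present application.
Detailed description
For a better understanding of the above technical solutions, they are described in detail below with reference to the accompanying drawings and specific implementations. It should be understood that the embodiments of the present application and the specific features in the embodiments are detailed descriptions of the technical solutions of the present application rather than limitations on them, and that, where no conflict arises, the embodiments and the technical features in the embodiments may be combined with one another.
In its day-to-day operation, an institution deals with many kinds of objects. If the performance of one object under a decision could be used to infer the performance of another, similar object, this would clearly provide good guidance for decisions concerning that other evaluation object. How to judge the similarity between evaluation objects is therefore of considerable research value for institutional decision-making. At present, however, there is no method for evaluating the similarity of different objects, and this is a problem that urgently needs to be solved. To this end, as shown in Fig. 1, the present application provides a method for determining similarity between objects.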
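Step 101: for at least one dimension attribute of any attribute group among a plurality of attribute groups, generate, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute.
Step 102: for any two attribute association networks, fuse the two attribute association networks to obtain a fused association network.
Step 103: randomly traverse the nodes of each fused association network to obtain a plurality of node sequences of the fused association network.
Step 104: determine, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
In steps 101 to 104, each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute. Likewise, each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects. It should be noted that the evaluation object can take many forms, such as a region or an institution; the method can be used not only for evaluating similarity between regions but also for similarity evaluation between recommendation systems. The attribute association networks may include attribute association networks of user behavior and attribute association networks of non-user behavior; when both are included, the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects under the influence of user behavior.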
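In an optional implementation, step 101 may specifically be:
Step (1-1): for any two evaluation objects among the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network.
It should be noted that the dimension attribute may include one or more dimension attributes.
For example, the attribute values of region A and region B under the merchant-count dimension attribute and under the dimension attribute for the number of users of a set business are, respectively, the number of merchants in region A and the number of users of the set business in region A, and the number of merchants in region B and the number of users of the set business in region B.
Step (1-2): generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the dimension attribute.
For example, the edge information of two evaluation objects in step (1-2) may take the form of an association weight.
In an optional implementation, the evaluation object is a region; the dimension attribute includes a time-series location attribute of users in the region; the attribute values under the time-series location attribute of users in the region include user identifiers; and step (1-1) may specifically be:
Determine the edge information of the two regions according to the user identifiers in the two regions.
Specifically, the edge information of the two regions may be determined according to the number of users having the same user identifier in the two regions.
It should be noted that, in the above implementation, besides the user identifiers in the two regions, the edge information of the two regions may also be determined according to the time difference between the stays, in the two regions, of users having the same user identifier.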
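The shared-user rule above can be illustrated with a minimal sketch. It assumes each region is represented by the set of user identifiers observed in it and takes the edge weight to be the count of identifiers common to both regions; the function name `shared_user_edges` and the sample data are hypothetical, and the stay-time difference mentioned above is not modeled.

```python
from itertools import combinations


def shared_user_edges(region_users):
    """Return {(region_a, region_b): number of user IDs seen in both regions}."""
    edges = {}
    for a, b in combinations(sorted(region_users), 2):
        shared = len(region_users[a] & region_users[b])
        if shared:  # keep only pairs that share at least one user
            edges[(a, b)] = shared
    return edges


# Hypothetical sample data: user IDs observed in each region.
region_users = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3", "u4"},
    "C": {"u5"},
}
edges = shared_user_edges(region_users)
```

Here regions A and B share two users, so they receive an edge of weight 2, while region C stays unconnected.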
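In another optional implementation, the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension, which are attribute dimensions preset as associated; step 101 may specifically be:
Step (2-1): for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determine the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension.
For example, when the evaluation object is a region, the first-type attribute dimension in step (2-1) may be the proportion of merchants of a set business in the region and the second-type attribute dimension the proportion of users of the set business in the region. The edge information between the nodes corresponding to a first region and a second region in the attribute association network can then be determined from the proportion of merchants of the set business in the first region and the proportion of users of the set business in the second region, or from the proportion of users of the set business in the first region and the proportion of merchants of the set business in the second region.
Step (2-2): generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type and second-type attribute dimensions.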
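In an optional implementation, the edge information between nodes in the attribute association network is the attribute association weight between the nodes, and the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; step 102 may specifically be:
For any two nodes in the fused association network, determine the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients; and generate the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
For example, the two attribute association networks may be any attribute association network of user behavior and any attribute association network of non-user behavior. The attribute association weight in the user-behavior network is a first weight value with a first weighting coefficient, and the attribute association weight in the non-user-behavior network is a second weight value with a second weighting coefficient. This example covers the comprehensive association weight for only two attribute association networks; in practice more attribute association networks can be fused together, and there may be several first weight values and several second weight values. For example, the comprehensive association weight may be calculated as follows:
$$w_{\text{comprehensive}} = w_{1\text{-}1} \cdot a_{1\text{-}1} + w_{1\text{-}2} \cdot a_{1\text{-}2} + w_{1\text{-}3} \cdot a_{1\text{-}3} + \dots + w_{2\text{-}1} \cdot a_{2\text{-}1} + w_{2\text{-}2} \cdot a_{2\text{-}2} + w_{2\text{-}3} \cdot a_{2\text{-}3} + \dots$$
where $w_{1\text{-}x}$ denotes a first weight value, $w_{2\text{-}x}$ a second weight value, $a_{1\text{-}x}$ a first weighting coefficient, and $a_{2\text{-}x}$ a second weighting coefficient.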
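The weighted-sum fusion can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: each attribute association network is assumed to be a dictionary from a node pair to its attribute association weight, a pair missing from a network contributes nothing, and the name `fuse_networks` and the sample weights are hypothetical.

```python
def fuse_networks(networks, coefficients):
    """Fuse attribute association networks by a weighted sum of edge weights.

    networks: list of dicts mapping (node_u, node_v) -> attribute association weight
    coefficients: list of weighting coefficients, one per network
    """
    if len(networks) != len(coefficients):
        raise ValueError("need exactly one weighting coefficient per network")
    fused = {}
    for net, a in zip(networks, coefficients):
        for edge, w in net.items():
            # accumulate w * a for this edge across all networks
            fused[edge] = fused.get(edge, 0.0) + w * a
    return fused


# Hypothetical user-behavior and non-user-behavior networks.
behaviour_net = {("A", "B"): 2.0, ("B", "C"): 1.0}
profile_net = {("A", "B"): 1.0, ("A", "C"): 4.0}
fused = fuse_networks([behaviour_net, profile_net], [0.5, 0.25])
```

With these inputs the edge (A, B) receives 2.0 * 0.5 + 1.0 * 0.25 = 1.25, and edges present in only one network keep their single weighted term.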
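In an optional implementation, step 103 may specifically be:
Step (3-1): determine the random walk probabilities between nodes in the fused association network according to the edge information between those nodes.
It should be noted that, in step (3-1), the connection between two nodes may consist of two directed edges, such as the edge from node A to node B and the edge from node B to node A, and the edge information may be the weights of the two directed edges. The random walk probability of moving from one node to another can be determined in proportion to the weights. For example, if node A has edges to nodes B, C and D with weights 3, 4 and 5 respectively, then the random walk probability from node A to node B is 1/4, from node A to node C is 1/3, and from node A to node D is 5/12.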
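The proportional rule in step (3-1) can be checked with a small sketch that divides each outgoing weight by the sum of the outgoing weights. Integer weights are assumed so that exact fractions can be shown, and the helper name `walk_probabilities` is hypothetical.

```python
from fractions import Fraction


def walk_probabilities(out_weights):
    """Map each neighbor to weight / (sum of all outgoing weights)."""
    total = sum(out_weights.values())
    return {node: Fraction(w, total) for node, w in out_weights.items()}


# Outgoing edge weights of node A toward B, C and D, as in the example above.
probs = walk_probabilities({"B": 3, "C": 4, "D": 5})
```

For weights 3, 4 and 5 the total is 12, which gives 1/4, 1/3 and 5/12 for B, C and D, and the three probabilities sum to 1.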
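Step (3-2): based on the random walk probabilities between nodes in the fused association network, randomly traverse the nodes of the fused association network to obtain the plurality of node sequences.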
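A weighted random traversal of this kind might look like the sketch below. The source does not fix the walk length, the number of walks per node, or how dead ends are treated, so those choices, the `random_walk` helper and the sample graph are all assumptions made for illustration.

```python
import random


def random_walk(graph, start, length, rng):
    """Walk over a directed weighted graph {node: {neighbor: weight}}.

    The next node is drawn with probability proportional to the edge weight.
    """
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: stop this walk early
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Hypothetical fused association network with directed edge weights.
graph = {
    "A": {"B": 3, "C": 4, "D": 5},
    "B": {"A": 1, "C": 2},
    "C": {"A": 1},
    "D": {"A": 2, "B": 1},
}
rng = random.Random(0)  # seeded so that runs are reproducible
walks = [random_walk(graph, start, 5, rng) for start in graph for _ in range(2)]
```

Each walk is one node sequence; starting two walks from every node yields eight sequences for this four-node graph.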
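It should be noted that each fused association network can be traversed multiple times to obtain multiple node sequences; for example, the node sequences of a fused association network may be ABCDE, ABCEF, ACEF and ABEC. The similarity of two evaluation objects under the fused association network can be determined from statistics over the node sequences; for example, the larger the proportion of all sequences in which the two nodes appear consecutively, the higher the similarity.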
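The sequence statistic can be sketched as the share of sequences in which two nodes appear next to each other. Treating the pair as unordered (B after A counts the same as A after B) is an assumption, as is the helper name `consecutive_ratio`; the four sequences are the ones from the example.

```python
def consecutive_ratio(sequences, u, v):
    """Share of sequences in which u and v are adjacent, in either order."""
    hits = sum(
        any({x, y} == {u, v} for x, y in zip(seq, seq[1:]))
        for seq in sequences
    )
    return hits / len(sequences)


sequences = ["ABCDE", "ABCEF", "ACEF", "ABEC"]
ratio_ab = consecutive_ratio(sequences, "A", "B")  # adjacent in 3 of 4 sequences
ratio_ef = consecutive_ratio(sequences, "E", "F")  # adjacent in 2 of 4 sequences
```

Under this statistic, A and B (0.75) would be considered more similar than E and F (0.5).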
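In an optional implementation, step 104 may specifically be:
Input the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network; and determine the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.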
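Once embedding vectors exist, one common way to compare two nodes is cosine similarity. The source does not name the similarity measure, so cosine similarity is an assumption here, and the vectors below are placeholder values rather than the output of any word-vector model.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Placeholder embedding vectors for three evaluation objects (not real model output).
embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 1.0],
    "C": [0.0, 2.0],
}
sim_ab = cosine_similarity(embeddings["A"], embeddings["B"])
sim_ac = cosine_similarity(embeddings["A"], embeddings["C"])
```

In this toy data A is closer to B than to C, whose vector is orthogonal to A's.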
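Drawing on the description of steps 101 to 104 above, the method for determining similarity between objects provided by the present application is described in further detail below by way of example. Taking regions as the evaluation objects, the process can be summarized as follows:
The geographic area is divided into different regions by a spatial data processing method. Multiple attribute association networks are built from users' Global Positioning System (GPS) spatio-temporal data and regional feature attribute data; node sequences are generated by probabilistic random walks; finally, multiple embedding vectors of the region nodes are obtained with a word-vector model (such as Skip-Gram), and the similarity between regions is computed. In this way, regions that have not yet been developed but are similar to regions that have already been developed well can be discovered. The method takes full account of how closely regions are associated, the similarity of regional attribute profile features, and so on, in order to find regions that are more similar to the current region in both network structure and attribute information. The specific steps may be:
(1) Form a spatio-temporal association network $G_{gps}$ between regions from users' GPS spatio-temporal migration data.
(2) Extract the profile features of the regions and build the regional feature set $V(v_1, v_2, v_3, \ldots, v_n)$, such as the age distribution of users in the region, the distribution of merchants in the region, and the spending-power level of users in the region. An edge is established between regions with similar attributes, thereby building the attribute association networks $G_{v_1}, G_{v_2}, \ldots, G_{v_n}$ between regions.
(3) Fuse the attribute association networks to form fused association networks, and generate region node sequences by probabilistic walks.
(4) Using the Skip-Gram model, generate multiple embedding vectors for the regions, and the similarity between region nodes under the different fused networks.
(5) Evaluate the similarity between regions comprehensively by a weighted average.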
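Step (5) can be sketched as a plain weighted average of the similarities obtained under each fused network. How the weights are chosen is not stated in the source; the weights, the sample similarities and the name `combined_similarity` are assumptions for illustration.

```python
def combined_similarity(similarities, weights):
    """Weighted average of per-network similarities for one pair of regions."""
    if len(similarities) != len(weights):
        raise ValueError("need one weight per similarity")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(similarities, weights)) / total


# Hypothetical similarities of one region pair under two fused networks.
score = combined_similarity([0.8, 0.4], [3, 1])
```

With weights 3 and 1 the first fused network dominates, giving a combined score of about 0.7.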
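More specifically, as shown in Fig. 2, taking an attribute association network of user behavior and an attribute association network of non-user behavior as examples, the process of steps 101 to 104 is as follows:
(1) Generation of the attribute association network of user behavior:
While using an application (APP), users generate GPS location data. Denote a user's location at time $t_1$ as $G_{t_1}$, the user's location at the next time as $G_{t_2}$, and so on. Through the user's behavior, the locations $G_{t_1}$ and $G_{t_2}$ form an edge between two points. In the same way, different regions can be linked into an attribute association network based on user behavior from the location data of many users. The strength of the connection between two regions is determined by the number of users who generate that edge and by the time difference. The regional user-behavior network is therefore a weighted, directed association network.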
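A directed, weighted network of this kind could be built roughly as sketched below. Each user's trace is assumed to be a list of (timestamp, region) points, and the weight of an edge is taken as the number of distinct users who moved along it; the time difference is left out because the source gives no formula for it. The name `build_behaviour_network` and the traces are hypothetical.

```python
from collections import defaultdict


def build_behaviour_network(traces):
    """traces: {user_id: [(timestamp, region), ...]}.

    Returns {(src, dst): number of distinct users who moved from src to dst}.
    """
    movers = defaultdict(set)
    for user, points in traces.items():
        ordered = sorted(points)  # order each trace by timestamp
        for (_, src), (_, dst) in zip(ordered, ordered[1:]):
            if src != dst:  # staying in the same region creates no edge
                movers[(src, dst)].add(user)
    return {edge: len(users) for edge, users in movers.items()}


# Hypothetical GPS traces already mapped to regions.
traces = {
    "u1": [(1, "A"), (2, "B"), (3, "A")],
    "u2": [(1, "A"), (5, "B")],
    "u3": [(2, "B"), (4, "C")],
}
network = build_behaviour_network(traces)
```

Two users travel from A to B and only one travels back, so the edges A to B and B to A carry different weights, which is what makes the network directed.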
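(2) Attribute association network of non-user behavior:
Regions without user GPS coverage cannot be added to the attribute association network of user behavior, yet these regions are also high-potential regions for expansion. For this reason, attribute association networks of non-user behavior can be constructed based on the profile features of the regions.
The attribute association network of non-user behavior is built as follows. The feature profiles of the regions are divided into categories, including regional pedestrian-flow density, industry distribution of regional merchants, age distribution of regional users, spending-power level of regional users, consumption preferences of regional users, and so on; more regional profile information can be added. The similarity of each feature between regions is computed, and an edge is established between regions with similar features. For example, if the merchants in region A and region B are mostly catering merchants, an edge can be established between region A and region B when the association network is generated from the merchant-type distribution feature. If the ages of users in region C and region D are concentrated between 25 and 40, an edge can be established between region C and region D when the association network is generated from the user age distribution feature; other features are handled in the same way.
By dividing the non-user attributes in different ways, multiple attribute association networks of non-user behavior can be generated.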
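The merchant-type example can be sketched as connecting regions that share the same dominant category. The source does not define "mostly", so the 0.6 share threshold is an assumption, as are the helper names and the sample distributions.

```python
from itertools import combinations


def dominant_category(distribution, min_share):
    """Return the largest category if its share reaches min_share, else None."""
    category, share = max(distribution.items(), key=lambda kv: kv[1])
    return category if share >= min_share else None


def profile_edges(region_profiles, min_share=0.6):
    """Connect every pair of regions that share the same dominant category."""
    dominant = {r: dominant_category(d, min_share) for r, d in region_profiles.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(dominant), 2)
        if dominant[a] is not None and dominant[a] == dominant[b]
    ]


# Hypothetical merchant-type distributions per region.
region_profiles = {
    "A": {"catering": 0.8, "retail": 0.2},
    "B": {"catering": 0.7, "retail": 0.3},
    "C": {"catering": 0.4, "retail": 0.6},
    "D": {"catering": 0.5, "retail": 0.5},
}
edges = profile_edges(region_profiles)
```

Regions A and B are both dominated by catering and are linked; C is dominated by retail and D has no dominant category, so neither gets an edge. Running the same procedure on another feature, such as age distribution, would give a separate network.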
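Generation of fused association networks:
By fusing the attribute association network of user behavior with the attribute association networks of non-user behavior, multiple fused association networks can be generated.
Calculation of region node similarity:
For each of the different fused association networks, the embedding vectors of the region nodes under that network are computed in the manner described above, and the similarity between nodes under that network is obtained.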
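As shown in Fig. 3, the present invention provides an apparatus for determining similarity between objects, comprising: a generating module 301, configured to generate, for at least one dimension attribute of any attribute group among a plurality of attribute groups, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute, wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute; a fusion module 302, configured to fuse any two attribute association networks to obtain a fused association network, wherein each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects; and a processing module 303, configured to randomly traverse the nodes of each fused association network to obtain a plurality of node sequences of the fused association network, and to determine, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
Optionally, the generating module 301 is specifically configured to: for any two evaluation objects among the plurality of evaluation objects, determine, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the dimension attribute.
Optionally, the evaluation object is a region; the dimension attribute includes a time-series location attribute of users in the region; the attribute values under the time-series location attribute of users in the region include user identifiers; and the generating module 301 is specifically configured to determine the edge information of the two regions according to the user identifiers in the two regions.
Optionally, the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension, which are attribute dimensions preset as associated; and the generating module 301 is specifically configured to: for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determine the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension; and generate, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type and second-type attribute dimensions.
Optionally, the processing module 303 is specifically configured to: determine the random walk probabilities between nodes in the fused association network according to the edge information between those nodes; and randomly traverse the nodes of the fused association network based on those random walk probabilities to obtain the plurality of node sequences.
Optionally, the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; and the fusion module 302 is specifically configured to: for any two nodes in the fused association network, determine the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients; and generate the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
Optionally, the fusion module 302 is specifically configured to: input the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network; and determine the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.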
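An embodiment of the present application provides a computer device, including a program or instructions which, when executed, perform the method for determining similarity between objects provided by the embodiments of the present application and any of its optional methods.
An embodiment of the present application provides a computer-readable storage medium, including a program or instructions which, when executed, perform the method for determining similarity between objects provided by the embodiments of the present application and any of its optional methods.
Finally, it should be noted that those skilled in the art will understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, etc.) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its scope. Thus, provided that these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.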

Claims (10)

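  1. A method for determining similarity between objects, characterized by comprising:
    for at least one dimension attribute of any attribute group among a plurality of attribute groups, generating, for the at least one dimension attribute of the evaluation objects, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute; wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute;
    for any two attribute association networks, fusing the two attribute association networks to obtain a fused association network; wherein each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects;
    randomly traversing the nodes of each fused association network to obtain a plurality of node sequences of the fused association network;
    determining, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
  2. The method according to claim 1, characterized in that generating, for at least one dimension attribute of the evaluation objects, the attribute association network of the plurality of evaluation objects under the at least one dimension attribute comprises:
    for any two evaluation objects among the plurality of evaluation objects, determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network;
    generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the dimension attribute.
  3. The method according to claim 2, characterized in that the evaluation object is a region; the dimension attribute includes a time-series location attribute of users in the region; the attribute values under the time-series location attribute of users in the region include user identifiers; and determining, according to the attribute values of the two evaluation objects under the dimension attribute, the edge information between the two corresponding nodes in the attribute association network comprises:
    determining the edge information of the two regions according to the user identifiers in the two regions.
  4. The method according to claim 1, characterized in that the at least one dimension attribute includes a first-type attribute dimension and a second-type attribute dimension; the first-type attribute dimension and the second-type attribute dimension are attribute dimensions preset as associated; and generating, for at least one dimension attribute of the evaluation objects, the attribute association network of the plurality of evaluation objects under the at least one dimension attribute comprises:
    for a first evaluation object and a second evaluation object among the plurality of evaluation objects, determining the edge information between the two corresponding nodes in the attribute association network according to the attribute value of the first evaluation object in the first-type attribute dimension and the attribute value of the second evaluation object in the second-type attribute dimension, or according to the attribute value of the first evaluation object in the second-type attribute dimension and the attribute value of the second evaluation object in the first-type attribute dimension;
    generating, according to the edge information between the nodes corresponding to the plurality of evaluation objects in the attribute association network, the attribute association network of the plurality of evaluation objects for the first-type attribute dimension and the second-type attribute dimension.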
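  5. The method according to any one of claims 1 to 4, characterized in that randomly traversing the nodes of each fused association network to obtain a plurality of node sequences comprises:
    determining the random walk probabilities between nodes in the fused association network according to the edge information between nodes in the fused association network;
    randomly traversing the nodes of the fused association network based on the random walk probabilities between nodes in the fused association network to obtain the plurality of node sequences.
  6. The method according to any one of claims 1 to 4, characterized in that the edge information between nodes in the attribute association network is the attribute association weight between the nodes; the edge information between nodes in the fused association network is the comprehensive association weight between the nodes; and fusing the two attribute association networks to obtain the fused association network comprises:
    for any two nodes in the fused association network, determining the comprehensive association weight of the two nodes in the fused association network according to the attribute association weights of the two nodes in the two attribute association networks and the weighting coefficients;
    generating the feature association network of the plurality of evaluation objects based on the comprehensive association weights between nodes in the fused association network.
  7. The method according to any one of claims 1 to 4, characterized in that determining the similarity of any two evaluation objects among the plurality of evaluation objects according to the plurality of node sequences comprises:
    inputting the plurality of node sequences into a preset word-vector model to generate the embedding vectors of the fused association network;
    determining the similarity of any two evaluation objects among the plurality of evaluation objects according to the embedding vectors of the fused association network.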
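  8. An apparatus for determining similarity between objects, characterized by comprising:
    a generating module, configured to generate, for at least one dimension attribute of any attribute group among a plurality of attribute groups, an attribute association network of a plurality of evaluation objects under the at least one dimension attribute; wherein each evaluation object has a uniquely mapped node in the attribute association network, and the edge information between nodes in the attribute association network represents the degree of association between evaluation objects under the at least one dimension attribute;
    a fusion module, configured to fuse any two attribute association networks to obtain a fused association network; wherein each evaluation object has a uniquely mapped node in the fused association network, and the edge information between nodes in the fused association network represents the comprehensive degree of association between evaluation objects;
    a processing module, configured to randomly traverse the nodes of each fused association network to obtain a plurality of node sequences of the fused association network, and to determine, according to the plurality of node sequences, the similarity of any two evaluation objects among the plurality of evaluation objects under the fused association network.
  9. A computer device, characterized by comprising a program or instructions which, when executed, cause the method according to any one of claims 1 to 7 to be performed.
  10. A computer-readable storage medium, characterized by comprising a program or instructions which, when executed, cause the method according to any one of claims 1 to 7 to be performed.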
PCT/CN2020/139531 2020-08-31 2020-12-25 一种对象间相似性的确定方法及装置 WO2022041600A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020237009620A KR20230054438A (ko) 2020-08-31 2020-12-25 객체 간 유사성 결정을 위한 방법 및 장치

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010896318.8A CN112016836B (zh) 2020-08-31 2020-08-31 一种对象间相似性的确定方法及装置
CN202010896318.8 2020-08-31

Publications (1)

Publication Number Publication Date
WO2022041600A1 true WO2022041600A1 (zh) 2022-03-03

Family

ID=73503484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139531 WO2022041600A1 (zh) 2020-08-31 2020-12-25 一种对象间相似性的确定方法及装置

Country Status (3)

Country Link
KR (1) KR20230054438A (zh)
CN (1) CN112016836B (zh)
WO (1) WO2022041600A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016836B (zh) * 2020-08-31 2023-11-03 中国银联股份有限公司 一种对象间相似性的确定方法及装置
CN113362158B (zh) * 2021-05-31 2024-06-11 中国银联股份有限公司 一种信用评估方法、装置及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760503A (zh) * 2016-02-23 2016-07-13 清华大学 一种快速计算图节点相似度的方法
CN109712678A (zh) * 2018-12-12 2019-05-03 中国人民解放军军事科学院军事医学研究院 关系预测方法、装置及电子设备
CN110046301A (zh) * 2019-01-24 2019-07-23 阿里巴巴集团控股有限公司 对象推荐方法和装置
CN112016836A (zh) * 2020-08-31 2020-12-01 中国银联股份有限公司 一种对象间相似性的确定方法及装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8874616B1 (en) * 2011-07-11 2014-10-28 21Ct, Inc. Method and apparatus for fusion of multi-modal interaction data
US9946800B2 (en) * 2015-07-06 2018-04-17 International Business Machines Corporation Ranking related objects using blink model based relation strength determinations
GB2545931A (en) * 2015-12-31 2017-07-05 Francis Murphy Dominic Defining edges and their weights between nodes in a network
US10129276B1 (en) * 2016-03-29 2018-11-13 EMC IP Holding Company LLC Methods and apparatus for identifying suspicious domains using common user clustering
CN108132927B (zh) * 2017-12-07 2022-02-11 西北师范大学 一种融合图结构与节点关联的关键词提取方法
CN110659799A (zh) * 2019-08-14 2020-01-07 深圳壹账通智能科技有限公司 基于关系网络的属性信息处理方法、装置、计算机设备和存储介质
CN110968701A (zh) * 2019-11-05 2020-04-07 量子数聚(北京)科技有限公司 用于图神经网络的关系图谱建立方法以及装置、设备

Also Published As

Publication number Publication date
TW202211052A (zh) 2022-03-16
CN112016836B (zh) 2023-11-03
CN112016836A (zh) 2020-12-01
KR20230054438A (ko) 2023-04-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20951266

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20237009620

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20951266

Country of ref document: EP

Kind code of ref document: A1