TWI621989B - Graph-based method and system for analyzing users - Google Patents

Graph-based method and system for analyzing users

Info

Publication number
TWI621989B
Authority
TW
Taiwan
Prior art keywords
user
information
merchant
merchants
association
Prior art date
Application number
TW105143938A
Other languages
Chinese (zh)
Other versions
TW201725499A (en)
Inventor
Dong-Jie He
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed
Publication of TW201725499A
Application granted
Publication of TWI621989B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Discrete Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A graph-based method and system for analyzing users are disclosed. The method comprises: A. a data feature parsing process, comprising: parsing the data records of users and merchants and the data records generated between users and merchants to obtain key information, the key information including a user identifier, a merchant identifier, and consumption information generated between a user and a merchant; and using the obtained key information to generate the vertex information and edge information of the graph, the user identifiers and merchant identifiers serving as vertex information and the consumption information serving as edge information; and B. an association analysis process, comprising: analyzing the associations between a first user and other users based at least on one or more merchants associated with the first user.

Description

Graph-based method and system for analyzing users

Embodiments of the present invention relate to data analysis and, in particular, to a graph-based method and system for analyzing users.

With the rapid development of big data technology, data analysis targeting individual users has become possible. Conventional user analysis classifies and clusters users with methods such as Bayesian classifiers and decision trees in order to discover relationships between users. With large-scale data, however, association and classification algorithms that operate on individual users are difficult to run efficiently and often require long computation times. Iterative model algorithms in particular are extremely inefficient on large-scale data. Moreover, whenever user information is updated, the users' association classifications must be recomputed, which greatly reduces the usefulness of the results.

According to one embodiment of the present invention, a graph-based method for analyzing users is disclosed. The method maintains a graph whose vertices are objects and whose edges are association information between objects, the objects including users and merchants and the edges indicating associations between users and merchants. The method comprises: A. a data feature parsing process, comprising: parsing the data records of users and merchants and the data records generated between users and merchants to obtain key information, the key information including a user identifier, a merchant identifier, and consumption information generated between a user and a merchant; and using the obtained key information to generate the vertex information and edge information of the graph, the user identifiers and merchant identifiers serving as vertex information and the consumption information serving as edge information; and B. an association analysis process, comprising: analyzing the associations between a first user and other users based at least on one or more merchants associated with the first user.

According to one embodiment of the present invention, a graph-based system for analyzing users is disclosed. The system maintains a graph whose vertices are objects and whose edges are association information between objects, the objects including users and merchants and the edges indicating associations between users and merchants. The system comprises: A. a data feature parsing module configured to parse the data records of users and merchants and the data records generated between users and merchants to obtain key information, the key information including a user identifier, a merchant identifier, and consumption information generated between a user and a merchant, and to use the obtained key information to generate the vertex information and edge information of the graph, the user identifiers and merchant identifiers serving as vertex information and the consumption information serving as edge information; and B. an association analysis module configured to analyze the associations between a first user and other users based at least on one or more merchants associated with the first user.

The technical solution of the present invention shortens data-update and data-analysis time, which improves the timeliness of the data and the efficiency of association and classification analysis over massive data in a big data environment. Analysis is accelerated by building a user-merchant relationship graph, analyzing strong and weak associations, and classifying by edge partitioning. In addition, a graph storage architecture that can be updated in real time provides near-real-time data analysis.

Other features and advantages of the embodiments of the present invention will also be understood from the following description when read in conjunction with the accompanying drawings, which illustrate by way of example the principles of the embodiments of the present invention.

200, 210, 220, 300, 310, 320: steps

FIG. 1 is a schematic diagram of analyzing users on the basis of a graph whose vertices are objects and whose edges are association information between objects, according to an embodiment of the present invention.

FIG. 2 is a flowchart of a graph-based method for analyzing users according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a graph-based system for analyzing users according to an embodiment of the present invention.

The principles of the present invention are described below with reference to embodiments. It should be understood that the embodiments are given only to enable those skilled in the art to better understand and practice the present invention, not to limit its scope. For example, the many specific implementation details contained in this specification should not be construed as limitations on the scope of the invention or of what may be claimed, but rather as descriptions specific to particular embodiments. For example, features described in the context of separate embodiments may be combined and implemented in a single embodiment. Conversely, features described in the context of a single embodiment may be implemented in multiple embodiments.

The present invention stores and updates the data to be processed in real time using a graph storage model. A graph is a data structure defined as graph = (V, E), where V is a non-empty finite set of vertices (nodes) and E is a set of edges, each generally written as (Vx, Vy) with Vx, Vy ∈ V. If two nodes u and v are joined by an edge, the two nodes are said to be associated. A weighted graph can be used to represent relationships between two adjacent vertices other than mere connectivity.
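Specialized to the user-merchant graph used in the rest of this description, the definition can be restated as follows. The symbols U, M, C and w are our own notation, introduced only for illustration: U is the set of user vertices, M the set of merchant vertices, C the set of consumption-information records, and w attaches consumption information (the edge information) to each edge.

```latex
\begin{aligned}
G &= (V, E), \qquad V = U \cup M, \qquad U \cap M = \varnothing,\\
E &\subseteq \{\, (u, m) \mid u \in U,\ m \in M \,\}, \qquad w : E \to \mathcal{C}.
\end{aligned}
```

Under this reading every edge joins a user to a merchant, so two users can only be associated indirectly, through a shared merchant (a path of length 2). This is why the analyses below reach other users by way of the first user's merchants.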

Based on this concept, the present invention maintains a graph whose vertices are objects and whose edges are association information between objects, and performs association analysis between objects (users or merchants) with graph-based association models and algorithms, improving the performance and efficiency of data analysis. In the present invention, a user may be, for example, a bank card holder or any user of an online service (e.g., online shopping), and a merchant may be any entity that provides products or services (e.g., a physical merchant or an online merchant).

FIG. 1 is a schematic diagram of analyzing users on the basis of a graph whose vertices are objects and whose edges are association information between objects, according to an embodiment of the present invention. FIG. 1 shows users 1 to 7 and merchants 1 to 4; these 11 objects are linked by the users' consumption behavior (purchases of products or services) and together form a graph. For example, once user 1 makes a purchase at merchant 1, a connection between user 1 and merchant 1 is established. The vertices of the graph in FIG. 1 represent objects, and the edge between two vertices indicates the association information between them. For example, user identifiers and merchant identifiers serve as vertex information. The association information serving as edge information may be information about the purchases made between a user and a merchant, for example the time, time period, location, and frequency of the user's purchases at the merchant, the amount spent, the categories of goods purchased, or the merchant identifier. The present invention generates a graph with users and merchants as vertices from the users' consumption behavior, and uses this graph to estimate the associations between users and merchants and between users.
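A minimal in-memory sketch of such a graph, assuming a bipartite adjacency structure in which each user-merchant edge stores the list of purchases made along it. The class names, the field selection, and the timestamp unit are hypothetical choices for illustration, not the patent's storage design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Consumption:
    """Edge information for one purchase (a subset of the fields listed in the text)."""
    timestamp: int   # assumed: Unix time in seconds
    location: str
    amount: float
    category: str    # category of goods purchased


class UserMerchantGraph:
    """Bipartite graph: user IDs and merchant IDs are vertices; edges hold purchase records."""

    def __init__(self) -> None:
        # user_id -> {merchant_id: [Consumption, ...]}, plus the mirror view per merchant
        self.user_adj: dict = defaultdict(dict)
        self.merchant_adj: dict = defaultdict(dict)

    def add_consumption(self, user_id: str, merchant_id: str, event: Consumption) -> None:
        # The first purchase creates the user-merchant edge; later ones extend its edge info.
        self.user_adj[user_id].setdefault(merchant_id, []).append(event)
        self.merchant_adj[merchant_id].setdefault(user_id, []).append(event)

    def merchants_of(self, user_id: str) -> set:
        return set(self.user_adj.get(user_id, {}))

    def users_of(self, merchant_id: str) -> set:
        return set(self.merchant_adj.get(merchant_id, {}))


g = UserMerchantGraph()
g.add_consumption("user1", "merchant1", Consumption(1_700_000_000, "Taipei", 120.0, "food"))
print(g.merchants_of("user1"), g.users_of("merchant1"))
```

Keeping both a per-user and a per-merchant view doubles storage but makes both hops of a user-merchant-user walk a dictionary lookup.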

In the example shown in FIG. 1, the vertices can be filtered by merchant identifier and consumption information according to the needs of a particular analysis.

In one example, when user 1 is analyzed, merchant 3, which has a specific merchant identifier (e.g., a convenience store), may first be filtered out; then, among the remaining merchants associated with user 1, the users who share at least a predetermined number of associated merchants with user 1 (a stronger degree of association) are searched for. For example, if the predetermined number is set to 3, user 4 is strongly associated with user 1 in this example.
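The procedure above can be sketched as a brute-force comparison of user 1 against every other user. The edge list below is hypothetical (the text does not list every edge of FIG. 1; these edges were chosen so that user 4 comes out as the strong match), and the function name is ours.

```python
def common_merchant_users(user_merchants, user, excluded, min_common):
    """Users sharing at least `min_common` non-excluded merchants with `user`.

    user_merchants maps each user ID to the set of merchant IDs it is connected to.
    """
    kept = user_merchants[user] - excluded          # e.g. drop convenience stores
    matches = {}
    for other, merchants in user_merchants.items():
        if other == user:
            continue
        common = kept & merchants
        if len(common) >= min_common:               # "a predetermined number or more"
            matches[other] = common
    return matches


# Hypothetical edges loosely modeled on FIG. 1.
edges = {
    "user1": {"merchant1", "merchant2", "merchant3", "merchant4"},
    "user2": {"merchant1", "merchant3"},
    "user4": {"merchant1", "merchant2", "merchant4"},
    "user5": {"merchant2", "merchant3"},
}
print(sorted(common_merchant_users(edges, "user1", {"merchant3"}, 3)))  # ['user4']
```

User 2 and user 5 each share merchant 3 with user 1, but because merchant 3 is filtered out first, that shared convenience store does not count toward their association.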

In another example, the analysis is directly restricted to the relationships among users associated with merchant 2. A filtering condition can then be set, for example that the amount spent at merchant 2 within a certain time period exceeds a predetermined value (a stronger degree of association). By applying this condition to the consumption information (edge information) between merchant 2 and users 1, 4, 5, and 7, it can be determined which of users 1, 4, 5, and 7 are strongly associated with respect to merchant 2.
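A sketch of this filtering condition, assuming that "the amount spent within a certain time period" means the sum of a user's purchases at merchant 2 inside a half-open time window. The record layout, the window, and the threshold are made up for illustration.

```python
def heavy_spenders(records, merchant, start, end, min_total):
    """Users whose total spend at `merchant` in [start, end) exceeds `min_total`.

    records: iterable of (user_id, merchant_id, timestamp, amount) edge entries.
    """
    totals = {}
    for user, m, ts, amount in records:
        if m == merchant and start <= ts < end:
            totals[user] = totals.get(user, 0.0) + amount
    return sorted(u for u, total in totals.items() if total > min_total)


records = [
    ("user1", "merchant2", 10, 300.0),
    ("user4", "merchant2", 12, 50.0),
    ("user5", "merchant2", 15, 400.0),
    ("user7", "merchant2", 99, 900.0),   # outside the time window
    ("user5", "merchant1", 11, 800.0),   # a different merchant
]
print(heavy_spenders(records, "merchant2", 0, 30, 200.0))  # ['user1', 'user5']
```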

Those skilled in the art will appreciate that the associations between a user and merchants, and between that user and other users, can also be analyzed based on one or more items of the merchant identifier and the consumption information (for example, one or more of the time, time period, location, frequency, amount spent, and category of goods, or any combination of these).

Graph-based association analysis makes it possible to quickly analyze user groups as well as the preference trends and latent preferences of specific users. Further examples are described below to aid understanding of the present invention; these examples should not be regarded as limiting.

FIG. 2 is a flowchart of a graph-based method for analyzing users according to an embodiment of the present invention. The method maintains a graph whose vertices are objects and whose edges are association information between objects, the objects including users and merchants and the edges indicating associations between users and merchants. The method comprises a data feature parsing process 200 and an association analysis process 300.

The data feature parsing process 200 comprises:
Step 210: parsing the data records of users and merchants and the data records generated between users and merchants to obtain key information, the key information including a user identifier, a merchant identifier, and consumption information generated between a user and a merchant;
Step 220: using the obtained key information to generate the vertex information and edge information of the graph, the user identifiers and merchant identifiers serving as vertex information and the consumption information serving as edge information.
The association analysis process 300 comprises analyzing the associations between a first user and other users based at least on one or more merchants associated with the first user.
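Steps 210 and 220 could look like the following. This assumes the raw data records arrive as CSV rows with the hypothetical columns user_id, merchant_id, timestamp, amount and category; the patent does not specify a record format.

```python
import csv
import io


def parse_records(text):
    """Step 210 (sketch): extract user ID, merchant ID and consumption info per record."""
    key_info = []
    for row in csv.DictReader(io.StringIO(text)):
        consumption = {
            "timestamp": int(row["timestamp"]),
            "amount": float(row["amount"]),
            "category": row["category"],
        }
        key_info.append((row["user_id"], row["merchant_id"], consumption))
    return key_info


def build_graph(key_info):
    """Step 220 (sketch): IDs become vertex info, consumption info becomes edge info."""
    vertices = {"user": set(), "merchant": set()}
    edges = {}
    for user_id, merchant_id, consumption in key_info:
        vertices["user"].add(user_id)
        vertices["merchant"].add(merchant_id)
        edges.setdefault((user_id, merchant_id), []).append(consumption)
    return vertices, edges


raw = (
    "user_id,merchant_id,timestamp,amount,category\n"
    "u1,m1,100,12.5,food\n"
    "u1,m1,200,3.0,food\n"
    "u2,m1,150,8.0,books\n"
)
vertices, edges = build_graph(parse_records(raw))
print(len(edges[("u1", "m1")]))  # 2
```

Repeated purchases by the same user at the same merchant accumulate on one edge rather than creating parallel edges.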

In one embodiment, the association analysis process 300 comprises:

Step 310: filtering merchants according to predetermined conditions with respect to the first user.

Step 320: filtering other users according to predetermined conditions with respect to the first user.

Thus, by setting filtering conditions on merchant identifiers and consumption information, the first user can be analyzed quickly in the graph to find the merchants or other users that are strongly associated with the first user.

In one example, during association analysis, the one or more merchants associated with the first user are filtered by merchant identifier to obtain one or more filtered merchants. For example, convenience stores, specific department stores, utility bill payment agencies, and specific hotels are excluded from the one or more merchants by their merchant identifiers. In a particular analysis, these excluded merchants can be regarded as objects that are only weakly associated with the first user or that have low analytical value. Depending on the analysis requirements, however, these merchants may be taken into account in other examples.
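A sketch of filtering by merchant identifier. The text does not say how an identifier indicates the kind of merchant, so this assumes a hypothetical lookup table from merchant ID to category for whole categories (convenience stores, utility payment agencies), plus an explicit blocklist of IDs for the "specific" department stores or hotels.

```python
# Hypothetical category names; the patent does not define an encoding.
WEAK_CATEGORIES = {"convenience_store", "utility_payment"}


def filter_by_merchant_id(merchants, category_of, blocklist=frozenset()):
    """Keep merchants that are neither in a weak category nor explicitly blocklisted.

    category_of: merchant_id -> category; blocklist: IDs of specific excluded merchants.
    Merchants with no known category are kept.
    """
    return {
        m for m in merchants
        if m not in blocklist and category_of.get(m) not in WEAK_CATEGORIES
    }


category_of = {"m1": "restaurant", "m2": "convenience_store", "m3": "hotel", "m4": "cinema"}
print(sorted(filter_by_merchant_id({"m1", "m2", "m3", "m4"}, category_of, blocklist={"m3"})))
# ['m1', 'm4']
```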

In another example, during association analysis, the one or more merchants associated with the first user are filtered by consumption information to obtain one or more filtered merchants. For example, merchants whose single-transaction amount in the consumption information is below a predetermined value are excluded, and/or merchants whose most recent consumption event occurred before a specific time are excluded. Optionally, the consumption frequency and the type of product or service purchased may also be taken into account.
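The text leaves open how the single-transaction amount is judged when a user has several purchases at one merchant. The sketch below assumes that the largest single purchase is compared with the threshold, and it applies both filters together (the text allows either or both). The names and data are illustrative.

```python
def filter_by_consumption(edge_info, min_single_amount, active_since):
    """Keep merchants passing both consumption-based filters.

    edge_info: merchant_id -> list of (timestamp, amount) for the first user's purchases.
    A merchant is dropped if even its largest single purchase is below
    `min_single_amount`, or if its most recent purchase predates `active_since`.
    """
    kept = set()
    for merchant, events in edge_info.items():
        if not events:
            continue
        largest = max(amount for _, amount in events)
        last_seen = max(ts for ts, _ in events)
        if largest >= min_single_amount and last_seen >= active_since:
            kept.add(merchant)
    return kept


edge_info = {
    "m1": [(100, 5.0), (200, 80.0)],   # one large purchase, recent
    "m2": [(300, 2.0), (310, 4.0)],    # only small purchases
    "m3": [(10, 500.0)],               # large but stale
}
print(sorted(filter_by_consumption(edge_info, 50.0, 150)))  # ['m1']
```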

It will be understood that merchant identifiers and consumption information can be combined to screen the merchants associated with the first user. As described above, by setting filtering conditions on merchant identifiers and consumption information, the first user can be analyzed quickly in the graph to find the merchants that are strongly associated with the first user or that satisfy a specific association.

In one embodiment, during association analysis, the other users associated with the one or more filtered merchants are determined in the graph. Identifying the merchants first and only then associating the first user with other users greatly reduces the amount of computation and improves analysis efficiency.

In one example, during association analysis, users strongly associated with the first user are further selected from the other users according to merchant identifiers, the first user and another user being determined to be strongly associated under the following preset condition: the number of merchants associated with both the first user and the other user exceeds a predetermined value. For example, users who share more than five associated merchants with the first user are regarded as a group that meets a specific analysis goal.
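Combining the two preceding paragraphs, here is a sketch that starts from the filtered merchants, collects the users reachable through them, and keeps those whose shared-merchant count exceeds a threshold. Only users adjacent to the filtered merchants are ever visited, which is the source of the computational saving described above. The data and names are illustrative; a threshold of 5 would correspond to the "more than five merchants" example.

```python
from collections import Counter


def strong_associates(merchant_users, first_user, filtered_merchants, threshold):
    """Merchant-first traversal: count shared filtered merchants per other user.

    merchant_users: merchant_id -> set of user IDs connected to that merchant.
    Users not reachable through `filtered_merchants` are never touched.
    """
    shared = Counter()
    for merchant in filtered_merchants:
        for other in merchant_users.get(merchant, ()):
            if other != first_user:
                shared[other] += 1
    # Preset condition from the text: the shared-merchant count must exceed the threshold.
    return {user: n for user, n in shared.items() if n > threshold}


merchant_users = {
    "m1": {"A", "B", "C"},
    "m2": {"A", "B"},
    "m3": {"A", "B", "C"},
    "m9": {"D"},          # not reachable from A's filtered merchants
}
print(strong_associates(merchant_users, "A", ["m1", "m2", "m3"], 2))  # {'B': 3}
```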

In another example, during association analysis, the strength of the association between the first user and other users is further judged according to consumption information. For example, for the same merchant, if the consumption frequencies of the first user and another user within a specific time period (e.g., between two dates, or within a certain period of the day) fall in the same range (e.g., 5 to 10 purchases per month), the two users are regarded as strongly associated. As another example, for the same merchant, if the amounts spent by the first user and another user within a specific time period fall in the same range, the two users are regarded as strongly associated. As a further example, for the same merchant, if the first user and another user purchase the same type of product or service, the two users are regarded as strongly associated. One or more consumption factors can be combined to judge the association between users; for example, the location of consumption events can also be taken into account.
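One way to combine these factors, assuming each numeric factor is compared by checking whether both users' values fall into the same predefined band. The band edges, the rule that all selected factors must agree, and the field names are assumptions; the text only gives "5 to 10 purchases per month" as an example range.

```python
import math

# Hypothetical half-open bands [low, high).
FREQ_BANDS = [(1, 5), (5, 11), (11, math.inf)]          # purchases per month
AMOUNT_BANDS = [(0, 100), (100, 500), (500, math.inf)]  # spend per month


def band(value, bands):
    """Index of the band containing `value`, or None if no band matches."""
    for i, (low, high) in enumerate(bands):
        if low <= value < high:
            return i
    return None


def same_band(x, y, bands):
    bx = band(x, bands)
    return bx is not None and bx == band(y, bands)


def strongly_related(a, b, factors=("freq", "amount", "category")):
    """a, b: two users' stats at the same merchant, keys 'freq', 'amount', 'category'.

    Every factor named in `factors` must agree for the pair to count as strong.
    """
    checks = {
        "freq": lambda: same_band(a["freq"], b["freq"], FREQ_BANDS),
        "amount": lambda: same_band(a["amount"], b["amount"], AMOUNT_BANDS),
        "category": lambda: a["category"] == b["category"],
    }
    return all(checks[f]() for f in factors)


first = {"freq": 6, "amount": 250.0, "category": "coffee"}
other = {"freq": 9, "amount": 300.0, "category": "coffee"}
print(strongly_related(first, other))  # True
```

Passing a narrower `factors` tuple corresponds to judging the association on fewer consumption factors.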

An example according to one or more embodiments of the present invention is described below. In this example, weak associations are first identified in the edge information from users to merchants. A weak association is defined as follows: the merchant identifier indicates a convenience store, a specific department store, or a utility bill payment agency. Association analysis is performed for a specific user A to obtain all of A's non-weakly associated merchants. Through these non-weakly associated merchants, all corresponding non-weakly associated users B1 to Bn are obtained, and the merchants and consumption information that each of B1 to Bn shares with A are recorded. When the number of merchants associated with both user A and user B1 exceeds half of all of A's non-weakly associated merchants, and/or their consumption information is strongly associated (e.g., as described above), A and B1 can be regarded as belonging to the same group. In this way, users and merchants can be classified, the efficiency of analyzing user-oriented associations is improved, and the quality of data services is raised.
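This worked example can be sketched end to end as shown below. The sketch assumes a merchant-to-category table, reads "more than half" as a strict majority, and omits the optional consumption-information test. All data and names are illustrative.

```python
from collections import defaultdict

WEAK_CATEGORIES = {"convenience_store", "department_store", "utility_payment"}


def same_group_as(user_a, user_merchants, category_of):
    """Users B_i placed in A's group by the shared-merchant rule of the example.

    user_merchants: user_id -> set of merchant_ids; category_of: merchant_id -> category.
    """
    merchant_users = defaultdict(set)
    for user, merchants in user_merchants.items():
        for m in merchants:
            merchant_users[m].add(user)

    # A's non-weakly associated merchants.
    strong_a = {m for m in user_merchants[user_a] if category_of.get(m) not in WEAK_CATEGORIES}

    # For every B_i reached through those merchants, record the merchants it shares with A.
    shared = defaultdict(set)
    for m in strong_a:
        for b in merchant_users[m]:
            if b != user_a:
                shared[b].add(m)

    # Strictly more than half of A's non-weak merchants must be shared.
    return {b: ms for b, ms in shared.items() if 2 * len(ms) > len(strong_a)}


user_merchants = {
    "A": {"m1", "m2", "m3", "cvs"},
    "B1": {"m1", "m2", "cvs"},
    "B2": {"m3", "cvs"},
}
category_of = {"m1": "restaurant", "m2": "cinema", "m3": "gym", "cvs": "convenience_store"}
print(sorted(same_group_as("A", user_merchants, category_of)))  # ['B1']
```

B2 shares only one of A's three non-weak merchants (the shared convenience store is ignored), so it is not placed in A's group.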

In a preferred embodiment, the entire graph may be partitioned according to predetermined conditions by deleting the edges that do not satisfy the conditions, yielding one or more groups.
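Edge-deletion partitioning can be sketched as follows: drop every edge that fails a predicate, then take the connected components (found here by breadth-first search) as the groups. The predicate on edge information (a single amount per edge in this sketch) is a hypothetical stand-in for the "predetermined conditions". Reporting only the users of each component is our choice.

```python
from collections import defaultdict, deque


def partition(edges, keep):
    """Delete edges failing `keep`, then return the user set of each connected component.

    edges: (user_id, merchant_id) -> edge info; keep: predicate on one edge's info.
    Users whose every edge is deleted become isolated and are not reported.
    """
    adj = defaultdict(set)
    for (user, merchant), info in edges.items():
        if keep(info):
            # Tag vertices by kind so a user ID can never collide with a merchant ID.
            adj[("user", user)].add(("merchant", merchant))
            adj[("merchant", merchant)].add(("user", user))

    seen, groups = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, users = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node[0] == "user":
                users.add(node[1])
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(users)
    return groups


edges = {
    ("u1", "m1"): 100.0,
    ("u2", "m1"): 80.0,
    ("u3", "m2"): 5.0,    # fails the condition and is deleted
    ("u3", "m3"): 60.0,
    ("u4", "m3"): 70.0,
}
groups = partition(edges, keep=lambda amount: amount >= 50.0)
print(sorted(sorted(g) for g in groups))  # [['u1', 'u2'], ['u3', 'u4']]
```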

The blocks shown in FIG. 2 may be regarded as method steps, and/or as operations resulting from the execution of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated functions. Although the operations are depicted in the figure in a particular order, this should not be understood as requiring that they be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

FIG. 3 is a schematic diagram of a graph-based system for analyzing users according to an embodiment of the present invention. As shown, the system comprises a data feature parsing module, an association analysis module, and an optional index module. The data feature parsing module maintains a graph whose vertices are objects and whose edges are association information between objects, the objects including users and merchants and the edges indicating associations between users and merchants.

According to one embodiment, the data feature parsing module is configured to parse the data records of users and merchants and the data records generated between users and merchants to obtain key information, the key information including a user identifier, a merchant identifier, and consumption information generated between a user and a merchant, and to use the obtained key information to generate the vertex information and edge information of the graph, the user identifiers and merchant identifiers serving as vertex information and the consumption information serving as edge information. The association analysis module is configured to analyze the associations between a first user and other users based at least on one or more merchants associated with the first user.

The consumption information generated between a user and a merchant includes one or more of the following: the time, time period, location, and frequency of consumption events, the amount spent, and the category of goods purchased.

In other embodiments, the association analysis module is configured to filter the one or more merchants associated with the first user by merchant identifier to obtain one or more filtered merchants.

In other embodiments, the association analysis module is configured to filter the one or more merchants associated with the first user by consumption information to obtain one or more filtered merchants.

In other embodiments, the association analysis module is configured to determine, in the graph, the other users associated with the one or more filtered merchants.

In other embodiments, the association analysis module is configured to further select, according to merchant identifiers, users strongly associated with the first user from the other users, the first user and another user being determined to be strongly associated under the following preset condition: the number of merchants associated with both the first user and the other user exceeds a predetermined value.

In other embodiments, the association analysis module is configured to further judge the strength of the association between the first user and other users according to consumption information.

In other embodiments, the index module is configured to maintain an index whose key is one item of an object's key information (e.g., a user ID or a merchant ID) and whose auxiliary information is the object's position in the graph. For example, the index module is configured to use the key information of a first object to locate the first object's position in the graph via the index, and to find the other objects associated with the first object from that position. Here, the position information indicates where the vertex corresponding to the object lies, relative to other objects, in the storage structure of the graph (e.g., an adjacency matrix or an adjacency list). Using the index, the graph analysis module can quickly locate an object's position in the graph.
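A minimal in-memory sketch of the index module. It assumes an adjacency-list storage structure in which the "position information" is simply the object's row number. The class and method names are hypothetical, and a distributed deployment (as in claim 9) would need a different layout.

```python
class IndexedGraph:
    """Adjacency-list storage plus an index from object key to its row in the list."""

    def __init__(self):
        self.index = {}      # key (e.g. user ID or merchant ID) -> row number
        self.keys = []       # row number -> key
        self.adjacency = []  # row number -> set of neighbouring row numbers

    def _row(self, key):
        # Allocate a new row the first time a key is seen.
        if key not in self.index:
            self.index[key] = len(self.keys)
            self.keys.append(key)
            self.adjacency.append(set())
        return self.index[key]

    def add_edge(self, a, b):
        i, j = self._row(a), self._row(b)
        self.adjacency[i].add(j)
        self.adjacency[j].add(i)

    def associated(self, key):
        """Locate `key` via the index, then read its neighbours from that row."""
        row = self.index.get(key)
        if row is None:
            return set()
        return {self.keys[j] for j in self.adjacency[row]}


g = IndexedGraph()
g.add_edge("user:1", "merchant:1")
g.add_edge("user:4", "merchant:1")
print(sorted(g.associated("merchant:1")))  # ['user:1', 'user:4']
```

The lookup costs one dictionary access plus a scan of that row's neighbours, independent of the total size of the graph.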

With the data feature parsing module, the association analysis module, and the index module, update operations and analysis operations can be performed efficiently. During an update operation, when an object's key information changes, the object's vertex information and edge information in the graph are updated in real time.
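The sketch below illustrates real-time updating in this sense. Each new record changes only the affected edge. When an object's key information changes (here its identifier, e.g. a re-issued card number, which is a hypothetical example), one vertex and its incident edges are relabelled in place. The class is illustrative, not the patent's storage architecture.

```python
from collections import defaultdict


class LiveGraph:
    """Graph updated in place as records arrive; nothing is recomputed globally."""

    def __init__(self):
        # vertex key -> {neighbour key: list of edge-info records}
        self.adj = defaultdict(dict)

    def record(self, user, merchant, info):
        # A new consumption record only touches the one user-merchant edge.
        self.adj[user].setdefault(merchant, []).append(info)
        self.adj[merchant].setdefault(user, []).append(info)

    def rename(self, old, new):
        """Key information of an object changed: move its vertex and repoint its edges.

        Assumes `new` is not already a vertex in the graph.
        """
        neighbours = self.adj.pop(old, {})
        self.adj[new] = neighbours
        for n in neighbours:
            self.adj[n][new] = self.adj[n].pop(old)


g = LiveGraph()
g.record("user:1", "merchant:1", {"amount": 5.0})
g.record("user:1", "merchant:1", {"amount": 7.0})
g.rename("user:1", "user:9")
print(sorted(g.adj["merchant:1"]))  # ['user:9']
```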

The exemplary embodiments may be implemented in hardware, software, or a combination thereof. For example, some aspects of the invention may be implemented in hardware while other aspects are implemented in software. Although aspects of the exemplary embodiments may be shown and described as block diagrams or flowcharts, it is well understood that the apparatus or methods described herein may be implemented as functional modules in a system, as a non-limiting example. Furthermore, the apparatus described above should not be understood as requiring such separation in all embodiments; the described program components and systems can generally be integrated into a single software product or packaged into multiple software products.

Various modifications and variations of the foregoing exemplary embodiments of the present invention will become apparent to those skilled in the relevant art upon reading the foregoing description in conjunction with the accompanying drawings. Therefore, the embodiments of the present invention are not limited to the specific embodiments disclosed, and modifications and other embodiments are intended to be included within the scope of the appended claims.

Claims (18)

1. A graph-based method for analyzing users, maintaining a graph whose vertices are objects and whose edges are association information between objects, wherein the objects include users and merchants and the edges indicate associations between users and merchants, the method comprising: A. a data feature parsing process, comprising: parsing the data records of users and merchants and the data records generated between users and merchants to obtain key information, wherein the key information includes a user identifier, a merchant identifier, and consumption information generated between a user and a merchant; and using the obtained key information to generate vertex information and edge information of the graph, wherein the user identifiers and merchant identifiers serve as vertex information and the consumption information serves as edge information; and B. an association analysis process, comprising: analyzing the associations between a first user and other users based at least on one or more merchants associated with the first user, wherein, in the association analysis process, the one or more merchants associated with the first user are filtered by merchant identifier to obtain one or more filtered merchants.

2. The method of claim 1, wherein the consumption information generated between a user and a merchant includes one or more of the following: the time, time period, location, and frequency of consumption events, the amount spent, and the category of goods purchased.
3. The method of claim 1, wherein, in the association analysis process, the one or more merchants associated with the first user are filtered according to the consumption information to obtain one or more filtered merchants.
4. The method of claim 1 or 3, wherein, in the association analysis process, other users associated with the one or more filtered merchants are determined in the graph.
5. The method of claim 1 or 3, wherein, in the association analysis process, users having a strong association with the first user are further selected from the other users according to the merchant identifiers, and the first user and another user are determined to have a strong association under the following preset condition: the number of merchants jointly associated with the first user and the other user exceeds a predetermined value.
6. The method of claim 1 or 3, wherein, in the association analysis process, the strength of the association between the first user and the other users is further determined according to the consumption information.
7. The method of claim 1, further comprising: maintaining an index that uses one item of an object's key information as the key and the object's position information in the graph as auxiliary information.
8. The method of claim 7, wherein the position of a first object in the graph is located through the index using the key information of the first object, and other objects associated with the first object are found according to the position of the first object in the graph.
9. The method of claim 8, wherein the graph and the index are stored in a distributed architecture.
10. A graph-based system for analyzing users, which maintains a graph whose vertices are objects and whose edges are association information between objects, wherein the objects include users and merchants and the edges indicate association relationships between users and merchants, the system comprising: A. a data feature parsing module configured to: parse data records of users, of merchants, and of events occurring between users and merchants to obtain key information, wherein the key information includes user identifiers, merchant identifiers, and consumption information generated between users and merchants; and generate vertex information and edge information of the graph from the obtained key information, wherein the user identifiers and merchant identifiers serve as vertex information and the consumption information serves as edge information; and B. an association analysis module configured to: analyze the association between a first user and other users based at least on one or more merchants associated with the first user, wherein the association analysis module is configured to filter the one or more merchants associated with the first user according to the merchant identifiers to obtain one or more filtered merchants.
11. The system of claim 10, wherein the consumption information generated between a user and a merchant includes one or more of the following: the time, time period, location, and frequency of the consumption event; the consumption amount; and the category of goods purchased.
12. The system of claim 10, wherein the association analysis module is configured to filter the one or more merchants associated with the first user according to the consumption information to obtain one or more filtered merchants.
13. The system of claim 10 or 12, wherein the association analysis module is configured to determine, in the graph, other users associated with the one or more filtered merchants.
14. The system of claim 10 or 12, wherein the association analysis module is configured to further select, according to the merchant identifiers, users having a strong association with the first user from the other users, the first user and another user being determined to have a strong association under the following preset condition: the number of merchants jointly associated with the first user and the other user exceeds a predetermined value.
15. The system of claim 10 or 12, wherein the association analysis module is configured to further determine, according to the consumption information, the strength of the association between the first user and the other users.
16. The system of claim 10, further comprising: an indexing module configured to maintain an index that uses one item of an object's key information as the key and the object's position information in the graph as auxiliary information.
17. The system of claim 16, wherein the indexing module is configured to locate the position of a first object in the graph through the index using the key information of the first object, and to find other objects associated with the first object according to the position of the first object in the graph.
18. The system of claim 17, wherein the system stores the graph and the index in a distributed architecture.
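The claims define the data model and the association analysis only functionally. As one possible illustration, not the applicant's implementation, the following Python sketch builds an in-memory user-merchant graph from consumption records, keeps an index from an object's key (its identifier) to its position in the graph, and runs the association analysis: it filters the first user's merchants by merchant identifier and by consumption information, counts the filtered merchants each other user shares with the first user, and flags a strong association when that count exceeds a preset value. The `Consumption` fields, the exclusion set used as the merchant-identifier filter, the `keep_record` predicate, and the default threshold are hypothetical choices; the distributed storage of the graph and index is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Consumption:
    """Consumption information carried on a user-merchant edge (fields are hypothetical)."""
    amount: float
    hour: int       # hour of day of the purchase
    category: str   # category of goods purchased


@dataclass
class UserMerchantGraph:
    """Users and merchants are vertices; consumption records are edge information."""
    vertices: list = field(default_factory=list)  # position -> key (user or merchant ID)
    index: dict = field(default_factory=dict)     # key -> position in the graph
    adj: dict = field(default_factory=lambda: defaultdict(dict))  # pos -> {pos: [Consumption]}

    def _position(self, key):
        """Return the vertex position for `key`, creating the vertex if it is new."""
        if key not in self.index:
            self.index[key] = len(self.vertices)
            self.vertices.append(key)
        return self.index[key]

    def add_record(self, user_id, merchant_id, consumption):
        """Parsing step: IDs become vertex info, the consumption record becomes edge info."""
        u, m = self._position(user_id), self._position(merchant_id)
        self.adj[u].setdefault(m, []).append(consumption)
        self.adj[m].setdefault(u, []).append(consumption)

    def neighbours(self, key):
        """Locate `key` via the index, then return its associated vertices and edge info."""
        pos = self.index[key]  # raises KeyError for unknown keys
        return {self.vertices[p]: records for p, records in self.adj[pos].items()}


def associated_users(graph, first_user, excluded_merchants=frozenset(),
                     keep_record=lambda c: True, threshold=1):
    """Association analysis for `first_user`.

    excluded_merchants: hypothetical merchant-ID filter (e.g. ubiquitous merchants)
    keep_record: hypothetical consumption-info filter applied to edge records
    threshold: hypothetical preset value; a strong association needs MORE
               shared merchants than this
    """
    # Filter the first user's merchants by merchant ID and by consumption info.
    merchants = [
        m for m, records in graph.neighbours(first_user).items()
        if m not in excluded_merchants and any(keep_record(c) for c in records)
    ]
    # For every other user, count how many of the filtered merchants they share.
    shared = defaultdict(int)
    for m in merchants:
        for other in graph.neighbours(m):
            if other != first_user:
                shared[other] += 1
    strong = {user for user, n in shared.items() if n > threshold}
    return dict(shared), strong


g = UserMerchantGraph()
g.add_record("u1", "m_cafe", Consumption(30.0, 8, "food"))
g.add_record("u1", "m_gym", Consumption(200.0, 19, "sports"))
g.add_record("u1", "m_mall", Consumption(500.0, 15, "retail"))
g.add_record("u2", "m_cafe", Consumption(25.0, 9, "food"))
g.add_record("u2", "m_gym", Consumption(180.0, 20, "sports"))
g.add_record("u3", "m_mall", Consumption(90.0, 14, "retail"))

shared, strong = associated_users(g, "u1", excluded_merchants={"m_mall"})
print(shared)  # {'u2': 2}
print(strong)  # {'u2'}
```

Keeping the index (key to position) separate from the adjacency map (position to neighbours and edge records) mirrors the claimed split between locating an object and traversing its associations; a distributed deployment would partition both structures across nodes, which this sketch does not attempt.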
TW105143938A 2015-12-31 2016-12-29 Graph-based method and system for analyzing users TWI621989B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511028240.3A CN105678323A (en) 2015-12-31 2015-12-31 Image-based-on method and system for analysis of users

Publications (2)

Publication Number Publication Date
TW201725499A TW201725499A (en) 2017-07-16
TWI621989B true TWI621989B (en) 2018-04-21

Family

ID=56189899

Family Applications (1)

Application Number Title Priority Date Filing Date
TW105143938A TWI621989B (en) 2015-12-31 2016-12-29 Graph-based method and system for analyzing users

Country Status (3)

Country Link
CN (1) CN105678323A (en)
TW (1) TWI621989B (en)
WO (1) WO2017114276A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678323A (en) * 2015-12-31 2016-06-15 中国银联股份有限公司 Image-based-on method and system for analysis of users
CN107316205A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding
CN109947865B (en) * 2018-09-05 2023-06-30 中国银联股份有限公司 Merchant classifying method and merchant classifying system
CN111951035A (en) * 2019-05-17 2020-11-17 上海树融数据科技有限公司 Consumption analysis method, system, device and consumption analysis platform
CN111782847A (en) * 2019-07-31 2020-10-16 北京京东尚科信息技术有限公司 Image processing method, apparatus and computer-readable storage medium
CN111127089B (en) * 2019-12-18 2023-09-19 北京数衍科技有限公司 Bill data processing method and device and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
TW200849033A (en) * 2007-05-04 2008-12-16 Microsoft Corp Web page analysis using multiple graphs
TW201224808A (en) * 2010-08-30 2012-06-16 Ibm Method for classification of objects in a graph data stream
US20130282486A1 (en) * 2012-04-18 2013-10-24 Bruno Rahle Structured information about nodes on a social networking system
CN103838804A (en) * 2013-05-09 2014-06-04 电子科技大学 Social network user interest association rule mining method based on community division

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20080300940A1 (en) * 2007-05-31 2008-12-04 Gosakan Aravamudan Capturing Consumer Requirements
CN102254028A (en) * 2011-07-22 2011-11-23 青岛理工大学 Personalized commodity recommending method and system which integrate attributes and structural similarity
CN102929892A (en) * 2011-08-12 2013-02-13 莫润刚 Accurate information promoting system and method based on social network
CN104915879B (en) * 2014-03-10 2019-08-13 华为技术有限公司 The method and device that social relationships based on finance data are excavated
CN105678323A (en) * 2015-12-31 2016-06-15 中国银联股份有限公司 Image-based-on method and system for analysis of users

Also Published As

Publication number Publication date
WO2017114276A1 (en) 2017-07-06
CN105678323A (en) 2016-06-15
TW201725499A (en) 2017-07-16

Similar Documents

Publication Publication Date Title
TWI621989B (en) Graph-based method and system for analyzing users
WO2017211191A1 (en) Method and device for pushing information
CN109840533B (en) Application topological graph identification method and device
CN109561326B (en) Data query method and device
CN107220266B (en) Method and device for creating service database, storing service data and determining service data
CN107102999B (en) Correlation analysis method and device
US20170140309A1 (en) Database analysis device and database analysis method
TW200919220A (en) Method and system for constructing data tag based on a concept relation network
CN111459985A (en) Identification information processing method and device
CN112650482A (en) Recommendation method and related device for logic component
CA3152848A1 (en) User identifying method and device, and computer equipment
WO2017203672A1 (en) Item recommendation method, item recommendation program, and item recommendation apparatus
WO2017158802A1 (en) Data conversion system and data conversion method
CN105335386A (en) Method and apparatus for providing navigation tag
CN112287102B (en) Data mining method and device
CN114723554B (en) Abnormal account identification method and device
JP2016014944A (en) Correlation rule analysis device and correlation rule analysis method
TWI686704B (en) Graph-based data processing method and system
CN111382343B (en) Label system generation method and device
CN108614811B (en) Data analysis method and device
Preethi et al. Data Mining In Banking Sector
CN105335385A (en) Project-based collaborative filtering recommendation method and device
JP6535591B2 (en) Image recognition apparatus and operation method of image recognition apparatus
CN110929207A (en) Data processing method, device and computer readable storage medium
CN112765216A (en) Data batch processing method based on Internet of things