TW202123118A - Relation network construction method and device based on privacy protection - Google Patents

Relation network construction method and device based on privacy protection

Info

Publication number
TW202123118A
TW202123118A TW109115721A TW109115721A TW202123118A TW 202123118 A TW202123118 A TW 202123118A TW 109115721 A TW109115721 A TW 109115721A TW 109115721 A TW109115721 A TW 109115721A TW 202123118 A TW202123118 A TW 202123118A
Authority
TW
Taiwan
Prior art keywords
composite
node
nodes
candidate
composite node
Prior art date
Application number
TW109115721A
Other languages
Chinese (zh)
Other versions
TWI724896B (en)
Inventor
張屹綮
肖凱
王維強
Original Assignee
大陸商支付寶(杭州)信息技術有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商支付寶(杭州)信息技術有限公司
Application granted
Publication of TWI724896B
Publication of TW202123118A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking

Abstract

The embodiments of the invention provide a method and device for constructing a relationship network based on privacy protection. When a user relationship network is provided, user relationships are aggregated and noise is added in advance to form a relationship network that satisfies differential privacy. This reduces the amount of data to be processed and improves the usefulness of the user relationship network while effectively protecting the privacy of user relationships. Further, the privacy-protected relationship network can be used to discover user groups. The method is not limited to a specific data holder: any data processing party with computing power can identify a candidate composite node set in the relationship network through a group recognition model, and the user IDs contained in the user group are then queried and determined by the data holder of the initial relationship network and provided to the corresponding service party. This makes group recognition more convenient while keeping the data secure.

Description

Method and device for constructing a relationship network based on privacy protection

One or more embodiments of this specification relate to the field of computer technology, and in particular to a method and device for constructing a relationship network based on privacy protection.

With the trend toward big data, relationship networks are being applied more and more widely. A relationship network is typically used to describe the association relationships among multiple entities. For example, taking users as entities, each node in the relationship network corresponds to a user and the edges between nodes correspond to the connections between users, so that the network describes a web of interpersonal relationships. Applications of a relationship network may involve group activity data. For example, outputting clustered account data from an interpersonal relationship network is an effective means of combating batch attacks and organized underground-industry attacks. If such group activity data involves relationship data that is private to users, such as friend data, transfer data, or data on operations performed in the same device environment, that private relationship data can very easily be reverse-engineered or even leaked.

The privacy-protection-based relationship network construction method and device described in one or more embodiments of this specification can be used to solve one or more of the problems mentioned in the background section.

According to a first aspect, a method for constructing a relationship network based on privacy protection is provided. The privacy-protected relationship network is formed by multiple composite nodes, and the association relationships among the composite nodes are described by connecting edges. A single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and a connecting edge between original nodes describes the association relationship between the corresponding users. The method includes:
acquiring the candidate relationship network;
dividing the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity;
detecting, for the multiple composite nodes, whether a connecting edge exists between each pair;
adding edges and weights to the multiple composite nodes in a differentially private manner based on the detection result, thereby constructing the privacy-protected relationship network.

In one embodiment, the candidate relationship network is obtained as follows:
obtaining the user identifiers of multiple candidate users provided by a third business party;
based on the user identifiers, selecting from the initial relationship network the original nodes corresponding to the candidate users, together with their neighbor nodes within a predetermined order, as candidate nodes;
taking the relationship network formed by the candidate nodes as the candidate relationship network.
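The candidate-network extraction just described can be sketched as a breadth-first search. This is a minimal illustration, not the patent's implementation: the function name `extract_candidate_network`, the adjacency-dictionary representation, and the parameter `max_order` (standing in for the "predetermined order") are all assumptions made for the example.

```python
from collections import deque


def extract_candidate_network(adjacency, seed_ids, max_order):
    """Return the subgraph induced by the seed users and their neighbors
    within `max_order` hops (hypothetical helper, undirected graph assumed).

    adjacency: dict mapping a node id to the set of its neighbor ids.
    """
    seeds = [u for u in seed_ids if u in adjacency]
    order = {u: 0 for u in seeds}          # hop distance from the nearest seed
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if order[u] == max_order:          # do not expand beyond the allowed order
            continue
        for v in adjacency.get(u, ()):
            if v not in order:
                order[v] = order[u] + 1
                queue.append(v)
    # Keep only edges whose two endpoints are both candidate nodes.
    return {u: {v for v in adjacency.get(u, ()) if v in order} for u in order}
```

With `max_order=0` the result contains only the given users and the edges among them; larger values also pull in their first-order, second-order, and further neighbors.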
In one embodiment, dividing the original nodes in the candidate relationship network into multiple composite nodes according to the preset composite node capacity includes:
determining the number of original nodes in the candidate relationship network;
determining a first number according to the number of original nodes and the composite node capacity, where the first number is the maximum number of composite nodes that can be formed when every composite node corresponds to exactly as many original nodes as the composite node capacity;
randomly selecting the first number of original nodes from the original nodes in the candidate relationship network as the reference nodes of the respective composite nodes;
for each reference node, determining a second number of original nodes from the candidate relationship network, which together with that reference node form the corresponding composite node, where the second number is 1 unit smaller than the first number.

In one embodiment, the multiple composite nodes include a first composite node and a second composite node, the first composite node corresponds to a first original node and the second composite node corresponds to a second original node, and detecting whether a connecting edge exists between each pair of composite nodes includes: when a connecting edge exists between the first original node and the second original node, determining that a connecting edge exists between the first composite node and the second composite node.

In one embodiment, the detection result includes the set of connecting edges between the composite nodes and the number of connecting edges in that set, and adding edges and weights to the multiple composite nodes in a differentially private manner based on the detection result includes: adding noise under a first privacy cost to the number of connecting edges.
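The partitioning and edge-detection embodiments above can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `partition_into_composites` and `composite_edges` are hypothetical, node ids are assumed to be sortable, "nearest" nodes are found by breadth-first search, and each seed gathers up to k − 1 additional nodes, following the description's reading of the second number.

```python
import random
from collections import deque


def partition_into_composites(adjacency, k, rng=None):
    """Group original nodes into composite nodes of at most k members.

    Seeds are drawn at random; each seed then collects its nearest
    not-yet-assigned nodes (by hop count) until it holds k nodes.
    Nodes that no seed reaches stay unassigned in this sketch.
    """
    rng = rng or random.Random()
    nodes = sorted(adjacency)
    first_number = len(nodes) // k                 # number of composite nodes
    seeds = rng.sample(nodes, first_number)
    assigned = {s: i for i, s in enumerate(seeds)}
    composites = [[s] for s in seeds]
    for i, seed in enumerate(seeds):
        queue, seen = deque([seed]), {seed}
        while queue and len(composites[i]) < k:
            u = queue.popleft()
            for v in sorted(adjacency[u]):
                if v in seen:
                    continue
                seen.add(v)
                queue.append(v)                    # keep walking through taken nodes
                if v not in assigned and len(composites[i]) < k:
                    assigned[v] = i
                    composites[i].append(v)
    return composites, assigned


def composite_edges(adjacency, assigned):
    """Two composite nodes are connected if any pair of their members is."""
    edges = set()
    for u, neighbors in adjacency.items():
        for v in neighbors:
            cu, cv = assigned.get(u), assigned.get(v)
            if cu is not None and cv is not None and cu != cv:
                edges.add((min(cu, cv), max(cu, cv)))
    return edges
```

The returned edge set plays the role of the connecting edge set in the detection result, and its length is the edge count to which noise is later added.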
In one embodiment, the noise under the first privacy cost follows a Laplace distribution whose scale parameter is the reciprocal of the first privacy cost.

In one embodiment, the noise under the first privacy cost is obtained by generating a first random value through a predetermined random algorithm and taking the value of the dependent variable of the Laplace distribution when its independent variable equals the first random value.

In one embodiment, adding edges and weights to the multiple composite nodes in a differentially private manner based on the detection result further includes:
selecting a third number of connecting edges from the connecting edge set;
constructing a fourth number of noise connecting edges for the composite nodes, where the noise connecting edges are connecting edges outside the connecting edge set.

In one embodiment, adding noise under the first privacy cost to the number of connecting edges yields a fifth number, the maximum number of connecting edges among the composite nodes is a sixth number, and the ratio of the third number to the fourth number is consistent with the ratio of the fifth number to the sixth number.

In one embodiment, the connecting edge set includes a first connecting edge, the connecting edges in the set are each given the same initial weight, and selecting the third number of connecting edges from the connecting edge set includes:
for the first connecting edge, adding to the given initial weight noise whose cumulative probability under a second privacy cost follows a two-sided geometric distribution, obtaining a corresponding first noise weight, where the second privacy cost is the difference between a predetermined overall privacy cost and the first privacy cost;
when the first noise weight is greater than a first weight threshold, selecting the first connecting edge as a connecting edge in the privacy-protected relationship network, and using the first noise weight as the weight of the first connecting edge.

In one embodiment, the given initial weight is 1, and noise is added to the first connecting edge as follows:
generating, through a predetermined random algorithm, a random value within a predetermined interval for the two-sided geometric distribution;
determining the value of the independent variable of the two-sided geometric distribution that yields the random value;
taking the sum of the initial weight and that value as the weight of the first connecting edge after noise is added.

In one embodiment, the first weight threshold is the threshold of the independent variable at which a first proportion of the connecting edges is retained when the connecting edges in the set are filtered one-sidedly by a high-pass filter under the second privacy cost, where the first proportion is the ratio of the following first term to the second term:
the first term is the fifth number, obtained by adding noise under the first privacy cost to the number of connecting edges;
the second term is the maximum number of connecting edges among the composite nodes.
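The two noise steps above (Laplace noise on the edge count, two-sided geometric noise on each real edge's weight) can be sketched as below. Several points are assumptions rather than the patent's stated method: the two-sided geometric distribution is taken to be Pr[X = x] ∝ α^|x| with α = e^(−ε1), it is sampled as the difference of two geometric variables instead of by inverting the cumulative probability, and all function names are hypothetical.

```python
import math
import random


def laplace_noise(scale, rng):
    """Inverse-CDF sample from a zero-mean Laplace distribution."""
    r = rng.random()
    while r == 0.0:                      # avoid log(0) at the boundary
        r = rng.random()
    u = r - 0.5                          # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def two_sided_geometric_noise(epsilon, rng):
    """Integer noise with Pr[X = x] proportional to exp(-epsilon * |x|)
    (assumed form), drawn as the difference of two geometric variables."""
    def geometric():
        u = 1.0 - rng.random()           # uniform on (0, 1]
        return math.floor(-math.log(u) / epsilon)
    return geometric() - geometric()


def noisy_edge_count(true_count, eps2, rng):
    """Edge count plus Laplace noise with scale 1/eps2 (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / eps2, rng)


def select_real_edges(edges, eps1, theta, rng, initial_weight=1):
    """Keep a detected edge only if its noised weight exceeds theta."""
    kept = {}
    for edge in edges:
        weight = initial_weight + two_sided_geometric_noise(eps1, rng)
        if weight > theta:
            kept[edge] = weight
    return kept
```

Here `eps2` stands for the first privacy cost and `eps1` for the second, so that `eps1 + eps2` is the overall privacy cost; `theta` is supplied by the caller because the sketch does not reproduce the high-pass-filter derivation of the threshold.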
In one embodiment, the fourth number is determined according to the filtering ratio of the high-pass filter under the second privacy cost, where the second privacy cost is the difference between the predetermined overall privacy cost and the first privacy cost. The ratio of the fourth number to the difference between the following two items is consistent with that filtering ratio: the maximum number of connecting edges among the composite nodes, and the number of connecting edges obtained by adding noise under the first privacy cost to the number of connecting edges.

In one embodiment, the multiple composite nodes include a third composite node and a fourth composite node that are not connected by any connecting edge in the connecting edge set, and constructing the fourth number of noise connecting edges for the composite nodes includes:
adding a second connecting edge with an initial weight of 0 between the third composite node and the fourth composite node;
generating for the second connecting edge a noise weight whose cumulative probability under the second privacy cost follows an exponential distribution;
when the noise weight generated for the second connecting edge is greater than 0, determining the second connecting edge to be an added connecting edge, with the generated noise weight as its weight.

In one embodiment, the noise weight following the exponential distribution under the second privacy cost is generated for the second connecting edge as follows:
generating, through a predetermined random algorithm, a random value within a predetermined probability interval;
taking, as the noise weight generated for the second connecting edge, the value of the independent variable at which the exponential distribution under the second privacy cost takes the random value.
According to a second aspect, a method for determining a user group among multiple candidate users is provided. The method includes:
acquiring a privacy-protected relationship network generated for the multiple candidate users by the method of the first aspect;
processing the privacy-protected relationship network with a predetermined group recognition model to obtain multiple composite node sets;
determining at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the corresponding target user group from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.

In one embodiment, processing the privacy-protected relationship network with the predetermined group recognition model to obtain multiple composite node sets includes:
taking the privacy-protected relationship network as the initial current relationship network, in which each composite node is its own community;
performing the following modularity maximization step: moving each composite node into the community of an adjacent composite node, computing the modularity of the current relationship network with communities as nodes, and choosing the move that maximizes the modularity;
merging the composite nodes that fall in the same community after the move, and iterating the modularity maximization step until the modularity of the current relationship network no longer changes;
generating a corresponding composite node set for each community.

In one embodiment, the modularity of the current relationship network is obtained by summing the node degrees of the communities. The node degree of a first community in the current relationship network is the difference between the following first term and second term:
the first term is the ratio of the total number of connecting edges inside the first community to the total number of connecting edges in the current relationship network;
the second term is the square of the ratio of the total degree of the composite nodes clustered into the first community to twice the total number of connecting edges in the current relationship network.

In one embodiment, the modularity maximization step is determined by one of the following: a greedy algorithm, a simulated annealing algorithm, a random walk algorithm, a statistical-principle algorithm, a label propagation algorithm, the InfoMap algorithm, or the Louvain algorithm.

In one embodiment, determining at least one candidate composite node set from the multiple composite node sets includes: determining a composite node set whose number of composite nodes exceeds a predetermined number threshold to be a candidate composite node set. The data party of the initial relationship network then determines the corresponding target user group from the multiple candidate users, according to the candidate composite nodes in a single candidate composite node set, as follows:
mapping each candidate composite node to multiple initial users of the initial relationship network according to a preset mapping rule;
selecting, from those initial users, the users that belong to the multiple candidate users, and identifying the selected users as the target user group corresponding to the single candidate composite node set.
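The modularity formula described above, a sum over communities of (internal edges / total edges) minus (community degree / twice the total edges) squared, can be computed as in the following sketch. The function name `modularity` and the edge-list representation are assumptions, and edge weights are ignored for simplicity.

```python
def modularity(edges, community_of):
    """Modularity of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)                        # total number of connecting edges
    if m == 0:
        return 0.0
    inside = {}                           # edges with both ends in the community
    degree = {}                           # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())
```

For two triangles joined by a single edge, placing each triangle in its own community gives 2 × (3/7 − (7/14)²) = 5/14, while putting all six nodes in one community gives 0, which is why a modularity-maximizing step would keep the triangles apart.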
In one embodiment, the method is executed by the data party of the initial relationship network, the multiple composite node sets include a first composite node set, and determining at least one candidate composite node set from the multiple composite node sets includes:
mapping each composite node in the first composite node set to multiple initial users of the initial relationship network according to a preset mapping rule;
detecting whether a predetermined number or a predetermined proportion of those initial users have a registration time shorter than a predetermined time threshold;
if so, determining the first composite node set to be a candidate composite node set.

According to a third aspect, a device for constructing a relationship network based on privacy protection is provided. The privacy-protected relationship network is formed by multiple composite nodes, and the association relationships among the composite nodes are described by connecting edges. A single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and a connecting edge between original nodes describes the association relationship between the corresponding users. The device includes:
an acquiring unit, configured to acquire the candidate relationship network;
a node construction unit, configured to divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity;
a detection unit, configured to detect, for the multiple composite nodes, whether a connecting edge exists between each pair;
an edge construction unit, configured to add edges and weights to the multiple composite nodes in a differentially private manner based on the detection result, thereby constructing the privacy-protected relationship network.

According to a fourth aspect, a device for determining a user group among multiple candidate users is provided. The device includes:
an acquiring unit, configured to acquire a privacy-protected relationship network generated for the multiple candidate users by the device of the third aspect;
a processing unit, configured to process the privacy-protected relationship network with a predetermined group recognition model to obtain multiple composite node sets;
a determining unit, configured to determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network can determine the corresponding target user group from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
According to a fifth aspect, a computer-readable storage medium is provided on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect or the second aspect.

According to a sixth aspect, a computing device is provided that includes a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements the method of the first aspect or the second aspect.

The embodiments of this specification provide a method and device for constructing a relationship network based on privacy protection. When a user relationship network is provided, users are aggregated in advance and noise is added to form a relationship network that satisfies differential privacy, which reduces the amount of data to be processed and improves the usefulness of the user relationship network while effectively protecting the privacy of user relationships. Furthermore, when the privacy-protected relationship network is used to discover user groups, it is not limited to a specific data holder. Any data processing party with computing power can identify candidate composite nodes in the relationship network through a group recognition model, and the data holder of the initial relationship network then looks up the user IDs contained in the user group and provides them to the corresponding business party. In this way, group identification becomes more convenient while data security is preserved.

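The solutions provided in this specification are described below with reference to the drawings.

First, a specific implementation scenario is described with reference to FIG. 1 and FIG. 2.

FIG. 1 is a schematic diagram of the implementation architecture of this scenario. As shown in FIG. 1, the architecture includes a business platform, business parties, and users. The business platform is the medium through which users communicate with one another and through which business parties and users interact. Platforms such as the Alipay platform or the WeChat platform, for example, may combine social and commercial services. A user can register on the business platform to become a registered user, and each business party can provide services to users as a sub-application or by registering on the business platform as a registered business party.

The business platform can record users' behavior information on the platform (such as payment, transfer, and consumption behavior data), and this information can be used to build a relationship network. In a relationship network, each node can represent an entity (such as a user, a product, or a merchant), association relationships between entities are represented by connecting edges, and nodes whose entities are directly associated are connected to each other by a connecting edge. As shown in FIG. 1, each circle represents an entity and each line segment represents a connecting edge. Nodes that are directly associated are first-order neighbors of each other. If two nodes are connected by a path consisting of a connecting edge, a node, and another connecting edge, they are second-order neighbors of each other, and so on. In general, the order of a neighbor node equals the minimum number of connecting edges separating the two nodes. In the implementation architecture of this specification, the entities in the relationship network can be users.

It should be understood that the business parties and users in FIG. 1 are only examples. In practice there may be any number of each, and the server of the business platform may also take the form of a server cluster; this specification places no limitation on these.

FIG. 2 shows a specific implementation scenario under the architecture of FIG. 1. In this scenario, a computing platform stores in advance, or obtains remotely, an original relationship network generated from the user behavior data recorded by the business platform of FIG. 1. In this original relationship network, users are represented by the user IDs they registered on the business platform. Business party a suspects that it has suffered a batch attack or an organized gang attack, and it can provide the computing platform with the user IDs in its own user data. According to the user IDs provided by business party a, the computing platform extracts from the original relationship network the relationship network related to these users as the candidate relationship network. It then divides the nodes of the candidate relationship network into composite nodes, each of which includes multiple nodes of the original relationship network. As shown in FIG. 2, each composite node is marked by a circular or elliptical dashed frame, and the connections between composite nodes are drawn as dashed lines. A composite node can be regarded as a virtual user that corresponds to multiple users in the initial relationship network. The relationship network of composite nodes can be built in a differentially private manner, introducing noise into the network structure so that processing the noised relationship network gives results consistent with processing the original one. In this way, while effectively protecting the private relationship data between users, the relationship network is not only substantially reduced in size but can also still provide accurate information about how users cluster. Such a network can be called a privacy-protected relationship network.

When the privacy-protected relationship network is provided to any third-party platform, users' private relationship data is not leaked. The computing platform can therefore provide the privacy-protected relationship network to a third-party platform, which uses a pre-trained group recognition model to identify gangs in the relationship network and feeds the identification results back to business party a. This can help business party a prevent and combat gang activity such as attacks and underground-industry behavior, and eliminate risks.

Note that the computing platform in FIG. 2 may be located on the business platform of FIG. 1 or on another trusted platform that bears confidentiality obligations. The third-party platform may be any platform with a certain computing capability; it may belong to the computing platform in FIG. 2 or be an independent platform of another party, and this specification places no limitation on this.

FIG. 1 and FIG. 2 show only one implementation architecture of the embodiments of this specification. In practice, the process by which the computing platform in FIG. 2 builds a privacy-protected relationship network on the basis of the initial relationship network can be applied to any scenario involving user relationships, such as mining malicious gangs or identifying potential customers; these are not enumerated here.

The specific process of constructing the privacy-protected relationship network is described in detail first.

FIG. 3 is a flowchart of a method for constructing a privacy-protected relationship network according to one embodiment. The method can be executed by any system, device, apparatus, platform, or server with computing and processing capabilities, for example the business platform shown in FIG. 1. Starting from the candidate relationship network, the privacy-protected relationship network combines the original nodes of the candidate relationship network and adds noise under a predetermined privacy cost, hiding the real connections between nodes in a differentially private manner.

As shown in FIG. 3, the method includes the following steps: step 301, acquire a candidate relationship network; step 302, divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes in a single composite node does not exceed the composite node capacity; step 303, for the composite nodes, detect whether a connecting edge exists between each pair; step 304, based on the detection result, add connecting edges and weights to the composite nodes in a differentially private manner, thereby constructing the privacy-protected relationship network.

First, in step 301, the candidate relationship network is acquired. The candidate relationship network is the base network from which the privacy-protected relationship network is built.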
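The initial relationship network is usually built for an application scenario and contains the association relationships between entities; it holds a large amount of entity relationship data, such as user relationship data. In the scenario of FIG. 1 and FIG. 2, for example, the initial relationship network can be a network describing user relationships. In the embodiments of this specification, the nodes in the initial relationship network can be called original nodes. The initial relationship network usually contains the network formed by the association relationships among all entities in the relevant scenario. The candidate relationship network may be the initial relationship network itself or a part of it.

According to one implementation, the relationship network corresponding to the candidate nodes can be extracted from the initial relationship network, based on a pre-given range of nodes, as the candidate relationship network.

In one embodiment, the candidate nodes are the given nodes themselves. Taking the scenario of FIG. 2 as an example, the nodes corresponding to the users in the user list provided by business party a are the given nodes. Suppose these are 26 users: user a, user b, through user z. The nodes corresponding to these 26 users are the candidate nodes. The nodes corresponding to user a through user z, and the connections among them, can then be extracted from the initial relationship network as the candidate relationship network. For example, if the node of user a is connected to the nodes of user b and user d, and also to the node of user 11, then because the candidate relationship network does not include the node of user 11, it also does not include the connecting edge between the node of user 11 and the node of user a. It does include the nodes of user a, user b, and user d, and the connecting edges between the node of user a and the nodes of user b and user d.

In another embodiment, the candidate nodes can be nodes associated with the given nodes: in addition to the given nodes, they include the neighbors of the given nodes within a predetermined order. Taking FIG. 2 as an example, the given nodes can be the nodes corresponding to the users in the list provided by business party a, and the candidate nodes can be the given nodes together with their neighbors within a predetermined order (for example, second order), such as first-order and second-order neighbors. The candidate relationship network is then the relationship network formed by the given nodes and their neighbors within the predetermined order; details are not repeated here.

Because the candidate relationship network may contain any number of nodes, in some optional embodiments the relationship network corresponding to the candidate nodes can be further filtered before it is used as the candidate relationship network, so that nodes are balanced across composite nodes. The detailed process is described in step 302.

Because the candidate relationship network is the initial relationship network or a part extracted from it, each node still exists as an independent node; that is, the nodes themselves are unchanged and can still be called original nodes. Only some attributes of some original nodes change in the candidate relationship network, for example a reduced number of connecting edges (or neighbors).

In step 302, the nodes of the candidate relationship network are divided into multiple composite nodes according to a preset composite node capacity, where the number of original nodes in each composite node does not exceed that capacity. The composite node capacity can be a value preset from experience or from the size (number of nodes) of the candidate relationship network, such as 5, 8, or 10. Typically, the number of original nodes corresponding to a composite node equals the composite node capacity.

In one embodiment, the number of composite nodes can be determined from the composite node capacity (denoted k below). For example, the number of composite nodes can be the integer part of the ratio of the number of nodes in the candidate relationship network to k. In an optional implementation, the number of composite nodes can also be that integer part minus 1. This leaves some room for error in the subsequent differential privacy processing, so that relationship privacy is maintained while the accuracy of user relationships is preserved.

In an optional implementation, after the number of composite nodes is determined, the candidate relationship network can be randomly filtered so that its number of nodes equals the product of the number of composite nodes and k, or the product of the number of composite nodes plus 1 and k, depending on how the number of composite nodes was determined. This amounts to filtering out the nodes left over when the node count of the original candidate relationship network is divided by k, and corresponds to the node filtering mentioned in step 301. In other words, the number of nodes in the filtered candidate relationship network is the number of nodes in the original candidate relationship network minus the remainder of that number divided by k. That is, the number of composite nodes is determined from the number of original nodes and the composite node capacity, and the original nodes are then filtered according to the number of composite nodes. In this way, the original nodes can be distributed evenly, with each composite node corresponding to exactly k original nodes, and the number of composite nodes is determined accordingly.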
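The counting and filtering rule in the last two paragraphs can be sketched as follows. The helper name `plan_composites` and the flag `reserve_one` (which selects the "integer part minus 1" variant) are hypothetical, introduced only for this illustration.

```python
import random


def plan_composites(node_ids, k, reserve_one=False, rng=None):
    """Decide how many composite nodes to build and which nodes to keep.

    Returns (number of composite nodes, randomly retained node ids), where
    the retained count is len(node_ids) minus its remainder modulo k.
    """
    rng = rng or random.Random()
    node_ids = list(node_ids)
    whole_groups = len(node_ids) // k          # integer part of n / k
    count = max(whole_groups - 1, 0) if reserve_one else whole_groups
    keep = whole_groups * k                    # drop the remainder nodes
    return count, rng.sample(node_ids, keep)
```

For 23 candidate nodes and k = 5 this keeps 20 nodes and plans 4 composite nodes, or 3 when `reserve_one` is set, matching the two products described in the text.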
判定了複合節點的數量之後,可以針對候選關係網路中的各個原始節點劃分複合節點。在各個複合節點對應的原始節點數量與複合節點容量相等的情況下,可以劃分的符合節點數量可以記為第一數量。在一個實施例中,可以從候選關係網路中隨機選擇出第一數量的原始節點,作為各個複合節點的基準節點(類似“種子”的作用)。然後,按照複合節點容量k,將距離基準節點由近到遠的k-1個(第二數量)節點加入到相應的複合節點。這裡,距離可以理解為連接路徑上的連接邊數,例如基準節點和其一階鄰居節點之間的距離為1。可選地,遍歷各個基準節點,檢測距離由近到遠的原始節點時,可以排除已經加入到其他複合節點的原始節點。 這樣,由原始節點構成的候選關係網路,就變成了多個複合節點構成的集合。為了使得多個複合節點形成關係網路,進一步地,可以透過步驟303,針對多個複合節點,檢測兩兩之間是否存在連接邊。 首先,可以檢測兩兩複合節點的原始節點之間是否存在連接邊。如果存在連接邊,則判定兩個複合節點之間存在連接邊。為了更清楚地進行描述,假設第一複合節點包括原始節點A、B、C、D、E,第二複合節點包括原始節點F、G、H、I、J,如果原始節點A、B、C、D、E中的任一節點(如節點C,也可以稱為第一原始節點)和原始節點F、G、H、I、J任一節點(如節點H,可以稱為第二原始節點)之間有連接邊,則可以判定第一複合節點和第二複合節點之間有連接邊。如果第一複合節點中沒有一個原始節點和第二複合節點中的任意原始節點之間有連接邊,則第一複合節點和第二複合節點之間沒有連接邊。 根據一個實施例,根據步驟303的檢測結果,可以判定一個連接邊集合,用於儲存檢測到的連接邊。可選地,檢測結果中還可以包括連接邊集合中的連接邊數量。 步驟304,基於檢測結果,利用差分隱私方式對多個複合節點添加連接邊和權重,從而構建基於隱私保護的關係網路。可以理解,利用關係網路進行業務處理時,往往還需要考慮節點之間的關聯程度,該關聯程度可以用連接邊的權重來描述。 差分隱私(differential privacy)是密碼學中的一種手段,旨在提供一種當從統計資料庫查詢時,最大化資料查詢的準確性,同時最大限度減少識別其記錄的機會。設有隨機演算法M,PM為M所有可能的輸出構成的集合。對於任意兩個鄰近資料集D和D’以及PM的任何子集SM,若隨機演算法M滿足:Pr[M(D)∈SM]<=eε ×Pr[M(D’)∈SM],則稱演算法M提供ε-差分隱私保護,其中參數ε稱為隱私保護預算,用於平衡隱私保護程度和準確度。ε通常可以預先設定。ε越接近0,eε 越接近1,隨機演算法對兩個鄰近資料集D和D’的處理結果越接近,隱私保護程度越強。 差分隱私方法可以以添加受控雜訊實現降低查詢結果的靈敏度。差分隱私方法通常用於查詢領域,在本說明書的實施架構下,設想利用差分隱私的方式產生基於隱私保護的關係網路。 本領域技術人員可以理解,差分隱私通常具有可組合性。兩個隱私因數分別為ε1 和ε2 的差分隱私組合結果,其隱私因數為ε12 。用ε表示整體的差分隱私代價,則ε=ε12 。ε越大,隱私保護強度越低,因此,可以預先設定ε的最大值,作為最大隱私代價,例如ε設為1。 容易理解的是,差分隱私方法的目的是在隱私和準確度之間進行平衡,即,在保護隱私的基礎上,兼顧準確度。為連接邊添加雜訊的目的,是為了使得隨機演算法處理添加雜訊後的關係網路與處理原雜訊網路得到相同的結果,從而達到保護隱私的目的。為了產生基於隱私保護的關係網路,可以從步驟303中檢測到的連接邊中選擇一部分連接邊,並在不存在連接邊的複合節點之間添加一定數量的連接邊。 在本說明書的一個可能設計中下,可以考慮連接邊的滿足第一隱私因數ε2 差分隱私和連接邊權重滿足第二隱私因數ε1 的差分隱私。在差分隱私方式中,隱私因數越小,個體對整體結果的影響越小,隱私保護越好,但準確度會越低,因此,隱私因數ε2 可以根據經驗預先設定。可選地,第一隱私因數ε2 可以與複合節點總數量正相關,例如,複合節點的數量n1 為1000,可以將ε2 設為0.01。當整體的隱私因數ε和第一隱私因數ε2 被設定時,第二隱私因數ε1 可以由ε-ε2 判定。 基於以上理論,首先對連接邊進行差分隱私處理。複合節點之間的連接邊集合可以記為E1 ,連接邊數量可以記為|E1 |。為了確保基於隱私保護的關係網路的準確性,可以對|E1 |添加雜訊,從而增加連接邊集合中的連接邊的選擇比例(原理下文詳細描述)。 在可選的實現方式中,可以透過拉普拉斯機制(Laplace)進行連接邊數量的差分隱私。也就是說,為連接邊集合中的連接邊數量增加拉普拉斯雜訊。符合拉普拉斯分佈的雜訊,其可以用概率密度函數:noise(y)∝e-|y|/λ 表示,其均值為0,標準差是

$\sqrt{2}\lambda$
。拉普拉斯機制是適用於連續資料的噪音機制。對於給定資料集D,差分隱私保護概念中的隨機演算法M(D)=f(D)+Y,演算法M提供ε-差分隱私保護的情況下,Y服從參數為敏感度/ε的Laplace分佈,即Lap(敏感度/ε)。其中,靈敏度用於表示至少改變資料集中的多少個數,會對輸出結果產生影響。例如在由使用者的關係資料構成的關係網路中,靈敏度可以為1,滿足的ε2 -差分隱私的Laplace分佈可以記為Lap(1/ε2 )。假設拉普拉斯分佈雜訊的表達為:
Figure 02_image003
將連接邊的拉普拉斯雜訊的第一隱私因數ε2 、敏感度1代入,則Y為p取1/ε2 時的拉普拉斯分佈。根據隨機演算法M(D)=f(D)+Y可知,隨機演算法針對的資料集為複合節點之間真實存在的連接邊的集合E1 時,f(D)表示邊的數量,f(D)=|E1 |,可以使得添加拉普拉斯雜訊後的連接邊數量為:m1 =|E1 |+P(1/ε2 )。其中,使用預先選定的隨機演算法為
Figure 02_image005
產生一個隨機值(可以稱為第一隨機值),在
Figure 02_image007
取該隨機值時,拉普拉斯函數
Figure 02_image009
的值就是P(1/ε2 )。P(1/ε2 )可以看作增加的雜訊邊數量。在對連接邊添加雜訊後,還可以進一步根據添加雜訊後的連接邊數量選擇和添加複合節點之間的連接邊。在一個可能的實施例中,假設從步驟303中檢測到的連接邊中選擇第三數量的連接邊,為各個複合節點構造的雜訊連接邊(檢測結果中不存在的連接邊)數量為第四數量,對連接邊數量添加在第一隱私代價下的雜訊後得到連接邊的數量為第五數量,各個複合節點之間的最大連接邊數量為第六數量,則第三數量和第四數量的比值,與第五數量和以下數量的比值一致:第六數量與第五數量的差。由於第三數量對應的第五數量在本來檢測到的連接邊數量上添加了雜訊數量,因此可以增加從檢測到的連接邊中選擇的連接邊的比例。 假設複合節點的數量為n1 ,則考慮指向複合節點自身的連接,最大連接邊數量為m0 =n1 (n1 -1)/2。也就是說,上文可選實施例中的第六數量m0 可以基於複合節點的數量n1 判定。第五數量為前述的m1 =|E1 |+P(1/ε2 )。第三數量與第四數量的比值為:
$$\frac{m_1}{m_0 - m_1}$$
下面詳細介紹選擇第三數量和添加第四數量的連接邊的過程。 一方面,從E1 中選擇第三數量的連接邊,通常,可以將權重較大的連接邊保留,權重較小的連接邊刪除。 根據一個實施方式,可以對於步驟303中檢測到的任意一個連接邊(如集合E1 中的連接邊),記作第一連接邊,對於第一連接邊,在給定的初始權重上,添加符合基於第二隱私代價的雙邊幾何分佈的雜訊,得到相應的第一雜訊權重,在第一雜訊權重大於第一權重臨限值的情況下,選擇第一連接邊作為基於隱私保護的關係網路中的連接邊,並將第一雜訊權重作為第一連接邊的權重。其中第二隱私代價ε1 是預定的整體隱私代價ε與第一隱私代價ε2 的差。 作為示例,在第二隱私代價
Figure 02_image013
下,令
Figure 02_image015
則雜訊
Figure 02_image017
的累積概率值滿足雙邊幾何分佈:
Figure 02_image019
其中,取到所有
Figure 02_image017
的總概率為1,也就是說,
Figure 02_image021
在0-1之間取值,可以由隨機抽樣判定。當判定一個累計概率值
Figure 02_image023
時,可以唯一對應到一個
Figure 02_image017
。透過隨機產生的概率值,可以判定相應的雜訊
Figure 02_image017
。 對於檢測到的連接邊集合E1 中的連接邊e1 ,令其權重的初始值W0 為1或0,其中,1表示初始狀態真實存在一條連接邊,否則為0,則e1 的初始權重為1。添加雜訊後,其添加雜訊後的權重表示為1+
Figure 02_image025
。 如果連接邊e1 滿足ε1 -差分隱私,則其添加雜訊後的權重應足夠大,以與原始關係網路中的節點關係區分開。為了使得權重足夠大,可以將添加雜訊後的權重1+
Figure 02_image027
與第一權重臨限值θ進行比較。也就是說,為W0 添加雜訊
Figure 02_image025
,得到權重We1 ,則有:We1 ≥θ滿足時,相應連接邊e1 滿足ε1 -差分隱私。此時,可以將e1 判定為差分隱私下的關係網路中,複合節點之間的連接邊。其中,連接邊e1 的權重為We1 。可以理解,該權重是添加雜訊後的權重,因此,可以保證使用者關係隱私。 其中,第一權重臨限值θ可以根據臨限值設定,也可以透過諸如高通濾波的方式判定。以高通濾波的方式為例,根據高通濾波原理,假設第一權重臨限值為θ,用
Figure 02_image029
表示E1 中的第i個連接邊的權重,令
Figure 02_image015
則:
Figure 02_image031
在本說明書實施例中,適應單邊濾波情形(排除負值雜訊),即:
Figure 02_image033
從而:
Figure 02_image035
可選地,
Figure 02_image037
採用計算結果的上取整形式:
Figure 02_image039
其中,當計算結果為小數時,
Figure 02_image037
的值為計算結果的整數部分加1。這是因為,
Figure 02_image037
作為添加雜訊的下限權重臨限值,
Figure 02_image037
的值較大時,可以保證雜訊足夠大,有利於維護使用者關係隱私。 根據第一權重臨限值
Figure 02_image037
,就可以
Figure 02_image041
根據添加雜訊後的連接邊的權重與
Figure 02_image037
的比較,從步驟303中檢測到的連接邊中選擇第三數量的連接邊。 另一方面,需要在步驟303檢測到的連接邊(如集合E1 中的連接邊)之外,增加第四數量的連接邊,作為基於隱私保護的關係網路中複合節點間的連接邊。這些連接邊是在添加連接邊過程中暫時假設的連接邊,也可以將其看作“權重為0的連接邊”,如果滿足條件,則被添加為基於隱私保護的關係網路中的連接邊,否則,仍然視為不存在連接邊。 根據一個可能的實施例,可以從上述各個“權重為0的連接邊”隨機選擇出第四數量(如記為s個)連接邊作為基於隱私保護的關係網路中的連接邊,並為其隨機產生預定取值範圍(如0-1之間)的權重。其中,隨機產生的權重可以大於預定臨限值,如大於0.3等等。然後,按照產生的權重從大到小的順序選擇第四數量的連接邊,各個連接邊的權重為所產生的權重。 在可選的實現方式中,可以按照二項分佈雜訊為各個“權重為0的連接邊”產生權重,並按照高通濾波器的原理選擇s個連接邊。 根據前述類似的高通濾波原理,在單邊濾波的情況下:
Figure 02_image043
於是:
Figure 02_image045
也就是說,第四數量s可以透過第五數量
Figure 02_image047
、第六數量
Figure 02_image049
及前述的第一權重臨限值
Figure 02_image051
、第二隱私代價ε1 判定。其中,各個初始權重為0的連接邊產生的雜訊權重滿足指數分佈:
Figure 02_image053
這是因為,用
Figure 02_image029
表示第i個連接邊的權重的情況下,透過高通濾波器需滿足以下條件:
Figure 02_image055
進一步地,對於所有概率大於
Figure 02_image051
的連接邊,累計概率分佈為:
Figure 02_image057
因此,如果產生一個0-1之間的隨機值作為累計概率
Figure 02_image059
,那麼可以唯一對應到一個引數x的值,該引數x的值也就是隨機賦予當前連接邊的雜訊權重
Figure 02_image061
。 可以理解,由於x的值可能為正也可能為負,而在本說明書實施例中,只有權重為正的連接邊才有意義,因此,如果所產生的權重
Figure 02_image061
≥0,那麼可以將相應的連接邊作為一條雜訊邊,相應的權重對應雜訊邊的雜訊權重。如此,直至判定出s條雜訊邊。 以上過程中,邊數量雜訊滿足拉普拉斯分佈的情況下,任意隨機演算法對真實存在的連接邊數量為|E1 |的關係網路的處理結果,小於等於
Figure 02_image063
與該任意隨機演算法對連接邊數量為:m1 =|E1 |+P(1/ε2 )的關係網路的處理結果,所以滿足ε2 -差分隱私。對於連接邊的權重,添加雙邊幾何分佈雜訊或指數分佈雜訊,使得任意隨機演算法對包括連接邊集合E1 的關係網路的處理結果,小於等於
Figure 02_image065
與該任意隨機演算法對透過添加邊數量雜訊以及權重雜訊的關係網路的處理結果,所以滿足ε1 -差分隱私。 如此,對已有連接邊的數量進行基於第一隱私因數ε2 的差分隱私處理,同時,在選擇連接邊時,對連接邊權重進行基於第二隱私因數ε1 的差分隱私處理,從而可以產生滿足ε-差分隱私的關係網路,其中ε=ε21 。 對於滿足ε-差分隱私的關係網路,不僅簡化了關係網路結構,而且加入了雜訊,掩蓋了原有的使用者關係,因此,可以在保護使用者隱私的情況下,挖掘使用者之間的關係。例如,圖1示出的實施場景中,根據商家提供的使用者ID,發掘使用者之間的團夥關係。基於隱私保護的關係網路,即使被提供給第三方平台,也不會洩露使用者的關係隱私。 圖4示出利用基於隱私保護的關係網路在多個候選使用者中判定使用者團體的方法。該方法可以由與圖3所示的方法一致的執行主體執行,也可以由其他執行主體(例如圖1中提供使用者ID的商家)執行,在此不作限定。 圖4示出的在多個候選使用者中判定使用者團體的方法包括以下步驟:步驟401,獲取為多個候選使用者產生的基於隱私保護的關係網路;步驟402,利用預定的團體識別模型處理基於隱私保護的關係網路,得到多個複合節點集合;步驟403,從多個複合節點集合中判定至少一個候選複合節點集合,以供初始關係網路的資料方按照單個候選複合節點集合中的各個候選複合節點從多個候選使用者中判定出目標使用者團體。 首先,在步驟401中,獲取為多個候選使用者產生的基於隱私保護的關係網路。可以理解,這裡的候選使用者可以由相應業務方提供。相應業務方例如是消費平台的業務提供方(如商家)。相應業務方提供的多個使用者ID可以是其在某個業務平台的相對業務方(如消費者)在該業務平台的註冊ID。每個使用者ID對應一個候選使用者。該業務平台作為初始關係網路的資料方,可以預先產生初始的使用者關係網路。 初始關係網路的資料方可以根據這些候選使用者從初始的關係網路中判定候選關係網路,並將候選關係網路中的原始節點按照預設的複合節點容量,劃分出多個複合節點,針對多個複合節點,檢測兩兩之間是否存在連接邊,基於檢測結果,利用差分隱私方式對上述多個複合節點添加連接邊和權重,從而構建基於隱私保護的關係網路。可選地,候選關係網路中可以包括相應業務方提供的使用者及其在初始關係網路中的預定階數內的鄰居節點。該過程已在圖3示出的實施例中描述,在此不再贅述。 當圖4示出的流程的執行主體與初始關係網路的資料方一致時,基於隱私保護的關係網路可以從本地獲取。 然後,在步驟402中,利用預定的團體識別模型處理基於隱私保護的關係網路,得到多個複合節點集合。其中,預定的團體識別模型例如是Louvian演算法、最大連通圖等等。 以Louvian演算法為例,可以將基於隱私保護的關係網路中的每個複合節點作為一個社區,然後將每個複合節點移動到與之相鄰的複合節點的社區中,計算整個關係網路的模組度大小,並選擇使得模組度最大的一種移動方式。接著,將移動後在同一個社區內的複合節點組合成一個新的社區,重複以上步驟,直到模組度不再增大為止。每個社區可以看作一個複合節點集合。 根據一個實施方式,模組度可以透過以下方式判定:
Figure 02_image067
其中nc 是當前關係網路中社區的個數,初始時為基於隱私保護的關係網路中社區的個數,
Figure 02_image069
是社區c中總連接邊數,
Figure 02_image071
是社區c聚類到的各個複合節點的總度數,m是當前關係網路中總的連接邊數,初始時為基於隱私保護的關係網路中總的連接邊數。模組度優化演算法可以採用諸如貪心演算法(Newmann演算法)、模擬退火演算法、隨機遊走演算法、統計原理演算法、標籤傳播演算法、InfoMap演算法、Louvain演算法之類的演算法實現。 之後,在步驟403,從多個複合節點集合中判定至少一個候選複合節點集合。如此,如果將這至少一個候選複合節點集合提供給初始關係網路的資料方,可以使得初始關係網路的資料方按照單個候選複合節點集合中的各個候選複合節點從多個候選使用者中判定出相應的目標使用者團體。 根據一個可能的設計,可以將複合節點的數量大於預定數量臨限值(如10個)的複合節點集合判定為候選複合節點集合。這樣,可以使得初始關係網路的資料方透過以下方式按照單個候選複合節點集合中的各個候選複合節點從多個候選使用者中判定出相應的目標使用者團體: 按照預先設定的映射規則,將各個候選複合節點分別映射到初始關係網路的多個初始使用者;從得到的多個初始使用者中選擇多個候選使用者中的使用者,並將選擇出的使用者識別為單個候選複合節點集合對應的目標使用者團體。也就是說,查找到原始使用者後,過濾掉非候選使用者,剩下的使用者構成目標使用者團體。可選地,初始關係網路的產生方在產生基於隱私保護的關係網路時,可以記錄複合節點與原始節點的對應關係。這裡的映射規則就可以是這裡的對應關係。 根據另一個可能的設計,圖4示出的方法的執行主體為初始關係網路的資料方。此時,該執行主體可以按照前述可能設計中的方法判定候選複合節點集合,還可以透過其他方法判定候選複合節點集合。 例如,假設步驟402得到的多個複合節點集合包括第一複合節點集合,上述執行主體可以先按照預先設定的映射規則,將第一複合節點集合中的各個複合節點分別映射到初始關係網路的多個初始使用者,然後,檢測多個初始使用者中,是否存在預定數量(如20個)或預定比例(如60%)的初始使用者,註冊時間短於預定的時間臨限值(如1個月),若存在,則將第一複合節點集合判定為候選複合節點集合。否則,可以判定第一複合節點集合不是候選複合節點集合。 可以理解,由於步驟401中使用的基於隱私保護的關係網路,在相應業務方提供的多個使用者ID基礎上可能進行擴充和/或添加雜訊,因此,候選使用者ID中可能包含不在相應業務方提供的使用者ID中的其他使用者ID,透過對比從候選使用者ID中篩除這些使用者ID之後,剩餘的候選使用者ID可以被識別為使用者團體。 候選複合節點集合中對應的目標使用者團體,可以被提供給相應業務方。這裡的使用者團體可能是進行批量攻擊或有組織的團夥的各個使用者ID,相應業務方獲取相應使用者團體資訊之後,可以進行相應的防禦或追責處理。可選地,目標使用者團體可能只有一個,也可能有多個,用於為相應業務方提供參考。 回顧以上過程,本說明書實施例所提供的基於隱私保護的關係網路構建方法,可以利用在提供使用者關係網路時,將各個使用者預先聚合,添加雜訊,形成滿足差分隱私的關係網路,從而在有效保護使用者關係隱私的基礎上,減少資料處理量,提高使用者關係網路的有效性。進一步地,基於隱私保護的關係網路用於使用者團體發掘時,不局限於特定的資料持有方,任意有計算能力的資料處理方都可以透過團體識別模型識別關係網路中的候選複合節點,並經由初始關係網路的資料持有方查詢出使用者團體中包含的使用者ID,以提供給相應業務方,如此,可以在保證資料安全的基礎上增加團體識別的便利性。 根據另一方面的實施例,還提供一種基於隱私保護的關係網路構建裝置。其中,基於隱私保護的關係網路透過多個複合節點構成,多個複合節點之間透過連接邊描述關聯關係,單個複合節點對應候選關係網路中的多個原始節點,各個原始節點分別對應各個使用者,原始節點之間的連接邊描述相應使用者之間的關聯關係。圖5示出根據一個實施例的基於隱私保護的關係網路構建裝置的示意性方塊圖。如圖5所示,裝置500包括: 獲取單元51,組態為獲取候選關係網路; 節點構建單元52,組態為將候選關係網路中的原始節點按照預設的複合節點容量,劃分出多個複合節點,其中,單個複合節點對應的原始節點數量不超過複合節點容量; 檢測單元53,組態為針對多個複合節點,檢測兩兩之間是否存在連接邊; 邊構建單元54,組態為基於檢測結果,利用差分隱私方式對多個複合節點添加邊和權重,從而構建基於隱私保護的關係網路。 值得說明的是,以上對圖5所示的基於隱私保護的關係網路構建裝置500,與圖3示出的方法實施例相對應,圖3對應的方法實施例中的相應描述也適用於圖5所示的基於隱私保護的關係網路構建裝置,在此不再贅述。 
根據另一方面的實施例,還提供一種在多個候選使用者中判定使用者團體的裝置。圖6示出了在多個候選使用者中判定使用者團體的裝置600。裝置600至少包括: 獲取單元61,組態為獲取利用裝置500為多個候選使用者產生的基於隱私保護的關係網路; 處理單元62,組態為利用預定的團體識別模型處理基於隱私保護的關係網路,得到多個複合節點集合; 判定單元63,組態為從上述多個複合節點集合中判定至少一個候選複合節點集合,以供初始關係網路的資料方按照單個候選複合節點集合中的各個候選複合節點從多個候選使用者中判定出相應的目標使用者團體。 值得說明的是,以上對圖6所示的在多個候選使用者中判定使用者團體的裝置600,與圖4示出的方法實施例相對應,圖4對應的方法實施例中的相應描述也適用於圖6所示的在多個候選使用者中判定使用者團體的裝置,在此不再贅述。 根據另一方面的實施例,還提供一種電腦可讀儲存媒體,其上儲存有電腦程式,當所述電腦程式在電腦中執行時,令電腦執行相應描述的方法。 根據再一方面的實施例,還提供一種計算設備,包括記憶體和處理器,所述記憶體中儲存有可執行代碼,所述處理器執行所述可執行代碼時,實現相應描述的方法。 本領域技術人員應該可以意識到,在上述一個或多個示例中,本說明書實施例所描述的功能可以用硬體、軟體、韌體或它們的任意組合來實現。當使用軟體實現時,可以將這些功能儲存在電腦可讀媒體中或者作為電腦可讀媒體上的一個或多個指令或代碼進行傳輸。 以上所述的具體實施方式,對本說明書的技術構思的目的、技術方案和有益效果進行了進一步詳細說明,所應理解的是,以上所述僅為本說明書的技術構思的具體實施方式而已,並不用於限定本說明書的技術構思的保護範圍,凡在本說明書的技術構思的技術方案的基礎之上,所做的任何修改、等同替換、改進等,均應包括在本說明書的技術構思的保護範圍之內。The following describes the solutions provided in this specification in conjunction with the drawings. First, a specific implementation scenario is shown in conjunction with Figure 1 and Figure 2 for description. Figure 1 shows a schematic diagram of the implementation architecture of this specific implementation scenario. As shown in Figure 1, the implementation architecture includes a business platform, business parties, and users. The business platform is used to provide user communication and a medium for business communication and interaction between business parties and users. For example, Alipay platform, WeChat platform, etc., can be platforms that take into account social and business services. Users can register as registered users on the service platform, and each service party can provide users with related services in the form of sub-applications or register as a registered service party on the service platform. The business platform can record user behavior information on the business platform (such as payment behavior data, transfer behavior data, consumption behavior data, etc.), which can be used to establish a relationship network. 
In a relationship network, each node can represent an entity (such as a user, a commodity, or a merchant). Association relationships between entities are represented by connecting edges: the nodes corresponding to entities with a direct association relationship are connected to each other by a connecting edge. As shown in Figure 1, each circle represents an entity and each line segment represents a connecting edge. Nodes that have a direct association relationship are first-order neighbor nodes of each other. If two nodes are connected by a path consisting of a connecting edge, a node, and another connecting edge, the two nodes are second-order neighbor nodes of each other, and so on. Generally, the order of a neighbor node equals the minimum number of connecting edges between the two nodes. Under the implementation framework of this specification, the entities in the relationship network can be users. The business parties and users in Figure 1 are only examples; in practice there can be any number of each. The server of the business platform may also take the form of a server cluster, which is not limited in this specification. Figure 2 is a schematic diagram of a specific implementation scenario under the implementation architecture of Figure 1. In this implementation scenario, the computing platform pre-stores or remotely obtains the original relationship network generated from the user behavior data recorded by the business platform in Figure 1. In the original relationship network, each user is represented by the user ID registered on the business platform. Business party a suspects that it has encountered a batch attack or an organized group attack, and it can provide the user IDs in its own user data to the computing platform. 
The computing platform extracts the relationship network related to these users from the original relationship network according to the user ID provided by the business party a, as a candidate relationship network, and further divides multiple nodes in the candidate relationship network. A composite node is formed, and each composite node includes multiple nodes in the original relational network. As shown in Figure 2, each composite node is identified by a circular or elliptical dashed frame, and the connection relationship between the composite nodes is described by the dashed line. The composite node can be regarded as a virtual user, corresponding to multiple users in the initial relationship network. In the establishment of the relationship network of composite nodes, differential privacy can be used to introduce noise into the network structure, so that the processing result of the noise-introduced relationship network is consistent with the processing result of the original relationship network. In this way, on the basis of effectively protecting the privacy data of the relationship between users, this relationship network not only effectively reduces the scale, but also provides accurate user aggregation relationships. This relationship network can be called a relationship network based on privacy protection. When the privacy protection-based relationship network is provided to any third-party platform, the user's privacy data will not be disclosed. Therefore, the computing platform can provide a third-party platform with a relationship network based on privacy protection. The third-party platform uses a pre-trained group identification model to identify groups in the relationship network, and the identification results are fed back to the business party a. In this way, it can help the business party a to prevent and fight offensive behaviors, illegal property behaviors and other gangs committing crimes, and eliminate risks. 
It should be noted that the computing platform in Figure 2 can be located on the business platform in Figure 1, or on another trusted platform that is bound by confidentiality obligations. The third-party platform can be any platform with sufficient computing capability; it can belong to the computing platform in Figure 2 or be an independent party, which is not limited in this specification. Figures 1 and 2 show only one implementation architecture of the embodiments of this specification. In practice, the privacy-protected relationship network that the computing platform in Figure 2 builds from the initial relationship network can be used in any scenario involving user relationships, such as mining malicious groups or identifying potential customers; these scenarios are not listed one by one here. The following first describes in detail the process of constructing a relationship network based on privacy protection. Figure 3 shows a flowchart of a method for constructing a relationship network based on privacy protection according to an embodiment. The method can be executed by any system, equipment, device, platform, or server with computing and processing capabilities, for example the business platform shown in Figure 1. The relationship network based on privacy protection is built on the candidate relationship network: the original nodes of the candidate relationship network are merged, noise is added under a predetermined privacy cost, and the real connection relationships between nodes are hidden by means of differential privacy. 
As shown in Figure 3, the method for constructing a relationship network based on privacy protection includes the following steps: step 301, obtain a candidate relationship network; step 302, divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes included in a single composite node does not exceed the composite node capacity; step 303, for the multiple composite nodes, detect whether a connecting edge exists between each pair; step 304, based on the detection result, add connecting edges and weights to the multiple composite nodes in a differentially private manner, thereby constructing a relationship network based on privacy protection. First, in step 301, a candidate relationship network is obtained. The candidate relationship network is the base network from which the relationship network based on privacy protection is constructed. The initial relationship network is usually a relationship network constructed for an application scenario and containing association relationships between entities; it contains a large amount of entity relationship data, such as user relationship data. For example, in the implementation scenarios shown in Figures 1 and 2, the initial relationship network can be a network describing user relationships. In the embodiments of this specification, the nodes in the initial relationship network are referred to as original nodes. The initial relationship network usually contains the network formed by the association relationships between all entities in the relevant scenario. The candidate relationship network can be the initial relationship network itself or a part of it. 
According to an embodiment, the relationship network corresponding to the candidate nodes can be extracted from the initial relationship network as the candidate relationship network, using a predetermined node range. In an embodiment, the candidate nodes may be the given nodes themselves. Taking the implementation scenario shown in Figure 2 as an example, the given nodes are the nodes corresponding to the users in the user list provided by business party a. If these users are the 26 users from user a, user b through user z, the nodes corresponding to these 26 users are the candidate nodes. The nodes corresponding to user a, user b through user z, together with the connections among them, can then be extracted from the initial relationship network as the candidate relationship network. For example, if the node corresponding to user a is connected to the nodes corresponding to user b and user d, and is also connected to the node corresponding to user 11, then, because the candidate relationship network does not include the node corresponding to user 11, it also does not include the connecting edge between the nodes of user 11 and user a; it does include the nodes corresponding to user a, user b, and user d, as well as the connecting edges from the node of user a to the node of user b and to the node of user d. In another embodiment, the candidate nodes may be nodes associated with the given nodes; for example, in addition to the given nodes, they may include the neighbor nodes within a predetermined order of the given nodes. Taking the implementation scenario shown in Figure 2 as an example, the given nodes can be the nodes corresponding to the users in the user list provided by business party a, and the candidate nodes can be the given nodes together with their neighbor nodes within a predetermined order (such as second order), that is, first-order neighbor nodes, second-order neighbor nodes, and so on. 
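The two ways of choosing candidate nodes described above can be illustrated with a minimal Python sketch. It assumes the initial relationship network is held as an adjacency dictionary; the function names `k_hop_nodes` and `induced_subgraph` and the toy data are hypothetical and do not come from the specification.

```python
from collections import deque

def k_hop_nodes(adj, given, order):
    """Return the given nodes plus all neighbors within `order` hops (BFS)."""
    seen = set(given)
    frontier = deque((n, 0) for n in given)
    while frontier:
        node, dist = frontier.popleft()
        if dist == order:
            continue  # do not expand beyond the predetermined order
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

def induced_subgraph(adj, nodes):
    """Keep the chosen nodes and only the edges whose two endpoints are kept."""
    return {n: {nb for nb in adj.get(n, ()) if nb in nodes} for n in nodes}

# Toy initial network (assumed data): user "a" is linked to "b", "d",
# and to "u11", which is not in the business party's list.
initial = {
    "a": {"b", "d", "u11"},
    "b": {"a"},
    "d": {"a"},
    "u11": {"a", "u12"},
    "u12": {"u11"},
}
given = {"a", "b", "d"}

# Embodiment 1: candidate nodes are exactly the given nodes.
candidate = induced_subgraph(initial, given)
# Embodiment 2: given nodes plus first-order neighbors.
expanded = induced_subgraph(initial, k_hop_nodes(initial, given, 1))
```

In the first variant the edge between "a" and "u11" disappears together with "u11", which is why the number of connecting edges of an original node can shrink in the candidate network.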
At this time, the candidate relationship network can be a relationship network composed of a given node and its neighbor nodes within a predetermined order, which will not be repeated here. It is understandable that since the number of nodes in the candidate relationship network may be any number, in some embodiments, in order to balance the number of composite nodes, in an alternative embodiment, the relationship between candidate nodes may also be adjusted. The network is further screened as a candidate relationship network, and the detailed process is described in step 302. Since the candidate relationship network is the initial relationship network or a part of the network extracted from the initial relationship network, the node itself still exists as an independent node, which means that the node has not changed. Therefore, it can also be called the original node. In the candidate relationship network, the attributes of some original nodes have changed, for example, the number of connected edges (or the number of neighbor nodes) is reduced. Step 302: Divide the nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity. Wherein, the number of original nodes included in each composite node does not exceed the capacity of the aforementioned composite node. The composite node capacity can be a preset value based on experience or the size of the candidate relationship network (including the number of nodes), such as 5, 8, 10, and so on. The number of original nodes corresponding to a composite node does not exceed the capacity of the composite node. Generally, the number of original nodes corresponding to a composite node can be the same as the capacity of the composite node. In one embodiment, the number of composite nodes can be determined according to the capacity of the composite nodes (hereinafter referred to as k). 
For example, the number of composite nodes may be the integer part of the ratio of the number of nodes in the candidate relationship network to the composite node capacity k. In an optional implementation, the number of composite nodes may also be that integer part minus one. This leaves some room for error in the subsequent differential privacy processing, so that relationship privacy can be maintained while the accuracy of the user relationships is preserved. In an optional implementation, after the number of composite nodes is determined, the candidate relationship network can be randomly filtered so that the number of nodes in it equals either the product of the number of composite nodes and the composite node capacity k, or the product of the number of composite nodes plus 1 and k, depending on how the number of composite nodes was determined. This amounts to filtering out as many nodes as the remainder of the original node count divided by the composite node capacity, and corresponds to the node screening mentioned in step 301. In other words, the number of nodes in the screened candidate relationship network is the number of nodes in the original candidate relationship network minus the remainder of that number divided by the composite node capacity k. That is, the number of composite nodes is determined from the number of original nodes in the candidate relationship network and the composite node capacity, and the original nodes in the candidate relationship network are then screened according to the number of composite nodes. 
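The counting and random filtering just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `plan_composite_nodes`, the `reserve_one` flag (for the "integer part minus one" variant), and the use of Python's `random` module are choices made here, not details from the specification.

```python
import random

def plan_composite_nodes(nodes, k, reserve_one=False, seed=None):
    """Return (number of composite nodes, randomly filtered node list)."""
    if k <= 0:
        raise ValueError("composite node capacity k must be positive")
    full_groups = len(nodes) // k          # integer part of n / k
    count = full_groups - 1 if reserve_one else full_groups
    keep = full_groups * k                 # n minus (n mod k)
    rng = random.Random(seed)
    kept = rng.sample(sorted(nodes), keep) # random filtering of the remainder
    return max(count, 0), kept

# 26 candidate users (user a .. user z) with capacity k = 5.
users = [f"user_{c}" for c in "abcdefghijklmnopqrstuvwxyz"]
count, kept = plan_composite_nodes(users, k=5, seed=0)
```

With 26 users and k = 5, one user is filtered out and 25 remain; the composite node count is 5, or 4 when `reserve_one=True`.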
In this way, the original nodes in the candidate relationship network can be evenly distributed among the composite nodes, that is, each composite node corresponds to k original nodes, and the number of composite nodes is determined accordingly. After the number of composite nodes is determined, the original nodes in the candidate relationship network can be divided into composite nodes. When the number of original nodes corresponding to each composite node equals the composite node capacity, the number of composite nodes that can be formed is denoted as the first number. In an embodiment, the first number of original nodes can be randomly selected from the candidate relationship network as the reference nodes of the composite nodes (playing a role similar to "seeds"). Then, according to the composite node capacity k, the k-1 (the second number) nodes closest to each reference node, taken from near to far, are added to the corresponding composite node. Here, distance can be understood as the number of connecting edges on the connection path; for example, the distance between a reference node and its first-order neighbor node is 1. Optionally, when the reference nodes are traversed and original nodes are examined from near to far, original nodes that have already been added to other composite nodes can be excluded. In this way, the candidate relationship network formed by the original nodes becomes a collection of multiple composite nodes. To make the multiple composite nodes form a relationship network, step 303 detects, for the multiple composite nodes, whether a connecting edge exists between each pair. First, it can be detected whether a connecting edge exists between the original nodes of each pair of composite nodes. If such a connecting edge exists, it is determined that a connecting edge exists between the two composite nodes. 
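The seed-based division and the pairwise edge detection of step 303 can be sketched as below. The sketch assumes an adjacency-dictionary graph and uses breadth-first search to visit nodes from near to far; the names `partition` and `composite_edges` are hypothetical. If a seed cannot reach enough unassigned nodes, its composite node simply stays smaller than k, which is consistent with the "does not exceed the capacity" condition.

```python
import random
from collections import deque

def partition(adj, k, count, seed=None):
    """Grow `count` composite nodes of at most k original nodes each."""
    rng = random.Random(seed)
    seeds = rng.sample(sorted(adj), count)              # reference nodes
    assigned = {s: cid for cid, s in enumerate(seeds)}  # original -> composite id
    groups = {cid: [s] for cid, s in enumerate(seeds)}
    for cid, s in enumerate(seeds):
        queue, visited = deque([s]), {s}
        while queue and len(groups[cid]) < k:
            node = queue.popleft()
            for nb in sorted(adj[node]):                # breadth-first: near to far
                if nb in visited:
                    continue
                visited.add(nb)
                queue.append(nb)
                # skip original nodes already taken by another composite node
                if nb not in assigned and len(groups[cid]) < k:
                    assigned[nb] = cid
                    groups[cid].append(nb)
    return groups, assigned

def composite_edges(adj, assigned):
    """Two composite nodes are linked if any pair of their members is linked."""
    edges = set()
    for u, neighbors in adj.items():
        for v in neighbors:
            cu, cv = assigned.get(u), assigned.get(v)
            if cu is None or cv is None or cu == cv:
                continue
            edges.add((min(cu, cv), max(cu, cv)))
    return edges

# Two composite nodes {A..E} and {F..J}, with a single original edge C-H.
assigned = {n: 0 for n in "ABCDE"}
assigned.update({n: 1 for n in "FGHIJ"})
adj = {n: set() for n in "ABCDEFGHIJ"}
adj["C"].add("H")
adj["H"].add("C")
edges = composite_edges(adj, assigned)
```

The single original edge between C and H is enough to create one connecting edge between the two composite nodes.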
For a clearer description, suppose that the first composite node includes the original nodes A, B, C, D, and E, and the second composite node includes the original nodes F, G, H, I, and J. If there is a connecting edge between any of the original nodes A, B, C, D, E (such as node C, which can be called the first original node) and any of the original nodes F, G, H, I, J (such as node H, which can be called the second original node), it can be determined that there is a connecting edge between the first composite node and the second composite node. If no original node in the first composite node has a connecting edge to any original node in the second composite node, there is no connecting edge between the first composite node and the second composite node. According to an embodiment, a connecting edge set can be determined from the detection result of step 303 to store the detected connecting edges. Optionally, the detection result may also include the number of connecting edges in the connecting edge set. In step 304, based on the detection result, connecting edges and weights are added to the multiple composite nodes in a differentially private manner, thereby constructing a relationship network based on privacy protection. When a relationship network is used for business processing, the degree of association between nodes often also needs to be considered, and this degree of association can be described by the weight of the connecting edge. Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. Let M be a random algorithm and let PM be the set of all possible outputs of M. 
For any two adjacent data sets D and D' and any subset SM of PM, if the random algorithm M satisfies $\Pr[M(D) \in S_M] \le e^{\varepsilon} \times \Pr[M(D') \in S_M]$, then the algorithm M is said to provide ε-differential privacy protection, where the parameter ε is called the privacy protection budget and balances the degree of privacy protection against accuracy. ε can usually be set in advance. The closer ε is to 0, the closer $e^{\varepsilon}$ is to 1, the closer the results of the random algorithm on the two adjacent data sets D and D' are, and the stronger the privacy protection. Differential privacy methods can reduce the sensitivity of query results by adding controlled noise. Differential privacy is usually applied to queries; under the implementation framework of this specification, it is used instead to generate a relationship network based on privacy protection. Those skilled in the art will understand that differential privacy is generally composable: combining two differentially private mechanisms with privacy factors $\varepsilon_1$ and $\varepsilon_2$ yields a privacy factor of $\varepsilon_1 + \varepsilon_2$. Let ε denote the overall differential privacy cost; then $\varepsilon = \varepsilon_1 + \varepsilon_2$. The greater ε is, the weaker the privacy protection; therefore, a maximum value of ε can be set in advance as the maximum privacy cost, for example ε = 1. The purpose of a differential privacy method is to balance privacy and accuracy, that is, to preserve accuracy while protecting privacy. Noise is added to the connecting edges so that a random algorithm obtains a result on the noise-added relationship network that is consistent with its result on the original relationship network, which protects privacy. 
In order to generate a relational network based on privacy protection, a part of connected edges may be selected from the connected edges detected in step 303, and a certain number of connected edges may be added between composite nodes that do not have connected edges. In a possible design of this specification, it is possible to consider the differential privacy of the connected edge that satisfies the first privacy factor ε 2 and the differential privacy of the connected edge weight that satisfies the second privacy factor ε 1 . In the differential privacy method, the smaller the privacy factor, the smaller the individual's influence on the overall result, and the better the privacy protection, but the lower the accuracy. Therefore, the privacy factor ε 2 can be preset based on experience. Optionally, the first privacy factor ε 2 can be positively correlated with the total number of composite nodes. For example, if the number of composite nodes n 1 is 1000, ε 2 can be set to 0.01. When the overall privacy factor ε and the first privacy factor ε 2 are set, the second privacy factor ε 1 can be determined by ε-ε 2 . Based on the above theory, firstly, differential privacy processing is performed on the connected edges. The set of connected edges between composite nodes can be denoted as E 1 , and the number of connected edges can be denoted as |E 1 |. In order to ensure the accuracy of the relational network based on privacy protection, noise can be added to |E 1 |, thereby increasing the selection ratio of connected edges in the connected edge set (the principle is described in detail below). In an alternative implementation, the differential privacy of the number of connected edges can be performed through the Laplace mechanism (Laplace). In other words, add Laplacian noise to the number of connected edges in the connected edge set. 
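The budget split and the noisy edge count can be sketched in Python as follows. This is a hedged illustration: it draws Laplace noise in the textbook way (as the difference of two exponential variates, which follows a Laplace distribution with the same scale), and the names `split_budget` and `noisy_edge_count` are hypothetical. The example values (ε = 1, ε2 = 0.01) are the ones mentioned in the text; the edge count of 5000 is made up.

```python
import random

def split_budget(eps_total, eps2):
    """Second privacy factor eps1 = eps_total - eps2 (sequential composition)."""
    if not 0 < eps2 < eps_total:
        raise ValueError("eps2 must lie strictly between 0 and eps_total")
    return eps_total - eps2

def noisy_edge_count(true_count, eps2, sensitivity=1.0, rng=None):
    """Add Laplace(sensitivity / eps2) noise to the number of connecting edges."""
    rng = rng or random.Random()
    scale = sensitivity / eps2
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

eps_total, eps2 = 1.0, 0.01
eps1 = split_budget(eps_total, eps2)                       # budget left for weights
m1 = noisy_edge_count(5000, eps2, rng=random.Random(42))   # noisy version of |E1|
```

A smaller ε2 gives a larger noise scale 1/ε2 on the edge count and leaves more of the total budget for the edge weights.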
Noise that follows the Laplace distribution can be described by the probability density function $\mathrm{noise}(y) \propto e^{-|y|/\lambda}$; its mean is 0 and its standard deviation is $\sqrt{2}\lambda$. The Laplace mechanism is a noise mechanism suitable for continuous data. For a given data set D, consider the random algorithm M(D) = f(D) + Y from the definition of differential privacy; when M provides ε-differential privacy protection, Y follows a Laplace distribution with scale parameter sensitivity/ε, that is, Lap(sensitivity/ε). Here, the sensitivity indicates the minimum number of entries in the data set that must change to affect the output. For example, in a relationship network composed of users' relationship data, the sensitivity can be 1, and the Laplace distribution satisfying $\varepsilon_2$-differential privacy can be denoted as $\mathrm{Lap}(1/\varepsilon_2)$. Suppose the Laplace-distributed noise is expressed as:
Figure 02_image003
Substituting the first privacy factor $\varepsilon_2$ and the sensitivity 1 for the Laplace noise of the connecting edges, Y is the Laplace distribution with $p = 1/\varepsilon_2$. From the random algorithm M(D) = f(D) + Y, when the data set is the set $E_1$ of connecting edges that really exist between composite nodes, f(D) is the number of edges, f(D) = $|E_1|$, so the number of connecting edges after Laplace noise is added is $m_1 = |E_1| + P(1/\varepsilon_2)$. Here, a pre-selected random algorithm is used to generate, for
Figure 02_image005
a random value (which can be called the first random value); when
Figure 02_image007
takes this random value, the value of the Laplace function
Figure 02_image009
is $P(1/\varepsilon_2)$. $P(1/\varepsilon_2)$ can be regarded as the number of noise edges added. After noise is added to the connecting edge count, connecting edges between composite nodes can be selected and added according to the noisy count. In a possible embodiment, suppose that a third number of connecting edges is selected from the connecting edges detected in step 303; that the number of noise connecting edges constructed between composite nodes (connecting edges that do not exist in the detection result) is a fourth number; that the number of connecting edges obtained after adding noise under the first privacy cost is a fifth number; and that the maximum number of connecting edges between the composite nodes is a sixth number. Then the ratio of the third number to the fourth number equals the ratio of the fifth number to the difference between the sixth number and the fifth number. Because the fifth number adds a noise amount to the number of connecting edges originally detected, the proportion of connecting edges selected from the detected ones can be increased. Assuming that the number of composite nodes is $n_1$ and that connections from a composite node to itself are not counted, the maximum number of connecting edges is $m_0 = n_1(n_1-1)/2$. That is, the sixth number $m_0$ in the optional embodiment above can be determined from the number $n_1$ of composite nodes. The fifth number is the aforementioned $m_1 = |E_1| + P(1/\varepsilon_2)$. The ratio of the third number to the fourth number is:
Figure 02_image011
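The edge-count step above can be illustrated with a minimal Python sketch. It assumes the standard inverse-CDF form of the Laplace distribution with scale 1/ε2 and takes the ratio of the third number to the fourth number as m1/(m0 - m1), as stated in the text. The function names, the rounding, and the clamping of m1 are illustrative assumptions and are not taken from the specification.

```python
import math
import random


def laplace_from_uniform(u: float, scale: float) -> float:
    """Inverse-CDF value of a zero-mean Laplace distribution for u in (0, 1)."""
    centered = u - 0.5
    return -scale * math.copysign(1.0, centered) * math.log(1.0 - 2.0 * abs(centered))


def max_edges(n1: int) -> int:
    """m0 = n1 (n1 - 1) / 2: edges among n1 composite nodes, self-loops excluded."""
    return n1 * (n1 - 1) // 2


def noisy_edge_count(num_edges: int, n1: int, eps2: float, rng: random.Random) -> int:
    """m1 = |E1| + Laplace noise with scale 1/eps2 (sensitivity 1)."""
    u = rng.random()
    while u == 0.0:  # keep u strictly inside (0, 1) so the logarithm is defined
        u = rng.random()
    m1 = num_edges + laplace_from_uniform(u, 1.0 / eps2)
    # Assumption: round and clamp so that m1 / (m0 - m1) stays well defined.
    # This clamping is not described in the text and presumes n1 >= 3.
    return min(max(round(m1), 1), max_edges(n1) - 1)


def keep_to_noise_ratio(m1: int, m0: int) -> float:
    """Ratio of the third number (kept real edges) to the fourth number (noise edges)."""
    return m1 / (m0 - m1)
```

For example, `noisy_edge_count(4, 5, 1.0, random.Random(0))` returns a noisy count between 1 and 9 for five composite nodes; a smaller ε2 gives a larger noise scale.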
The process of selecting the third number of connected edges and adding the fourth number of connected edges is described in detail below.
On the one hand, the third number of connected edges is selected from E1. In general, connected edges with larger weights are retained and connected edges with smaller weights are deleted. According to an embodiment, any connected edge detected in step 303 (such as a connected edge in the set E1) can be denoted as the first connected edge. For the first connected edge, noise whose cumulative probability satisfies a bilateral geometric distribution based on the second privacy cost is added to a given initial weight, giving the corresponding first noise weight. When the first noise weight is greater than the first weight threshold, the first connected edge is selected as a connected edge in the privacy protection-based relationship network, and the first noise weight is used as the weight of the first connected edge. The second privacy cost ε1 is the difference between the predetermined overall privacy cost ε and the first privacy cost ε2. As an example, under the second privacy cost
Figure 02_image013
let
Figure 02_image015
The cumulative probability of the noise
Figure 02_image017
satisfies the bilateral geometric distribution:
Figure 02_image019
where the total probability over all
Figure 02_image017
is 1, that is,
Figure 02_image021
A value between 0 and 1 can be determined by random sampling. Once a cumulative probability value
Figure 02_image023
is determined, it corresponds uniquely to one
Figure 02_image017
. The randomly generated probability value therefore determines the corresponding noise
Figure 02_image017
. For a connected edge e1 in the detected set of connected edges E1, let its initial weight W0 be 1 or 0, where 1 indicates that a real connected edge exists in the initial state and 0 indicates otherwise; the initial weight of e1 is therefore 1. After noise is added, its weight is expressed as 1+
Figure 02_image025
. If the connected edge e1 is to satisfy ε1-differential privacy, its weight after adding noise should be large enough to distinguish it from the node relationships in the original relationship network. To make the weight large enough, the weight after adding noise, 1+
Figure 02_image027
can be compared with the first weight threshold θ. In other words, adding the noise
Figure 02_image025
to W0 gives the weight We1; when We1 ≥ θ is satisfied, the corresponding connected edge e1 satisfies ε1-differential privacy. In that case, e1 can be determined to be a connected edge between composite nodes in the relationship network under differential privacy, with weight We1. Because this weight has noise added, the privacy of the user relationships can be guaranteed. The first weight threshold θ can be a preset value, or it can be determined by a method such as high-pass filtering. Taking high-pass filtering as an example: following the principle of high-pass filtering, assume that the first weight threshold is θ, and use
Figure 02_image029
to denote the weight of the i-th connected edge in E1. Let
Figure 02_image015
then:
Figure 02_image031
The embodiments of this specification use unilateral (one-sided) filtering, in which negative noise is excluded, namely:
Figure 02_image033
and therefore:
Figure 02_image035
Optionally,
Figure 02_image037
takes the rounded-up form of the calculation result:
Figure 02_image039
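The selection of real connected edges can be sketched in Python as follows. The formulas of this passage are only present as figures, so the sketch relies on assumptions taken from the usual two-sided geometric mechanism: α = exp(-ε1), P(noise = k) = (1 - α)/(1 + α) · α^|k|, and a threshold θ chosen so that the one-sided pass probability α^θ/(1 + α) equals m1/(m0 - m1). All function names are hypothetical.

```python
import math
import random


def two_sided_geometric_from_u(u: float, alpha: float) -> int:
    """Map a cumulative probability u in (0, 1) to the integer noise value
    whose CDF interval contains u (inverse-CDF sampling)."""
    if u <= alpha / (1.0 + alpha):  # u falls in the negative part of the CDF
        return math.ceil(-math.log(u * (1.0 + alpha)) / math.log(alpha))
    k = math.ceil(math.log((1.0 - u) * (1.0 + alpha)) / math.log(alpha)) - 1
    return max(k, 0)


def weight_threshold(m1: int, m0: int, alpha: float) -> int:
    """Assumed form: theta = ceil(log_alpha((1 + alpha) * m1 / (m0 - m1)))."""
    ratio = m1 / (m0 - m1)
    return math.ceil(math.log((1.0 + alpha) * ratio) / math.log(alpha))


def select_real_edges(edges, theta: int, alpha: float, rng: random.Random) -> dict:
    """Keep each detected edge whose noisy weight 1 + noise reaches theta."""
    kept = {}
    for edge in edges:
        u = rng.random()
        while u == 0.0:  # keep u strictly inside (0, 1)
            u = rng.random()
        noisy_weight = 1 + two_sided_geometric_from_u(u, alpha)  # initial weight W0 = 1
        if noisy_weight >= theta:
            kept[edge] = noisy_weight
    return kept
```

With α = 0.5, m1 = 10 and m0 = 100, `weight_threshold` evaluates to 3, so only edges whose noise is at least 2 are kept. Rounding θ up makes the filter stricter rather than looser.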
When the calculation result is not an integer, the value of
Figure 02_image037
is the integer part of the calculation result plus 1. This is because
Figure 02_image037
serves as the lower weight threshold after noise is added; when the value of
Figure 02_image037
is large, the noise is guaranteed to be large enough, which helps to protect the privacy of the user relationships. According to the first weight threshold
Figure 02_image037
, one can
Figure 02_image041
According to the weights of the connected edges after adding noise and
Figure 02_image037
, the third number of connected edges is selected from the connected edges detected in step 303.
On the other hand, in addition to the connected edges detected in step 303 (for example, the connected edges in the set E1), a fourth number of connected edges needs to be added as connected edges between composite nodes in the privacy protection-based relationship network. These connected edges are provisionally assumed to exist during the adding process and can be regarded as "connected edges with a weight of 0". If the conditions are met, they are added as connected edges in the privacy protection-based relationship network; otherwise, the edge is still considered absent. According to a possible embodiment, a fourth number (for example, s) of connected edges can be randomly selected from these "connected edges with a weight of 0" as connected edges in the privacy protection-based relationship network, and weights within a predetermined range (such as between 0 and 1) can be randomly generated for them. The randomly generated weights can be required to exceed a predetermined threshold, such as 0.3. The fourth number of connected edges is then selected in descending order of the generated weights, and the weight of each connected edge is its generated weight. In an optional implementation, a weight can be generated for each "connected edge with a weight of 0" according to binomial distribution noise, and s connected edges can be selected according to the principle of a high-pass filter. Following the high-pass filtering principle described above, in the case of unilateral filtering:
Figure 02_image043
then:
Figure 02_image045
In other words, the fourth number s can be determined from the fifth number
Figure 02_image047
, the sixth number
Figure 02_image049
, the aforementioned first weight threshold
Figure 02_image051
and the second privacy cost ε1. The noise weight generated for each connected edge with an initial weight of 0 satisfies the exponential distribution:
Figure 02_image053
This is because, when
Figure 02_image029
denotes the weight of the i-th connected edge, the following condition must be met to pass the high-pass filter:
Figure 02_image055
Further, for all connected edges whose weights are greater than
Figure 02_image051
, the cumulative probability distribution is:
Figure 02_image057
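The construction of noise edges can be sketched in Python under explicit assumptions, because the formulas here are only present as figures. The sketch assumes θ ≥ 1, a one-sided pass probability α^θ/(1 + α) with α = exp(-ε1), an expected count s = (m0 - m1) · α^θ/(1 + α), and noise weights drawn from the geometric tail P(X ≤ x | X ≥ θ) = 1 - α^(x - θ + 1). The function names and the uniform choice of absent edges are illustrative only.

```python
import itertools
import math
import random


def expected_noise_edges(m1: int, m0: int, theta: int, alpha: float) -> float:
    """Assumed form of the fourth number: s = (m0 - m1) * alpha**theta / (1 + alpha)."""
    return (m0 - m1) * alpha ** theta / (1.0 + alpha)


def tail_weight_from_u(u: float, theta: int, alpha: float) -> int:
    """Map a cumulative probability u in [0, 1) to a weight >= theta on the geometric tail."""
    return theta + math.floor(math.log(1.0 - u) / math.log(alpha))


def sample_noise_edges(n1: int, real_edges, s: int, theta: int, alpha: float,
                       rng: random.Random) -> dict:
    """Pick s edges that are absent from the detection result and give each a noise weight."""
    real = {tuple(sorted(edge)) for edge in real_edges}
    absent = [pair for pair in itertools.combinations(range(n1), 2) if pair not in real]
    chosen = rng.sample(absent, min(s, len(absent)))
    return {pair: tail_weight_from_u(rng.random(), theta, alpha) for pair in chosen}
```

With m1 = 10, m0 = 100, θ = 3 and α = 0.5, `expected_noise_edges` gives 7.5. How that value is rounded to an integer s is left open here, since the text does not specify it.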
Therefore, if a random value between 0 and 1 is generated as the cumulative probability
Figure 02_image059
, it corresponds uniquely to a value of the argument x, and that value of x is assigned as the noise weight of the current connected edge
Figure 02_image061
. The value of x may be positive or negative, but in the embodiments of this specification only connected edges with a positive weight are meaningful. Therefore, if the generated weight
Figure 02_image061
≥0, the corresponding connected edge can be regarded as a noise edge, and the generated weight is the noise weight of that noise edge. This continues until s noise edges have been determined.
In the above process, because the noise on the number of edges satisfies the Laplace distribution, the processing result of any random algorithm on the relationship network with |E1| connected edges is less than or equal to
Figure 02_image063
times the processing result of that random algorithm on the relationship network with m1 = |E1| + P(1/ε2) connected edges, so ε2-differential privacy is satisfied. For the weights of the connected edges, bilateral geometric distribution noise or exponential distribution noise is added, so that the processing result of any random algorithm on the relationship network containing the connected edge set E1 is less than or equal to
Figure 02_image065
times the processing result of that random algorithm on the relationship network with the edge-count noise and the weight noise added, so ε1-differential privacy is satisfied.
In this way, differential privacy processing based on the first privacy factor ε2 is applied to the number of existing connected edges, and, when the connected edges are selected, differential privacy processing based on the second privacy factor ε1 is applied to their weights, producing a relationship network that satisfies ε-differential privacy, where ε = ε1 + ε2. A relationship network that satisfies ε-differential privacy both simplifies the network structure and adds noise that masks the original user relationships, so the relationships between users can be mined while user privacy is protected. For example, in the implementation scenario shown in Fig. 1, group relationships between users are explored according to the user IDs provided by the merchant. Even if the privacy protection-based relationship network is provided to a third-party platform, it does not reveal the privacy of the users' relationships.
Figure 4 shows a method for determining user communities among multiple candidate users using a privacy protection-based relationship network. The method can be executed by the same execution subject as the method shown in FIG. 3, or by another execution subject (for example, the merchant that provides the user IDs in FIG. 1), which is not limited herein. The method shown in FIG. 4 includes the following steps: step 401, obtain a privacy protection-based relationship network generated for multiple candidate users; step 402, process the privacy protection-based relationship network with a predetermined community identification model to obtain multiple composite node sets; step 403, determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network determines the target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
First, in step 401, a privacy protection-based relationship network generated for multiple candidate users is obtained. The candidate users can be provided by the corresponding business party, for example a service provider (such as a merchant) on a consumer platform. The multiple user IDs provided by the corresponding business party may be the IDs under which the parties it serves (such as consumers) are registered on a certain business platform. Each user ID corresponds to one candidate user. As the data party of the initial relationship network, the business platform can generate the initial user relationship network in advance. Based on these candidate users, the data party of the initial relationship network can determine the candidate relationship network from the initial relationship network, divide the original nodes in the candidate relationship network into multiple composite nodes according to the preset composite node capacity, detect whether a connected edge exists between each pair of composite nodes, and, based on the detection result, use a differential privacy method to add connected edges and weights to the composite nodes, thereby constructing the privacy protection-based relationship network.
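The node-aggregation and edge-detection part of this construction can be sketched in Python. The sketch simplifies the procedure: it shuffles the original nodes and cuts them into chunks no larger than the capacity, which is an assumed stand-in for selecting reference nodes and assigning further nodes to each of them. It treats two composite nodes as connected whenever any pair of their member nodes is connected. The function names are hypothetical.

```python
import random


def build_composite_nodes(original_nodes, capacity: int, rng: random.Random) -> list:
    """Randomly group original nodes into composite nodes of at most `capacity` members."""
    nodes = list(original_nodes)
    rng.shuffle(nodes)
    return [nodes[i:i + capacity] for i in range(0, len(nodes), capacity)]


def detect_composite_edges(composites, original_edges) -> set:
    """A composite edge exists if some member of one composite node is linked to a member of the other."""
    owner = {node: idx for idx, members in enumerate(composites) for node in members}
    edges = set()
    for u, v in original_edges:
        a, b = owner[u], owner[v]
        if a != b:  # edges inside one composite node do not create a composite edge
            edges.add((min(a, b), max(a, b)))
    return edges
```

The resulting edge set plays the role of E1, and its size |E1| is the count to which the Laplace noise is added.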
Optionally, the candidate relationship network may include the users provided by the corresponding business party and their neighbor nodes within a predetermined order in the initial relationship network. This process has been described in the embodiment shown in FIG. 3 and is not repeated here. When the execution subject of the process shown in FIG. 4 is the data party of the initial relationship network, the privacy protection-based relationship network can be obtained locally.
Then, in step 402, a predetermined community identification model is used to process the privacy protection-based relationship network to obtain multiple composite node sets. The predetermined community identification model is, for example, the Louvain algorithm, the maximum connected graph, and so on. Taking the Louvain algorithm as an example, each composite node in the privacy protection-based relationship network is first regarded as a community. Each composite node is then tentatively moved to the community of an adjacent composite node, the modularity of the whole relationship network is calculated, and the move that maximizes the modularity is chosen. Next, the composite nodes that are in the same community after the move are merged into a new community, and these steps are repeated until the modularity no longer increases. Each resulting community can be regarded as a set of composite nodes. According to one embodiment, the modularity can be determined as follows:
Figure 02_image067
where nc is the number of communities in the current relationship network, initially the number of communities in the privacy protection-based relationship network;
Figure 02_image069
is the total number of connected edges within community c;
Figure 02_image071
is the total degree of the composite nodes in community c; and m is the total number of connected edges in the current relationship network, initially the total number of connected edges in the privacy protection-based relationship network. Modularity optimization can be implemented with algorithms such as a greedy algorithm (the Newman algorithm), simulated annealing, random walk, statistical-principle algorithms, label propagation, InfoMap, or the Louvain algorithm.
After that, in step 403, at least one candidate composite node set is determined from the multiple composite node sets. If the at least one candidate composite node set is provided to the data party of the initial relationship network, the data party can determine the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set. According to a possible design, a composite node set whose number of composite nodes exceeds a predetermined threshold (for example, 10) can be determined as a candidate composite node set. The data party of the initial relationship network can then determine the corresponding target user community as follows: map each candidate composite node to multiple initial users of the initial relationship network according to a preset mapping rule; select, from the initial users obtained, those that are among the multiple candidate users; and identify the selected users as the target user community corresponding to the single candidate composite node set. In other words, after the original users are recovered, non-candidate users are filtered out, and the remaining users constitute the target user community.
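Steps 402 and 403 can be illustrated with a short Python sketch. The modularity function uses the standard unweighted definition Q = Σ_c [l_c/m - (D_c/(2m))²], which is assumed to match the formula given as a figure above. The size threshold and the mapping back to users follow the first design described in the text. All names are hypothetical, and the graph is assumed to have at least one edge.

```python
from collections import defaultdict


def modularity(communities, edges) -> float:
    """Standard modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    owner = {node: idx for idx, members in enumerate(communities) for node in members}
    inner = defaultdict(int)   # l_c: edges with both ends in community c
    degree = defaultdict(int)  # D_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = owner[u], owner[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in range(len(communities)))


def select_candidate_sets(composite_sets, min_size: int = 10) -> list:
    """Keep composite node sets with more than `min_size` composite nodes."""
    return [s for s in composite_sets if len(s) > min_size]


def target_user_community(candidate_set, composite_to_users, candidate_users) -> set:
    """Map composite nodes back to original users, then drop non-candidate users."""
    users = set()
    for composite in candidate_set:
        users.update(composite_to_users[composite])
    return users & set(candidate_users)
```

Here `composite_to_users` stands for the recorded correspondence between composite nodes and original users, which only the data party of the initial relationship network is assumed to hold.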
Optionally, the creator of the initial relationship network may record the correspondence between composite nodes and original nodes when generating the privacy protection-based relationship network; the mapping rule can be this correspondence.
According to another possible design, the execution subject of the method shown in FIG. 4 is the data party of the initial relationship network. In this case, the execution subject can determine the candidate composite node sets by the method of the preceding design, or by other methods. For example, assuming that the multiple composite node sets obtained in step 402 include a first composite node set, the execution subject may first map each composite node in the first composite node set to multiple initial users of the initial relationship network according to the preset mapping rule, and then detect whether a predetermined number (such as 20) or a predetermined proportion (such as 60%) of those initial users have a registration time shorter than a predetermined time threshold (such as 1 month). If so, the first composite node set is determined to be a candidate composite node set; otherwise, it is determined not to be a candidate composite node set.
Because the privacy protection-based relationship network obtained in step 401 may have been expanded and/or had noise added relative to the multiple user IDs provided by the corresponding business party, the candidate user IDs may contain user IDs that do not exist among those provided. After comparison with the user IDs provided by the corresponding business party, these other user IDs are filtered out of the candidate user IDs, and the remaining candidate user IDs can be identified as the user community.
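The registration-time criterion of this second design can be sketched in Python. The thresholds are the example values from the text (20 users, 60%, one month, here taken as 30 days). The function name, the argument names, and the use of calendar dates are illustrative assumptions.

```python
from datetime import date, timedelta


def is_candidate_set(composite_set, composite_to_users, registered_on, today: date,
                     max_age_days: int = 30, min_count: int = 20,
                     min_share: float = 0.6) -> bool:
    """True if enough of the mapped users registered within the last `max_age_days` days."""
    users = {user for composite in composite_set for user in composite_to_users[composite]}
    if not users:
        return False
    limit = timedelta(days=max_age_days)
    recent = sum(1 for user in users if today - registered_on[user] < limit)
    return recent >= min_count or recent / len(users) >= min_share
```

A set passes when either the absolute count or the proportion of recently registered users reaches its threshold, mirroring the "predetermined number or predetermined proportion" wording.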
The target user community corresponding to a candidate composite node set can be provided to the corresponding business party. The user community here may consist of individual user IDs that are carrying out batch attacks, or of an organized group. After obtaining the user community information, the corresponding business party can take defensive or accountability measures. Optionally, there may be one or more target user communities, which serve as a reference for the corresponding business party.
In review, the privacy protection-based relationship network construction method provided by the embodiments of this specification pre-aggregates users and adds noise when providing a user relationship network, forming a relationship network that satisfies differential privacy. On the basis of effectively protecting the privacy of user relationships, this reduces the amount of data to be processed and improves the effectiveness of the user relationship network. Further, user community discovery with the privacy-protected relationship network is not restricted to a specific data holder. Any data processor with computing power can identify candidate composite nodes in the relationship network with a community identification model, and the data holder of the initial relationship network then looks up the user IDs contained in the user community and provides them to the corresponding business party. This makes community identification more convenient while keeping the data secure.
According to another embodiment, a privacy protection-based relationship network construction device is also provided. The privacy protection-based relationship network is composed of multiple composite nodes, and the relationships between the composite nodes are described by connected edges. A single composite node corresponds to multiple original nodes in the candidate relationship network, each original node corresponds to a user, and the connected edges between original nodes describe the relationships between the corresponding users. Fig. 5 shows a schematic block diagram of an apparatus for constructing a privacy protection-based relationship network according to an embodiment. As shown in FIG. 5, the device 500 includes: an obtaining unit 51, configured to obtain a candidate relationship network; a node construction unit 52, configured to divide the original nodes in the candidate relationship network into multiple composite nodes according to a preset composite node capacity, where the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; a detection unit 53, configured to detect, for the multiple composite nodes, whether a connected edge exists between each pair; and an edge construction unit 54, configured to add connected edges and weights to the multiple composite nodes using a differential privacy method based on the detection result, thereby constructing the privacy protection-based relationship network.
It is worth noting that the device 500 shown in FIG. 5 corresponds to the method embodiment shown in FIG. 3, and the description of that method embodiment also applies to the device 500, so it is not repeated here.
According to another embodiment, an apparatus for determining a user community among multiple candidate users is also provided. FIG. 6 shows an apparatus 600 for determining the user community among multiple candidate users. The device 600 includes at least: an obtaining unit 61, configured to obtain the privacy protection-based relationship network generated by the device 500 for multiple candidate users; a processing unit 62, configured to process the privacy protection-based relationship network with a predetermined community identification model to obtain multiple composite node sets; and a determining unit 63, configured to determine at least one candidate composite node set from the multiple composite node sets, so that the data party of the initial relationship network determines the corresponding target user community from the multiple candidate users according to the candidate composite nodes in a single candidate composite node set.
It is worth noting that the device 600 shown in FIG. 6 corresponds to the method embodiment shown in FIG. 4, and the description of that method embodiment also applies to the device 600, so it is not repeated here.
According to another embodiment, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the correspondingly described method.
According to another embodiment, a computing device is also provided, including a memory and a processor; the memory stores executable code, and the processor implements the correspondingly described method when executing the executable code.
Those skilled in the art should be aware that, in one or more of the foregoing examples, the functions described in the embodiments of this specification can be implemented by hardware, software, firmware, or any combination thereof. When implemented by software, these functions can be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium.
The specific implementations described above explain the purpose, technical solutions, and beneficial effects of the technical concept of this specification in further detail. It should be understood that the above are only specific implementations of the technical concept of this specification and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the technical concept of this specification shall fall within that scope of protection.

51: obtaining unit; 52: node construction unit; 53: detection unit; 54: edge construction unit; 61: obtaining unit; 62: processing unit; 63: determining unit; 301, 302, 303, 304, 401, 402, 403: steps; 500, 600: devices

In order to explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative work. [Fig. 1] is a schematic diagram of an implementation architecture of an embodiment of this specification; [Fig. 2] is a schematic diagram of an implementation scenario of an embodiment of this specification; [Fig. 3] is a schematic flowchart of privacy protection-based relationship network construction according to an embodiment; [Fig. 4] is a schematic flowchart of determining a user community among multiple candidate users according to an embodiment; [Fig. 5] is a schematic diagram of a privacy protection-based relationship network construction device according to an embodiment; [Fig. 6] is a schematic block diagram of an apparatus for determining a user community among multiple candidate users according to an embodiment.

Claims (25)

A method for constructing a relationship network based on privacy protection, wherein the privacy protection-based relationship network is formed by a plurality of composite nodes, association relationships between the plurality of composite nodes are described by connected edges, a single composite node corresponds to multiple original nodes in a candidate relationship network, each original node corresponds to a user, and the connected edges between original nodes describe the association relationships between the corresponding users; the method comprising: acquiring the candidate relationship network; dividing the original nodes in the candidate relationship network into a plurality of composite nodes according to a preset composite node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; detecting, for the plurality of composite nodes, whether a connected edge exists between each pair; and, based on the detection result, adding connected edges and weights to the plurality of composite nodes using a differential privacy method, thereby constructing the privacy protection-based relationship network.
The method according to claim 1, wherein the candidate relationship network is obtained by: obtaining user identifiers of multiple candidate users provided by a third business party; based on the user identifiers, selecting from an initial relationship network the original nodes corresponding to the multiple candidate users, together with their neighbor nodes within a predetermined order, as candidate nodes; and taking the relationship network formed by the candidate nodes as the candidate relationship network.
The method according to claim 1, wherein dividing the original nodes in the candidate relationship network into a plurality of composite nodes according to the preset composite node capacity comprises: determining the number of original nodes in the candidate relationship network; determining a first number according to the number of original nodes and the composite node capacity, the first number being the maximum number of composite nodes that can be formed when the number of original nodes corresponding to each composite node equals the composite node capacity; randomly selecting the first number of original nodes from the original nodes in the candidate relationship network as reference nodes of the respective composite nodes; and, for each reference node, determining a second number of original nodes from the candidate relationship network, which together with the corresponding reference node form the corresponding composite node, the second number being 1 less than the first number.
The method according to claim 1, wherein the plurality of composite nodes include a first composite node and a second composite node, the first composite node corresponds to a first original node, the second composite node corresponds to a second original node, and detecting, for the plurality of composite nodes, whether a connected edge exists between each pair comprises: determining that a connected edge exists between the first composite node and the second composite node when a connected edge exists between the first original node and the second original node.
The method according to claim 1, wherein the detection result includes a set of connected edges between the composite nodes and the number of connected edges in the set of connected edges, and adding edges and weights to the plurality of composite nodes using a differential privacy method based on the detection result comprises: adding noise under a first privacy cost to the number of connected edges.
The method according to claim 5, wherein the noise under the first privacy cost satisfies a Laplace distribution whose scale parameter is the reciprocal of the first privacy cost.
The method according to claim 6, wherein the noise under the first privacy cost is the value of the dependent variable of the Laplace distribution when the argument of the Laplace distribution is a first random value generated by a predetermined random algorithm.
The method according to claim 5, wherein adding edges and weights to the plurality of composite nodes using a differential privacy method based on the detection result further comprises: selecting a third number of connected edges from the set of connected edges; and constructing a fourth number of noise connected edges for the composite nodes, the noise connected edges being connected edges outside the set of connected edges.
The method according to claim 8, wherein a fifth number is obtained by adding the noise under the first privacy cost to the number of connected edges, the maximum number of connected edges between the composite nodes is a sixth number, and the ratio of the third number to the fourth number is consistent with the ratio of the fifth number to the difference between the sixth number and the fifth number.
The method according to claim 8, wherein the set of connecting edges includes a first connecting edge, the connecting edges in the set each have the same given initial weight, and selecting a third number of connecting edges from the set of connecting edges comprises: for the first connecting edge, adding to the given initial weight noise whose cumulative probability under a second privacy cost follows a two-sided geometric distribution, to obtain a corresponding first noisy weight, the second privacy cost being the difference between a predetermined overall privacy cost and the first privacy cost; and when the first noisy weight is greater than a first weight threshold, selecting the first connecting edge as a connecting edge in the privacy-preserving relationship network and using the first noisy weight as the weight of the first connecting edge.
The method according to claim 10, wherein the given initial weight is 1, and noise is added to the first connecting edge as follows: generating, with a predetermined random algorithm, a second random value within a predetermined interval for the two-sided geometric distribution; determining the value of the independent variable of the two-sided geometric distribution at which the second random value is obtained; and taking, as the weight of the first connecting edge after noise is added, the sum of the initial weight and that value of the independent variable.
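The selection of true edges described above can be sketched as follows. The claims describe looking up the noise value from a random draw against the distribution; this sketch swaps in an equivalent, well-known sampler (the difference of two independent geometric variables) because it is short and easy to verify. The parameterisation $\alpha = e^{-\varepsilon_2}$, the function names, and the way the threshold is passed in are assumptions for illustration only.

```python
import math
import random


def two_sided_geometric(epsilon2, rng=random):
    """Integer noise with P(k) proportional to alpha**abs(k), alpha = exp(-epsilon2).

    Sampled as the difference of two independent geometric variables, which
    has exactly this two-sided geometric distribution.
    """
    alpha = math.exp(-epsilon2)

    def geometric():
        u = 1.0 - rng.random()  # uniform in (0, 1], so log(u) is defined
        return int(math.floor(math.log(u) / math.log(alpha)))

    return geometric() - geometric()


def select_true_edges(edges, epsilon2, threshold, initial_weight=1, rng=random):
    """Keep an edge only if its noisy weight exceeds the threshold."""
    kept = {}
    for edge in edges:
        noisy_weight = initial_weight + two_sided_geometric(epsilon2, rng)
        if noisy_weight > threshold:
            kept[edge] = noisy_weight  # the noisy weight becomes the edge weight
    return kept
```

Because the noise is integer valued, the published weights stay integers, which is one common reason to prefer geometric noise over Laplace noise for count-like quantities.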
The method according to claim 10, wherein the first weight threshold is the threshold of the independent variable at which a first proportion of the connecting edges is obtained when one-sided filtering with a high-pass filter under the second privacy cost is applied to the connecting edges in the set of connecting edges, wherein the first proportion is the ratio of the following first term to the following second term: the first term is the fifth number obtained by adding the noise under the first privacy cost to the number of connecting edges; and the second term is the difference between the maximum number of connecting edges among the composite nodes and the fifth number.
The method according to claim 8, wherein the fourth number is determined according to the filtering ratio of a high-pass filter under a second privacy cost, the second privacy cost being the difference between a predetermined overall privacy cost and the first privacy cost, and the ratio of the fourth number to the difference between the following two terms equals the filtering ratio of the high-pass filter under the second privacy cost: the maximum number of connecting edges among the composite nodes, and the number of connecting edges obtained by adding the noise under the first privacy cost to the number of connecting edges.
The method according to claim 13, wherein the plurality of composite nodes include a third composite node and a fourth composite node that are not connected by any connecting edge in the set of connecting edges, and constructing a fourth number of noise connecting edges for the composite nodes comprises: adding, between the third composite node and the fourth composite node, a second connecting edge with an initial weight of 0; generating, for the second connecting edge, a noise weight whose cumulative probability under the second privacy cost follows an exponential distribution; and when the noise weight generated for the second connecting edge is greater than 0, determining the second connecting edge to be an added connecting edge, the generated noise weight being the weight of the second connecting edge.
The method according to claim 14, wherein the noise weight following the exponential distribution under the second privacy cost is generated for the second connecting edge as follows: generating, with a predetermined random algorithm, a random value in a predetermined probability interval; and using, as the noise weight generated for the second connecting edge, the value of the independent variable at which the exponential distribution under the second privacy cost takes the random value.
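The construction of noise edges between unconnected composite nodes can be sketched as follows. This is an illustration under stated assumptions: the exponential distribution is taken to have rate equal to the second privacy cost, the sample is drawn by inverting the cumulative distribution, and the caller is assumed to have already chosen which unconnected pairs to try (the claims govern that count through the fourth number). All function and parameter names are hypothetical.

```python
import math
import random


def exponential_noise_weight(epsilon2, rng=random):
    """Invert the exponential CDF 1 - exp(-epsilon2 * x) at a uniform draw."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / epsilon2


def add_noise_edges(candidate_pairs, true_edges, epsilon2, rng=random):
    """Give each unconnected pair a noise weight; keep it only if positive."""
    added = {}
    for pair in candidate_pairs:
        if pair in true_edges:
            continue  # noise edges must lie outside the true edge set
        weight = 0 + exponential_noise_weight(epsilon2, rng)  # initial weight 0
        if weight > 0:
            added[pair] = weight
    return added
```

With a continuous exponential sample the weight is positive except when the uniform draw is exactly zero, so in this sketch nearly every candidate pair becomes a noise edge; the number of candidates passed in is what controls how many are added.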
A method for determining a user group among a plurality of candidate users, the method comprising: obtaining a privacy-preserving relationship network generated for the plurality of candidate users by the method of claim 1; processing the privacy-preserving relationship network with a predetermined group identification model to obtain a plurality of composite node sets; and determining at least one candidate composite node set from the plurality of composite node sets, so that the data party of the initial relationship network can determine a corresponding target user group from the plurality of candidate users according to the candidate composite nodes in a single candidate composite node set.
The method according to claim 16, wherein processing the privacy-preserving relationship network with the predetermined group identification model to obtain the plurality of composite node sets comprises: taking the privacy-preserving relationship network as the initial current relationship network, in which each composite node forms its own community; performing the following modularity maximization step: moving each composite node into the community of a composite node adjacent to it, computing the modularity of the current relationship network with communities as nodes, and selecting the move that maximizes the modularity; merging the composite nodes that are in the same community after the move into a single community, and iterating the modularity maximization step until the modularity of the current relationship network no longer changes; and generating a corresponding composite node set for each community.
The method according to claim 17, wherein the modularity of the current relationship network is obtained by summing the node degrees of the communities, and the node degree of a first community in the current relationship network is the difference between the following first term and second term: the first term is the ratio of the total number of connecting edges within the first community to the total number of connecting edges in the current relationship network; and the second term is the square of the ratio of the total degree of the composite nodes clustered into the first community to twice the total number of connecting edges in the current relationship network.
The method according to any one of claims 16 to 18, wherein the modularity maximization step is determined by one of the following: a greedy algorithm, a simulated annealing algorithm, a random walk algorithm, a statistical-principle algorithm, a label propagation algorithm, the InfoMap algorithm, or the Louvain algorithm.
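The modularity quantity recited above, the sum over communities of (edges inside the community / total edges) minus (total degree of the community / twice the total edges) squared, can be computed as in the following sketch. It treats the network as unweighted and undirected, following the edge counts in the claim wording; ignoring the edge weights and the names `modularity` and `community_of` are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Sum over communities of L_c / m - (d_c / (2 * m)) ** 2.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by a single bridge edge, placing each triangle in its own community gives 2 × (3/7 − (7/14)²) = 5/14, while placing all six nodes in one community gives 0, so a modularity-maximizing step would prefer the two-community split.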
The method according to claim 16, wherein determining at least one candidate composite node set from the plurality of composite node sets comprises: determining, as a candidate composite node set, a composite node set whose number of composite nodes is greater than a predetermined number threshold; so that the data party of the initial relationship network determines the corresponding target user group from the plurality of candidate users according to the candidate composite nodes in a single candidate composite node set as follows: mapping each candidate composite node to a plurality of initial users of the initial relationship network according to a preset mapping rule; and selecting, from the plurality of initial users, the users that belong to the plurality of candidate users, and identifying the selected users as the target user group corresponding to the single candidate composite node set.
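The two stages above, size-based filtering of composite node sets and mapping back to users on the data party's side, can be sketched as follows. The dictionary `composite_to_users` stands in for the "preset mapping rule", whose concrete form the claims do not specify; it and the function names are hypothetical.

```python
def candidate_sets(composite_node_sets, size_threshold):
    """Keep only composite node sets with more nodes than the threshold."""
    return [s for s in composite_node_sets if len(s) > size_threshold]


def target_user_group(candidate_set, composite_to_users, candidate_users):
    """Map composite nodes back to initial users, then keep candidate users."""
    initial_users = set()
    for composite_node in candidate_set:
        # assumption: the mapping rule is available as a plain lookup table
        initial_users.update(composite_to_users.get(composite_node, ()))
    return initial_users & set(candidate_users)
```

Only the data party holds the mapping from composite nodes to users, so the party that runs community detection on the privacy-preserving network never sees individual user identities.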
The method according to claim 16, wherein the method is executed by the data party of the initial relationship network, the plurality of composite node sets include a first composite node set, and determining at least one candidate composite node set from the plurality of composite node sets comprises: mapping each composite node in the first composite node set to a plurality of initial users of the initial relationship network according to a preset mapping rule; detecting whether, among the plurality of initial users, a predetermined number or a predetermined proportion of the initial users have a registration time shorter than a predetermined time threshold; and if so, determining the first composite node set to be a candidate composite node set.
An apparatus for constructing a privacy-preserving relationship network, wherein the privacy-preserving relationship network is composed of a plurality of composite nodes, association relationships among the composite nodes are described by connecting edges, a single composite node corresponds to a plurality of original nodes in a candidate relationship network, each original node corresponds to a user, and the connecting edges between original nodes describe the association relationships between the corresponding users; the apparatus comprising: an obtaining unit configured to obtain the candidate relationship network; a node construction unit configured to divide the original nodes in the candidate relationship network into a plurality of composite nodes according to a preset composite node capacity, wherein the number of original nodes corresponding to a single composite node does not exceed the composite node capacity; a detection unit configured to detect, for the plurality of composite nodes, whether a connecting edge exists between each pair; and an edge construction unit configured to add edges and weights to the plurality of composite nodes in a differentially private manner based on the detection result, thereby constructing the privacy-preserving relationship network.
An apparatus for determining a user group among a plurality of candidate users, the apparatus comprising: an obtaining unit configured to obtain a privacy-preserving relationship network generated for the plurality of candidate users by the apparatus of claim 22; a processing unit configured to process the privacy-preserving relationship network with a predetermined group identification model to obtain a plurality of composite node sets; and a determining unit configured to determine at least one candidate composite node set from the plurality of composite node sets, so that the data party of the initial relationship network can determine a corresponding target user group from the plurality of candidate users according to the candidate composite nodes in a single candidate composite node set.
A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1 to 21.
A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1 to 21.
TW109115721A 2019-12-13 2020-05-12 Method and device for constructing relational network based on privacy protection TWI724896B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911284478.0A CN111046429B (en) 2019-12-13 2019-12-13 Method and device for establishing relationship network based on privacy protection
CN201911284478.0 2019-12-13

Publications (2)

Publication Number Publication Date
TWI724896B TWI724896B (en) 2021-04-11
TW202123118A true TW202123118A (en) 2021-06-16

Family

ID=70236206

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109115721A TWI724896B (en) 2019-12-13 2020-05-12 Method and device for constructing relational network based on privacy protection

Country Status (3)

Country Link
CN (1) CN111046429B (en)
TW (1) TWI724896B (en)
WO (1) WO2021114921A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046429B (en) * 2019-12-13 2021-06-04 支付宝(杭州)信息技术有限公司 Method and device for establishing relationship network based on privacy protection
CN111626890B (en) * 2020-06-03 2023-08-01 四川大学 Remarkable community discovery method based on sales information network
CN111783996B (en) * 2020-06-18 2023-08-25 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN111737751B (en) * 2020-07-17 2020-11-17 支付宝(杭州)信息技术有限公司 Method and device for realizing distributed data processing of privacy protection
CN112528166A (en) * 2020-12-16 2021-03-19 平安养老保险股份有限公司 User relationship analysis method and device, computer equipment and storage medium
CN113361055B (en) * 2021-07-02 2024-03-08 京东城市(北京)数字科技有限公司 Privacy processing method, device, electronic equipment and storage medium in extended social network
CN114564752B (en) * 2022-04-28 2022-07-26 蓝象智联(杭州)科技有限公司 Blacklist propagation method based on graph federation
CN115114664B (en) * 2022-06-24 2023-05-23 浙江大学 Graph data-oriented differential privacy protection issuing method and system
CN115828312B (en) * 2023-02-17 2023-06-16 浙江浙能数字科技有限公司 Privacy protection method and system for social network of power user

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8468244B2 (en) * 2007-01-05 2013-06-18 Digital Doors, Inc. Digital information infrastructure and method for security designated data and with granular data stores
US20110105143A1 (en) * 2009-11-03 2011-05-05 Geosolutions B.V. Proximal relevancy ranking in a layered linked node database
CN104866781B (en) * 2015-05-27 2017-07-04 广西师范大学 The community network data publication method for secret protection of Community-oriented detection application
CN105376243B (en) * 2015-11-27 2018-08-21 中国人民解放军国防科学技术大学 Online community network difference method for secret protection based on stratified random figure
CN106650487B (en) * 2016-09-29 2019-04-26 广西师范大学 Multi-section figure method for secret protection based on the publication of multidimensional sensitive data
CN107689950B (en) * 2017-06-23 2019-01-29 平安科技(深圳)有限公司 Data publication method, apparatus, server and storage medium
CN109299615B (en) * 2017-08-07 2022-05-17 南京邮电大学 Differential privacy processing and publishing method for social network data
CN109639747B (en) * 2017-10-09 2020-06-26 阿里巴巴集团控股有限公司 Data request processing method, data request processing device, query message processing method, query message processing device and equipment
CN107918664B (en) * 2017-11-22 2021-07-27 广西师范大学 Social network data differential privacy protection method based on uncertain graph
KR102175167B1 (en) * 2018-05-09 2020-11-05 서강대학교 산학협력단 K-means clustering based data mining system and method using the same
CN109344643B (en) * 2018-09-03 2022-03-29 华中科技大学 Privacy protection method and system for triangle data release in facing graph
CN110032603A (en) * 2019-01-22 2019-07-19 阿里巴巴集团控股有限公司 The method and device that node in a kind of pair of relational network figure is clustered
CN109829337B (en) * 2019-03-07 2023-07-25 广东工业大学 Method, system and equipment for protecting social network privacy
CN110147996A (en) * 2019-05-21 2019-08-20 中央财经大学 A kind of data trade localization difference method for secret protection and device based on block chain
CN110288358A (en) * 2019-06-20 2019-09-27 武汉斗鱼网络科技有限公司 A kind of equipment group determines method, apparatus, equipment and medium
CN111046429B (en) * 2019-12-13 2021-06-04 支付宝(杭州)信息技术有限公司 Method and device for establishing relationship network based on privacy protection

Also Published As

Publication number Publication date
WO2021114921A1 (en) 2021-06-17
CN111046429A (en) 2020-04-21
CN111046429B (en) 2021-06-04
TWI724896B (en) 2021-04-11
