TWI780433B - A method and device for constructing and predicting an isolated forest model based on federated learning - Google Patents


Info

Publication number: TWI780433B
Application number: TW109115727A
Authority: TW (Taiwan)
Prior art keywords: node, party, data, feature, computing
Other versions: TW202123050A (Chinese)
Inventors: 宋博文, 葉捷明, 陳帥, 顧曦
Original assignee: Alipay (Hangzhou) Information Technology Co., Ltd. (大陸商支付寶(杭州)信息技術有限公司)
Application filed by Alipay (Hangzhou) Information Technology Co., Ltd.
Publication of application: TW202123050A
Application granted; publication of grant: TWI780433B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Devices For Executing Special Programs (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this specification provide a method and device for constructing an isolation forest model based on federated learning. The method includes: obtaining a plurality of sample identifiers corresponding to a first node, the sample identifiers corresponding respectively to a plurality of samples, each sample including feature values of m features; randomly selecting one feature identifier from the m feature identifiers; when the selected feature identifier is a first feature identifier, sending the identifier of the first node, the plurality of sample identifiers, and the first feature identifier to a first data party based on a locally stored correspondence between the first feature identifier and the first data party; recording the correspondence between the first node and the first data party; and receiving, from the first data party, information corresponding respectively to two child nodes of the first node. The isolation forest model is thereby constructed for business processing while the private data of each data party is protected.

Description

Method and device for constructing and predicting with an isolation forest model based on federated learning

Embodiments of this specification relate to the field of machine learning, and more specifically to methods and devices for constructing an isolation forest model based on federated learning, and to methods and devices for predicting the abnormality of an object through the isolation forest model based on federated learning.

At present, more and more Internet companies, as data owners, are paying attention to data privacy and data security. The isolation forest model is an unsupervised learning model for predicting anomalous objects. It can be used, for example, to analyze user behavior and identify abnormal behavior, thereby protecting the security of user funds in scenarios such as account-theft risk control and fraud risk control. However, data modeling in such scenarios is usually carried out under data fusion, that is, with the data stored centrally and visible to all parties, which often requires data from different sources to be fully exposed to the other parties before the modeling analysis can be completed. This poses a great risk at the level of private data. Therefore, a more effective scheme for constructing and using an isolation forest model while protecting private data is needed.

Embodiments of this specification aim to provide a more effective scheme for constructing and using an isolation forest model while protecting private data, so as to remedy deficiencies in the prior art.

To this end, one aspect of this specification provides a method for constructing an isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, the at least two data parties including a first data party. The method is executed by the computing party's device with respect to a first node in a first tree of the model. The computing party's device stores in advance the correspondence between m feature identifiers and the respective data parties, the m feature identifiers being predetermined identifiers of m features. The method includes: obtaining a plurality of sample identifiers corresponding to the first node, the plurality of sample identifiers corresponding respectively to a plurality of samples, each sample including feature values of the m features; randomly selecting one feature identifier from the m feature identifiers; when the selected feature identifier is a first feature identifier, sending the identifier of the first node, the plurality of sample identifiers, and the first feature identifier to the first data party based on the locally stored correspondence between the first feature identifier and the first data party; recording the correspondence between the first node and the first data party; and receiving, from the first data party, information corresponding respectively to two child nodes of the first node, thereby constructing the isolation forest model for business processing.

In one embodiment, the first node is a root node, and obtaining the plurality of sample identifiers corresponding to the first node includes obtaining N sample identifiers and randomly selecting n sample identifiers from the N sample identifiers, where N > n.

In one embodiment, the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the method further includes recording the correspondence between the identifier of the second node and the first data party.

In one embodiment, the two child nodes include a third node, and the information corresponding to the third node includes u sample identifiers assigned to the third node, the u sample identifiers being a subset of the plurality of sample identifiers.

In one embodiment, the at least one data party is at least one network platform, and the plurality of samples correspond respectively to a plurality of objects on the network platform.

In one embodiment, the object is any one of the following: a consumer, a transaction, a merchant, or a commodity.

Another aspect of this specification provides a method for constructing an isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, and the first tree of the model includes a first node. The method is executed by the device of a first data party among the at least two data parties; the first data party's device holds the feature values of a first feature of the respective samples and stores the correspondence between the first feature and a predetermined first feature identifier. The method includes: receiving, from the computing party's device, the identifier of the first node, a plurality of sample identifiers, and the first feature identifier, the plurality of sample identifiers corresponding respectively to a plurality of samples; randomly selecting, based on the locally stored correspondence between the first feature identifier and the first feature, one feature value from the feature values of the first feature of the plurality of samples as the split value of the first node; recording the correspondence between the first node and the first feature and the split value; grouping the plurality of samples based on the split value so as to construct two child nodes of the first node; determining respectively whether the two child nodes are leaf nodes; and sending, based on the grouping and the determination results, information corresponding respectively to the two child nodes to the computing party's device, thereby constructing the isolation forest model for business processing.

In one embodiment, the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the method further includes calculating and storing the node depth of the second node.

Another aspect of this specification provides a method for predicting the abnormality of an object through the isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, and the computing party's device stores the tree structure of a first tree in the model and the data party corresponding to each node in the first tree. The method is executed by the computing party's device and includes: obtaining the object identifier of a first object; sending the object identifier to each data party; receiving, from each data party's device, the result of at least one partition of the first object performed by that data party at its corresponding at least one non-leaf node; determining, based on the tree structure of the first tree and the partition results of the first object at the respective non-leaf nodes received from the at least two data parties' devices, a first leaf node into which the first object falls; sending, based on the data parties corresponding to the respective leaf nodes in the first tree, the identifier of the first leaf node to the first data party corresponding to the first leaf node; receiving the node depth of the first leaf node from the first data party; and predicting the abnormality of the first object based on the node depth, for business processing.

In one embodiment, the method further includes obtaining training samples based on the prediction result for the first object, for training a supervised learning model.

In one embodiment, the method further includes optimizing the sample features of the isolation forest model based on the parameters of the trained supervised learning model.

Another aspect of this specification provides a method for predicting the abnormality of an object through the isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties. The device of a first data party among the at least two data parties records the first feature and the split value of its corresponding first node in the first tree, and stores the feature values of the first feature of the respective objects. The method is executed by the first data party's device and includes: receiving the object identifier of a first object from the computing party's device; locally obtaining the feature value of the first feature of the first object based on the locally stored first feature of the first node; partitioning the first object at the first node based on the locally stored feature value of the first feature of the first object and the split value of the first node; and sending the partition result to the computing party's device, so as to predict the abnormality of the first object for business processing.

In one embodiment, the first data party's device records the node depth of a second node in the first tree, and the method further includes receiving, from the computing party's device, the identifier of the second node into which the first object falls, and sending the node depth of the second node to the computing party's device.

Another aspect of this specification provides a device for constructing an isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, the at least two data parties including a first data party. The device is deployed in the computing party's device with respect to a first node in a first tree of the model, and the computing party's device stores in advance the correspondence between m feature identifiers and the respective data parties, the m feature identifiers being predetermined identifiers of m features. The device includes: an acquisition unit configured to obtain a plurality of sample identifiers corresponding to the first node, the plurality of sample identifiers corresponding respectively to a plurality of samples, each sample including feature values of the m features; a selection unit configured to randomly select one feature identifier from the m feature identifiers; a sending unit configured to, when the selected feature identifier is a first feature identifier, send the identifier of the first node, the plurality of sample identifiers, and the first feature identifier to the first data party based on the locally stored correspondence between the first feature identifier and the first data party; a first recording unit configured to record the correspondence between the first node and the first data party; and a receiving unit configured to receive, from the first data party, information corresponding respectively to two child nodes of the first node, thereby constructing the isolation forest model for business processing.

In one embodiment, the first node is a root node, and the acquisition unit is further configured to obtain N sample identifiers and randomly select n sample identifiers from the N sample identifiers, where N > n.

In one embodiment, the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the device further includes a second recording unit configured to record the correspondence between the identifier of the second node and the first data party.

Another aspect of this specification provides a device for constructing an isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, and the first tree of the model includes a first node. The device is deployed in the device of a first data party among the at least two data parties; the first data party's device holds the feature values of a first feature of the respective samples and stores the correspondence between the first feature and a predetermined first feature identifier. The device includes: a receiving unit configured to receive, from the computing party's device, the identifier of the first node, a plurality of sample identifiers, and the first feature identifier, the plurality of sample identifiers corresponding respectively to a plurality of samples; a selection unit configured to randomly select, based on the locally stored correspondence between the first feature identifier and the first feature, one feature value from the feature values of the first feature of the plurality of samples as the split value of the first node; a recording unit configured to record the correspondence between the first node and the first feature and the split value; a grouping unit configured to group the plurality of samples based on the split value so as to construct two child nodes of the first node; a determination unit configured to determine respectively whether the two child nodes are leaf nodes; and a sending unit configured to send, based on the grouping and the determination results, information corresponding respectively to the two child nodes to the computing party's device, thereby constructing the isolation forest model for business processing.

In one embodiment, the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the device further includes a calculation unit configured to calculate and store the node depth of the second node.

Another aspect of this specification provides a device for predicting the abnormality of an object through the isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties, and the computing party's device stores the tree structure of a first tree in the model and the data party corresponding to each node in the first tree. The device is deployed in the computing party's device and includes: a first acquisition unit configured to obtain the object identifier of a first object; a first sending unit configured to send the object identifier to each data party; a first receiving unit configured to receive, from each data party's device, the result of at least one partition of the first object performed by that data party at its corresponding at least one non-leaf node; a first determination unit configured to determine, based on the tree structure of the first tree and the partition results of the first object at the respective non-leaf nodes received from the at least two data parties' devices, a first leaf node into which the first object falls; a second sending unit configured to send, based on the data parties corresponding to the respective leaf nodes in the first tree, the identifier of the first leaf node to the first data party corresponding to the first leaf node; a second receiving unit configured to receive the node depth of the first leaf node from the first data party; and a prediction unit configured to predict the abnormality of the first object based on the node depth, for business processing.

In one embodiment, the device further includes a second acquisition unit configured to obtain training samples based on the prediction result for the first object, for training a supervised learning model.

In one embodiment, the device further includes a second determination unit configured to determine, based on the parameters of the trained supervised learning model, the features included in the samples of the isolation forest model.

Another aspect of this specification provides a device for predicting the abnormality of an object through the isolation forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties. The device of a first data party among the at least two data parties records the first feature and the split value of its corresponding first node in the first tree, and stores the feature values of the first feature of the respective objects. The device is deployed in the first data party's device and includes: a first receiving unit configured to receive the object identifier of a first object from the computing party's device; an acquisition unit configured to locally obtain the feature value of the first feature of the first object based on the locally stored first feature of the first node; a partition unit configured to partition the first object at the first node based on the locally stored feature value of the first feature of the first object and the split value of the first node; and a first sending unit configured to send the partition result to the computing party's device, so as to predict the abnormality of the first object for business processing.

In one embodiment, the first data party's device records the node depth of a second node in the first tree, and the device further includes a second receiving unit configured to receive, from the computing party's device, the identifier of the second node into which the first object falls, and a second sending unit configured to send the node depth of the second node to the computing party's device.

Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform any one of the above methods.

Another aspect of this specification provides a computing device including a memory and a processor, where executable code is stored in the memory and any one of the above methods is implemented when the processor executes the executable code.

Through the scheme of constructing an isolation forest model based on federated learning and using the model for abnormality prediction according to the embodiments of this specification, the model can be constructed jointly using the data of multiple data parties, and the abnormality of objects can be predicted jointly using the data of the multiple data parties and of the model, while each data party's data is protected from being leaked to the other parties. This expands the amount of data available for constructing the isolation forest model and increases the model's prediction accuracy while protecting each data party's data security.
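The routing performed by the computing party in the construction aspects above can be illustrated with a minimal sketch. This is not the claimed implementation: the feature table, identifiers, and party names below are hypothetical, and the sketch only shows how the computing party might pick a random feature identifier, look up its owning data party, and record the node-to-party correspondence before sending a message.

```python
import random

# Hypothetical bookkeeping held by the computing party (party B):
# feature identifier -> owning data party.
FEATURE_TO_PARTY = {"f1": "A", "f2": "C", "f3": "A"}

def build_node(node_id, sample_ids, rng, node_to_party):
    """One node of one tree, from the computing party's side: pick a
    random feature identifier, look up its owning data party, record the
    node -> party correspondence, and return the message to be sent."""
    fid = rng.choice(sorted(FEATURE_TO_PARTY))   # random feature identifier
    party = FEATURE_TO_PARTY[fid]                # owning data party
    node_to_party[node_id] = party               # recorded correspondence
    return {"node": node_id, "samples": sample_ids, "feature": fid, "to": party}

node_to_party = {}
msg = build_node(1, list(range(8)), random.Random(3), node_to_party)
assert msg["to"] == FEATURE_TO_PARTY[msg["feature"]]
assert node_to_party[1] == msg["to"]
```

Note that the message carries only the feature identifier, never the feature itself, which is how the computing party stays ignorant of what features each data party holds.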

Embodiments of this specification are described below with reference to the accompanying drawings.

Figure 1 is a schematic diagram of a scenario for constructing and using an isolation forest model according to an embodiment of this specification. As shown in Figure 1, the scenario includes at least two data parties (only party A and party C are shown) and a computing party (party B); two data parties are used as an example below. Party A and party C are, for example, a shopping platform and a payment platform, and the isolation forest model can be used, for example, to predict the abnormality of transactions associated with both platforms. Party A has, for example, the commodity features and user purchase-behavior features of each transaction, while party C has, for example, the payment features and user payment-behavior features of each transaction; that is, the data of party A and party C together constitute the feature data of a transaction. Party A and party C can therefore construct an isolation forest model together with party B based on their respective data, where a sample for constructing the model includes the feature values of the various features of one transaction. The computing party B may be a party with corresponding computing equipment for performing the calculations of the model construction and prediction processes, and party B may also be either of party A and party C.

During model construction, party B first obtains N transaction numbers associated with both party A and party C, and the feature data of the N transactions corresponding to those numbers serve as the training sample set of the model. The feature data can be represented, for example, as a matrix X with N rows and m columns, where each row corresponds to a transaction and each column corresponds to a feature; that is, each transaction has m features. Suppose party A holds one part X_A of the feature data of the N transactions and party C holds the other part X_C, so that X = (X_A, X_C). From the N transaction numbers, n transaction numbers are randomly selected, and the feature data of the corresponding n transactions serve as the training sample set of one tree in the model.

Before training begins, the three parties A, B, and C can jointly negotiate a feature identifier for each feature, in such a way that party B does not learn the features of party A or party C, and parties A and C do not learn each other's features either. For example, party A and party C each set feature identifiers for their own features and send them to party B, where A and C can negotiate to ensure that there is no duplication between the two parties' feature identifiers. Party B thus records m feature identifiers and the data party corresponding to each. On party B's device, for the root node of the model (node 1), one feature identifier (f1) is randomly selected from the m feature identifiers. Suppose party B's device records that f1 corresponds to party A. Party B then records that node 1 corresponds to party A and sends, for example, "node 1, f1, n transaction numbers" to party A. After receiving this information, party A determines, based on its local records, that f1 is the identifier of feature q1 (for example, commodity price), randomly selects one value from its local values of feature q1 of the n transactions as the split value p1 of node 1, and splits the n transactions based on q1 and p1 to obtain the sets S_l and S_r of transaction numbers falling into the two child nodes of node 1, node 2 and node 3, respectively. After determining, based on a predetermined rule, that neither node 2 nor node 3 is a leaf node, party A sends S_l and S_r to party B, so that party B repeats for node 2 and node 3 the process performed above for node 1, thereby constructing an isolation tree as shown in the figure. When, for example, it is later determined that node 3 corresponds to party A and party A determines that child node 7 of node 3 is a leaf node, party A notifies party B that node 7 is a leaf node and at the same time calculates and stores the node depth of node 7. After learning that node 7 is a leaf node, party B constructs leaf node 7 in the tree and records that node 7 corresponds to party A. Multiple isolation trees are constructed in the same way, thereby constructing the isolation forest. After construction is complete, party B records the tree structure of each tree and the data party corresponding to each node in the trees, while party A records part of the model's parameters, including the split feature and split value of each non-leaf node corresponding to party A and the node depth of each leaf node corresponding to party A. Similarly, party C records part of the model's parameters.
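The per-tree subsampling described above (party B drawing n of the N shared transaction numbers for each tree) can be sketched as follows. This is a minimal illustration; the counts and ID values are hypothetical, not taken from the patent.

```python
import random

def draw_tree_samples(all_ids, n, num_trees, seed=0):
    """For each tree, the computing party draws a random subset of n
    sample identifiers from the N identifiers shared by the data parties."""
    rng = random.Random(seed)
    return [rng.sample(all_ids, n) for _ in range(num_trees)]

all_ids = list(range(100))                     # N = 100 hypothetical IDs
subsets = draw_tree_samples(all_ids, n=25, num_trees=4)
assert all(len(s) == 25 for s in subsets)      # each tree trains on n samples
assert all(set(s) <= set(all_ids) for s in subsets)
```

Each subset then seeds one tree of the forest, which is how each tree can be trained on reduced data while the forest as a whole retains accuracy.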
Figure 2 schematically shows the structure of one tree (for example, tree 1) of the constructed model as obtained by party B through the above construction process. The structure diagram shows 11 nodes and the connections between them, where the number inside each node is the node's identifier and the letter outside each node (A or C) is the identifier of the data party corresponding to that node.

After the isolation forest model has been constructed, it can be used to predict the abnormality of an object. For example, to predict the abnormality of transaction 1, party B sends the number of transaction 1 to party A and party C. Based on their respective partial model parameters and the feature values of their respective partial features of transaction 1, party A and party C partition transaction 1 at their corresponding nodes and send all partition results to party B. Party B combines the partition results of party A and party C to determine the leaf node into which transaction 1 falls, and receives the node depth of that leaf node from the party corresponding to it (for example, party A). From the node depths of the leaf nodes into which transaction 1 falls across the trees of the model, party B can then calculate the average node depth of transaction 1 and determine the abnormality of transaction 1 based on that average depth.

It should be understood that the description above with reference to Figure 1 is merely illustrative and not limiting; for example, the at least two data parties may include more data parties, the samples are not limited to transaction samples, and so on. The model construction process and the model prediction process are described in detail below.

Figure 3 is a sequence diagram of a method for constructing node 1 of Figure 2 based on federated learning according to an embodiment of this specification. As described above, the participants in the federated learning include, for example, the three parties A, B, and C. The sequence diagram shows the interaction during construction between party A as a data party and party B as the computing party; the interaction between party B and the other data parties is similar. The construction process for node 1 is described below with reference to Figures 2 and 3, where "party A" and "party B" in Figure 3 and in the following steps refer to party A's device and party B's device.

As described above with reference to Figure 1, party B stores in advance the correspondence between the m feature identifiers and the respective data parties. For example, the m features include feature q1 (say, commodity price), whose feature data is owned by party A; party A can therefore determine in advance that the feature identifier corresponding to q1 is f1, record the correspondence between q1 and f1 locally, and send f1 to party B, so that party B records that f1 corresponds to party A. In this way, party B cannot learn which features party A has.

After construction starts, referring to Figure 3: first, in step 302, party B obtains n sample identifiers corresponding to node 1. Node 1 is the root node of tree 1; as described above, the sample identifiers corresponding to node 1 are n sample identifiers randomly selected from N sample identifiers. The N sample identifiers are, for example, transaction numbers associated with both party A and party C, which is not detailed again here. By performing multiple random selections from the N sample identifiers, multiple sets of sample identifiers can be determined, each including n sample identifiers, so that the n samples corresponding to each set can be used to train one tree of the model and the whole isolation forest can thus be trained. By determining multiple sample sets in this way to train the individual trees of the forest, each tree can be trained with a reduced amount of data while the prediction accuracy of the whole model is maintained.

In step 304, party B randomly selects one feature identifier from the m feature identifiers; for example, the randomly selected feature identifier is f1.

In step 306, party B determines, based on the locally stored correspondence, that f1 corresponds to party A. As described above, party B stores in advance the correspondence between the m feature identifiers and the respective data parties, including the correspondence between f1 and party A; this correspondence is negotiated in advance by parties A, B, and C and obtained by party B, which is not detailed again here.

In step 308, party B sends the identifier of node 1 (i.e., "node 1"), the n sample identifiers, and "f1" to party A.

In step 310, party B locally records the correspondence between node 1 and party A. This record can be made in various ways: for example, as shown in Figure 2, "A" can be marked at node 1 of tree 1 to indicate that node 1 corresponds to party A, or "node 1" can be recorded in association with "A" in the form of a table.

In step 312, after receiving "node 1", the n sample identifiers, and "f1" from party B, party A determines, based on the locally stored correspondence, that f1 corresponds to feature q1 and therefore takes q1 as the split feature of node 1.

In step 314, party A randomly selects one feature value from the values of q1 of the n samples corresponding to the n sample identifiers as the split value of node 1; for example, the selected value is p1.

In step 316, having determined the split feature q1 and split value p1 of node 1 through the above steps, party A records the split feature q1 and split value p1 of node 1.

In step 318, party A divides the n samples between the two child nodes of node 1, namely node 2 and node 3 in Figure 2, based on the split value p1. For example, it can be stipulated that if a sample's q1 value < p1, the sample is assigned to the left child node, node 2, and if the sample's q1 value ≥ p1, the sample is assigned to the right child node, node 3.

In step 320, party A determines whether node 2 and node 3 are leaf nodes. Whether they are leaf nodes can be determined based on a predetermined rule. For example, if a node's depth reaches a predetermined depth (such as a maximum depth), the node is a leaf node; likewise, if a node contains only one sample, or the samples in a node have identical feature data and thus cannot be distinguished, the node is a leaf node.
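Steps 304 to 318 above can be sketched as follows. This is a minimal illustration under assumptions: the routing table, feature values, and sample counts are hypothetical, and only the split logic (a random split value drawn from the samples' own values, then a < / ≥ partition) mirrors the description.

```python
import random

# Hypothetical routing table held by party B: feature identifier -> owner.
FEATURE_OWNER = {"f1": "A", "f2": "C"}

def split_node(feature_values, sample_ids, rng):
    """Run by the owning data party: pick a random split value from the
    samples' values of the chosen feature, then partition the sample IDs."""
    values = {sid: feature_values[sid] for sid in sample_ids}
    p = rng.choice(sorted(values.values()))            # split value (step 314)
    left = [sid for sid in sample_ids if values[sid] < p]
    right = [sid for sid in sample_ids if values[sid] >= p]
    return p, left, right

rng = random.Random(7)
fid = rng.choice(sorted(FEATURE_OWNER))                # step 304: B picks an ID
owner = FEATURE_OWNER[fid]                             # step 306: B finds owner
q1 = {sid: float(sid % 13) for sid in range(10)}       # A's private q1 values
p1, left, right = split_node(q1, list(range(10)), rng) # steps 314-318 at A
assert owner in ("A", "C")
assert sorted(left + right) == list(range(10))
assert all(q1[s] < p1 for s in left) and all(q1[s] >= p1 for s in right)
```

Only the identifier lists cross the party boundary; the raw q1 values never leave the data party, matching the privacy goal of the protocol.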
在步驟322,A方在確定節點2和節點3都不是葉子節點之後,將節點2和節點3中各自包括的樣本標識發送給B方。從而B方具有用於構建節點2的u個節點標識和節點3的v個節點標識,從而可以分別針對節點2和節點3執行上述針對節點1執行的過程,以用於繼續構建節點2和節點3,進而構建出整棵樹。 圖4示意顯示根據本說明書實施例的基於聯邦學習構建圖2中節點2的方法時序圖。該時序圖顯示了在構建過程中作為資料方的C方與B方之間的互動時序圖。下面將結合圖2和圖4描述相對於節點2的構建過程。其中,與上文類似地,圖4中和下面描述中的C方表示C方設備。 其中,所述m個特徵中例如還包括特徵q2(特徵q2例如為“支付金額”),該特徵的特徵資料由C方擁有,從而C方可預先確定q2對應的特徵標識為f2,在本地記錄q2與f2的對應關係,並將f2發送給B方,從而B方記錄有f2與C方的對應關係。 在開始構建之後,參考圖4,在步驟402,B方獲取與節點2對應的u個樣本標識,即B方從A方接收分到節點2的u個樣本標識。在步驟404,B方從m個特徵標識隨機選擇一個特徵標識,例如f2。在步驟406,B方基於本地儲存的對應關係,確定f2與C方相對應。在步驟408,B方將“節點2”、u個樣本標識和“f2”發送給C方。在步驟410,B方記錄節點2與C方相對應。在步驟412,C方在接收到從B方發送的“節點2”、u個樣本標識和“f2”之後,基於本地儲存的對應關係,確定以f2對應的特徵q2作為節點2的分裂特徵。在步驟414,C方從u個樣本的q2的特徵值中隨機選擇一個特徵值,例如p2,作為節點2的分裂值。在步驟416,C方記錄節點的分裂特徵q2和分裂值p2。在步驟418,C方基於p2將u個樣本分到節點4和節點5中。在步驟420,C方確定節點4和節點5是否為葉子節點。其中步驟404~步驟420可參考上文對步驟304~步驟320的描述,在此不再贅述。 在步驟422,C方基於步驟420的確定,例如確定節點4不是葉子節點,節點5是葉子節點,從而C方將分到節點4的g個樣本標識發送給B方,同時通知B方“節點5為葉子節點”。 在步驟424,B方在接收到“節點5為葉子節點”之後,B方可將節點5標記為葉子節點,從而不再對節點5進行樣本分裂,同時B方在本地記錄節點5與C方相對應。 在步驟426,C方在確定節點5是葉子節點之後,計算並儲存節點5的節點深度。在一個實施例中,可通過如下的公式(1)計算節點5的節點深度: H=e+c(n)                 (1) 其中,c(n)如公式(2)所示:
Figure 02_image005
,         (2) 其中,e為節點5與根節點(節點1)之間的邊數(即,2),n為該樹的訓練樣本數,H(n)為調和級數,其可以由ln(n)+0,5772156649(歐拉常數)來估計。在孤立森林模型中,葉子節點的節點深度越小,分到該葉子節點中的樣本是異常樣本的可能性越大。 在如上所述構建了節點2之後,可通過同樣地方式構建樹1中的幾個非葉子節點,節點3、節點4、和節點6,從而構建出如圖2所示的樹1的結構。例如,通過上述隨機確定的方式,可確定節點1、節點3和節點4與A方相對應,節點2和節點6與C方相對應,從而可相應地確定,葉子節點7、8、9與A方對應,葉子節點5、10、11與C方相對應,如圖2中所示。而在A方和C方分別記錄了其對應的節點、該節點的分裂特徵和分裂值。也就是說,A方、B方和C方分別擁有該孤立森林模型的部分資料。從而,在通過該模型進行物件預測時,需要三方協同進行。 圖5示意顯示根據本說明書實施例的基於聯邦學習通過孤立森林模型預測物件異常性的方法時序圖。 如圖5所述,首先在步驟502,B方獲取待預測物件的物件標識x,與上述樣本標識類似地,該物件標識例如為交易編號,該待預測物件為待預測的一個交易,同樣地,該交易x的交易特徵資料由A方和B方的資料共同構成。B方可主動發起對交易x異常性的預測,或者B方作為伺服器接收來自客戶端的預測交易x異常性的請求,從而開始執行該方法。 在步驟504,B方將物件標識x分別發送給A方和C方,圖中雖然顯示B方在相同的時間對A方和C方進行發送,本實施例對此並不限定。 在步驟506,A方和C方分別在其對應的至少一個節點處對物件x進行劃分。由上文所述,例如A方與節點1、節點3和節點4對應,其具有節點1的特徵q1和分裂值p1,節點3的特徵q3和分裂值p3,和節點4的特徵q4和分裂值p4,並且A方具有物件的特徵q1的值v1、特徵q3的值v2、和特徵q4的值v4。從而,A方可在節點1可基於v1和p1對物件x進行劃分,例如v1<p1,從而,將物件x分到節點1的左邊的子節點中,類似地,A方基於v3和q3將物件x劃分到節點3左邊的子節點中,基於v4和q4將物件x劃分到節點4右邊的子節點中。類似地,C方與節點2和節點6對應,其在節點2將物件x劃分到左邊的子節點中,在節點6將物件x劃分到右邊的子節點中。 在步驟508,A方和C方將其在各個節點對物件x的劃分結果發送給B方。可以理解,圖中雖然顯示A方和C方在相同的時間執行該步驟,本實施例對此並不限定。 在步驟510,B方基於接收的劃分結果確定物件x落入的葉子節點,即節點9。圖6顯示了B方結合A方和C方的劃分結果確定物件x落入葉子節點的示意圖。如圖6中所示,B方合併A方和C方在各個節點對物件x的劃分,從而可從節點1開始找到物件x的劃分路徑,即,節點1→節點2→節點4→節點9,從而可確定物件x最終落入葉子節點9中。 在步驟512,B方基於本地的對應關係,確定節點9與A方相對應,從而將“節點9”發送給A方。 在步驟514,A方將節點9的節點深度發送給B方。 在步驟516,B方基於節點9的節點深度,預測物件x的異常性。在一個實施例中,可通過節點9的平均節點深度來預測物件x的異常性。B方在根據同樣的方法獲取物件x在各棵樹中的節點深度之後,可計算物件x的平均節點深度E(h(x)),該平均節點深度越大,說明物件x分到的葉子節點距離根節點越遠,從而物件x的異常性越小,反之,該平均節點深度越小,則物件x的異常性越大。 在一個實施例中,可通過公式(3)所示的異常分數來預測物件x的異常性:
Figure 02_image007
(3) 其中,c(n)如上述公式(2)所示。可驗證,s的值在0到1之間,s越小,表示該物件的異常性越小,s越大,表示該物件的異常性越大。 在獲取物件的異常性之後,可進行多種業務處理。例如,該物件為交易,在確定該交易為異常交易之後,可進行對該交易的人工核查,以防止發生詐欺事件。或者,可將該交易的資料及標籤值作為訓練樣本,用於訓練多方監督學習模型,如防詐欺的多方監督學習模型等。 圖7示意顯示了根據本說明書實施例的多方無監督學習模型與多方監督學習模型之間的相互最佳化過程。如圖7中所示,結合人工(例如專家)標註的樣本和通過根據本說明書的孤立森林標註的樣本,可半自動地獲取訓練樣本集,從而訓練多方監督學習模型;結合人工確定的特徵、和基於多方監督學習模型參數確定的特徵,可半自動地確定用於訓練孤立森林模型的樣本特徵,從而最佳化孤立森林模型的訓練。具體是,確定用於訓練孤立森林模型的樣本的多個特徵之後,可將該多個特徵分別對應的多個特徵標識發送給B方,從而使得B方在再次進行對該多方孤立森林模型的訓練時,基於所述多個特徵標識執行圖3或圖4所示方法。同時,可通過訓練的多方監督學習模型對物件異常性進行自動預測,例如基於待預測物件的異常性進行風險識別等。 圖8顯示本說明書實施例的一種基於聯邦學習構建孤立森林模型的裝置800,所述聯邦學習的參與方包括計算方和至少兩個資料方,所述裝置相對於所述模型中的第一樹中的第一節點配置於計算方的設備中,所述至少兩個資料方包括第一資料方,所述計算方設備中預先儲存了m個特徵標識與各個資料方的對應關係,所述m個特徵標識分別為m個特徵各自的預定標識,所述裝置包括: 獲取單元81,配置為,獲取與第一節點對應的多個樣本標識,所述多個樣本標識與多個樣本分別對應,每個樣本包括所述m個特徵的特徵值; 選擇單元82,配置為,從所述m個特徵標識中隨機選擇一個特徵標識; 發送單元83,配置為,在所述選擇的特徵標識為第一特徵標識的情況中,基於本地儲存第一特徵標識與第一資料方的對應關係,將所述第一節點的標識、所述多個樣本標識和所述第一特徵標識發送給所述第一資料方; 第一記錄單元84,配置為,記錄所述第一節點與所述第一資料方的對應關係; 接收單元85,配置為,從所述第一資料方接收與所述第一節點的兩個子節點分別對應的資訊,從而構建孤立森林模型以用於進行業務處理。 在一個實施例中,所述第一節點為根節點,其中,所述獲取單元81還配置為,獲取N個樣本標識,從所述N個樣本標識中隨機獲取n個樣本標識,其中N>n。 在一個實施例中,所述兩個子節點中包括第二節點,與所述第二節點對應的資訊包括,所述第二節點為葉子節點,所述裝置還包括,第二記錄單元86,配置為,記錄所述第二節點標識與所述第一資料方的對應關係。 圖9顯示根據本說明書實施例的一種基於聯邦學習構建孤立森林模型的裝置900,所述聯邦學習的參與方包括計算方和至少兩個資料方,所述模型的第一樹中包括第一節點,所述裝置配置在所述至少兩個資料方中的第一資料方的設備中,所述第一資料方的設備中擁有各個樣本的第一特徵的特徵值,並且儲存有第一特徵與預先確定的第一特徵標識的對應關係,所述裝置包括: 接收單元91,配置為,從所述計算方的設備接收第一節點的標識、多個樣本標識和第一特徵標識,其中,所述多個樣本標識與多個樣本分別對應; 選擇單元92,配置為,基於本地儲存第一特徵標識與第一特徵的對應關係,從所述多個樣本各自的第一特徵的特徵值中隨機選擇一個特徵值作為第一節點的分裂值; 記錄單元93,配置為,記錄所述第一節點與所述第一特徵和所述分裂值的對應關係; 分組單元94,配置為,基於所述分裂值對所述多個樣本進行分組,以構建所述第一節點的兩個子節點; 確定單元95,配置為,分別確定所述兩個子節點是否為葉子節點; 發送單元96,配置為,基於所述分組和確定的結果,將與兩個子節點分別對應的資訊發送給所述計算方的設備,從而構建孤立森林模型以用於進行業務處理。 在一個實施例中,所述兩個子節點中包括第二節點,其中,與第二節點對應的資訊包括,所述第二節點為葉子節點,所述裝置還包括,計算單元97,配置為,計算並儲存所述第二節點的節點深度。 圖10顯示根據本說明書實施例的一種基於聯邦學習通過孤立森林模型預測物件異常性的裝置1000,所述聯邦學習的參與方包括計算方和至少兩個資料方,所述計算方的設備中儲存有所述模型中第一樹的樹結構、所述第一樹中各個節點對應的資料方,所述裝置配置於所述計算方的設備中,包括: 第一獲取單元101,配置為,獲取第一物件的物件標識; 第一發送單元102,配置為,將所述物件標識發送給各個資料方; 第一接收單元103,配置為,從各個資料方設備接收該資料方在其對應的至少一個非葉子節點分別進行的對所述第一物件的至少一次劃分結果; 
The specific implementations described above further explain the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific implementations of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention. Embodiments of this specification will be described below with reference to the accompanying drawings. Fig. 1 shows a schematic diagram of a scenario of constructing and using an isolated forest model according to an embodiment of the present specification. As shown in Figure 1, this scenario includes at least two data parties (only parties A and C are schematically shown in the figure) and a computing party (party B). The following description uses two data parties as an example. Party A and party C are, for example, a shopping platform and a payment platform, and the isolated forest model can, for example, be used to predict the abnormality of transactions associated with the two platforms. Party A has, for example, the commodity features and user purchase behavior features of each transaction, while party C has, for example, the payment features and user payment behavior features of each transaction; that is, the data of party A and party C together constitute the feature data of a transaction. Therefore, party A and party C can construct an isolated forest model together with party B based on their respective data, where each sample for constructing the model includes the feature values of the features of one transaction. The computing party B can be a party with corresponding computing equipment for the model building and prediction computations, and party B can also be either one of parties A and C. In the model building process, party B first obtains N transaction numbers associated with both party A and party C, and uses the feature data of the N transactions corresponding to the N transaction numbers as the training sample set of the model. 
The feature data can, for example, be expressed as a matrix X, where the matrix X includes N rows and m columns, each row corresponds to a transaction, and each column corresponds to a feature of the transaction; that is, each transaction has m features. Assume that party A has one part X_A of the feature data of the N transactions, and party C has another part X_C, so that X = (X_A, X_C). Party B randomly obtains n transaction numbers from the N transaction numbers, and uses the feature data of the n corresponding transactions as the training sample set of one tree in the model. Before training starts, parties A, B and C can jointly negotiate the feature identifier of each feature, such that party B does not learn the features of party A and party C, and party A and party C do not learn each other's features. For example, party A and party C each set the feature identifiers corresponding to their own features and send the identifiers to party B, where party A and party C can negotiate to ensure that there is no duplication between the feature identifiers of the two parties. Party B thus records m feature identifiers and their corresponding data parties. In the party-B device, for the root node of the model (node 1), a feature identifier (f1) is randomly selected from the m feature identifiers. Assume that party B records that f1 corresponds to party A; party B therefore records that node 1 corresponds to party A, and sends, for example, "node 1, f1, n transaction numbers" to party A. 
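As an illustration, the vertically partitioned feature matrix X = (X_A, X_C) described above can be sketched as follows; this is a minimal sketch, and all sizes and values are invented for illustration:

```python
# Illustrative sketch of the vertical partition X = (X_A, X_C): N rows
# (transactions) and m columns (features), with party A and party C each
# holding only their own column block. All sizes and values are invented.
N, m_a, m_c = 4, 2, 1                     # m = m_a + m_c features in total
X_A = [[10 * i + j for j in range(m_a)] for i in range(N)]   # held by party A
X_C = [[100 * i + j for j in range(m_c)] for i in range(N)]  # held by party C

# The full matrix X is never materialized in one place during federated
# training; it is built here only to illustrate the row/column layout.
X = [a + c for a, c in zip(X_A, X_C)]
```

Each row of X corresponds to one transaction number; party B only ever handles the transaction numbers and opaque feature identifiers, never the column values themselves.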
After receiving the information, party A determines, based on local records, that f1 is the identifier of feature q1 (such as commodity price), randomly selects one value from the values of feature q1 of the n transactions corresponding to the n transaction numbers as the split value p1 of node 1, and splits the n transactions based on q1 and p1 to obtain the transaction number sets S_l and S_r falling into the two child nodes 2 and 3 of node 1, respectively. After judging, based on a predetermined rule, that neither node 2 nor node 3 is a leaf node, party A sends S_l and S_r to party B, so that party B repeats the above process for node 2 and node 3 respectively, and an isolated tree as shown in the figure is thus constructed. For example, when it is later determined that node 3 corresponds to party A, and party A judges that child node 7 of node 3 is a leaf node, party A notifies party B that node 7 is a leaf node, and at the same time calculates and stores the node depth of node 7. After being notified that node 7 is a leaf node, party B constructs leaf node 7 in the tree and records that node 7 corresponds to party A. Multiple isolated trees are built by the same method to build an isolated forest. After the construction is completed, party B records the tree structure of each tree and the data party corresponding to each node in the tree, and party A records some parameters of the model
, including the split feature and split value of each non-leaf node corresponding to party A and the node depth of each leaf node corresponding to party A. Similarly, party C records its part of the model's parameters. Fig. 2 schematically shows the structure of one tree (for example, tree 1) of the constructed model as obtained by party B through the above construction process. The structure diagram schematically shows 11 nodes and the connections between them, where the number marked inside each node is the node identifier of that node, and the letter (such as A or C) marked outside each node is the identifier of the data party corresponding to that node. After the isolated forest model is constructed, the model can be used to predict the abnormality of an object to be predicted. For example, if the abnormality of transaction 1 needs to be predicted, party B sends the number of transaction 1 to party A and party C. Party A and party C divide transaction 1 at their corresponding nodes based on their respective partial model parameters and the feature values of their partial features of transaction 1, and send the division results to party B. Party B combines the division results of party A and party C to determine the leaf node into which transaction 1 falls, and receives the node depth of that leaf node from the party corresponding to it (for example, party A). Based on the node depths of the leaf nodes into which transaction 1 falls in the individual trees of the model, party B can then calculate the average node depth of transaction 1 and determine the abnormality of transaction 1 based on that average depth. It can be understood that the above description with reference to FIG. 1 is only illustrative and not limiting; for example, the at least two data parties may include more data parties, the samples are not limited to transaction samples, and so on. The model building process and the model prediction process are described in detail below. FIG. 3 schematically shows a sequence diagram of a method for constructing node 1 in FIG. 2 based on federated learning according to an embodiment of the present specification. 
As mentioned above, the participants of the federated learning include, for example, the above-mentioned parties A, B and C. The sequence diagram shows the interaction, during the construction process, between party A as a data party and party B as the computing party. It can be understood that the interaction between party B and the other data parties participating in the federated learning is similar. The construction process for node 1 will be described below with reference to FIG. 2 and FIG. 3, where party A and party B, as shown in FIG. 3 and described in the following steps, both refer to the party-A device and the party-B device. As described above with reference to Figure 1, party B pre-stores the correspondence between the m feature identifiers and the individual data parties. For example, the m features include feature q1 (feature q1 is, for example, "commodity price"), and the feature data of this feature is owned by party A; party A can therefore determine in advance that the feature identifier corresponding to feature q1 is f1, record the correspondence between q1 and f1 locally, and send f1 to party B, so that party B can record that f1 corresponds to party A. In this way, party B cannot know what features party A has. After construction starts, referring to FIG. 3, first, in step 302, party B acquires the n sample identifiers corresponding to node 1. Node 1 is the root node of tree 1; as described above, the sample identifiers corresponding to node 1 are n sample identifiers randomly selected from the N sample identifiers. As described above, the N sample identifiers are, for example, transaction numbers associated with both party A and party C, which will not be described in detail here. 
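The feature-identifier mapping described above can be sketched as follows; the identifiers f1, f2 and the feature names follow the running example in the text, while the dictionary layout and the function name are illustrative assumptions:

```python
# Each data party knows its own feature-id -> feature-name mapping; party B
# only ever sees the opaque identifiers and which party owns each of them.
party_a_local = {"f1": "commodity price"}   # stored only on party A's device
party_c_local = {"f2": "payment amount"}    # stored only on party C's device

# Party B's registry: feature identifier -> owning data party (no feature names).
party_b_registry = {"f1": "A", "f2": "C"}

def owner_of(feature_id):
    """Return which data party a feature identifier belongs to (party B side)."""
    return party_b_registry[feature_id]
```

Because the registry carries only opaque identifiers, party B can route "node 1, f1, n transaction numbers" to the right device without learning that f1 denotes the commodity price.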
By performing multiple random selections from the N sample identifiers, multiple sample identifier sets can be determined, each set including n sample identifiers, so that the n samples corresponding to each set can be used to train one tree in the model, and the entire isolated forest can thus be trained. By determining multiple sample sets in this way to train the individual trees in the forest separately, each tree can be trained with a reduced amount of data while the prediction accuracy of the whole model is maintained. In step 304, party B randomly selects one feature identifier from the m feature identifiers; for example, the randomly selected feature identifier is f1. In step 306, party B determines, based on the locally stored correspondence, that f1 corresponds to party A. As described above, party B pre-stores the correspondence between the m feature identifiers and the individual data parties, including the correspondence between f1 and party A; this correspondence is determined in advance through joint negotiation by parties A, B and C and obtained by party B, and is not described in detail here. In step 308, party B sends the node 1 identifier (i.e. "node 1"), the n sample identifiers and "f1" to party A. In step 310, party B locally records the correspondence between node 1 and party A. This record can be kept in many ways: for example, as shown in Figure 2, "A" can be marked at node 1 of tree 1 in the figure, thereby indicating that node 1 corresponds to party A; or "node 1" can be recorded in association with "A" in the form of a table, whereby it is determined that node 1 corresponds to party A. In step 312, after receiving "node 1", the n sample identifiers and "f1" sent by party B, party A determines, based on the locally stored correspondence, that f1 corresponds to feature q1, so that q1 is used as the split feature of node 1. 
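The per-tree subsampling and the random feature-identifier selection described above (steps 302 to 304) can be sketched as follows; the function name and signature are illustrative assumptions:

```python
import random

def start_tree_node(rng, all_sample_ids, feature_ids, n):
    """Party B's setup for one tree's root node: draw n of the N sample ids
    for this tree, and pick one random feature identifier for the node."""
    sample_ids = rng.sample(all_sample_ids, n)   # the tree's n-sample subset
    feature_id = rng.choice(feature_ids)         # e.g. "f1", owned by some party
    return sample_ids, feature_id
```

Calling this once per tree yields the multiple sample identifier sets described in the text, each of size n, so every tree sees a different random subset of the N training samples.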
In step 314, party A randomly selects one feature value from the feature values of q1 of the n samples corresponding to the n sample identifiers as the split value of node 1; for example, the selected value is p1. In step 316, having determined the split feature q1 and split value p1 of node 1 through the above steps, party A records the split feature q1 and split value p1 of node 1. In step 318, party A divides the n samples into the two child nodes of node 1, namely node 2 and node 3 in FIG. 2, based on the split value p1. For example, it can be set that if a sample's q1 value is < p1, the sample is assigned to the left child node, i.e. node 2, and if a sample's q1 value is ≥ p1, the sample is assigned to the right child node, i.e. node 3. In step 320, party A determines whether node 2 and node 3 are leaf nodes. Whether node 2 and node 3 are leaf nodes may be determined based on a predetermined rule. For example, if the node depth of a node reaches a predetermined depth (such as a maximum depth), the node is a leaf node; likewise, if there is only one sample in a node, or if multiple samples in a node have identical feature data and thus cannot be distinguished, the node is a leaf node. In step 322, after determining that neither node 2 nor node 3 is a leaf node, party A sends the sample identifiers included in node 2 and in node 3 to party B. Party B then has the u sample identifiers for building node 2 and the v sample identifiers for building node 3, so that the process performed above for node 1 can be performed for node 2 and node 3 respectively, so as to continue building node 2 and node 3 and thereby build the whole tree. FIG. 4 schematically shows a sequence diagram of a method for constructing node 2 in FIG. 2 based on federated learning according to an embodiment of the present specification. The sequence diagram shows the interaction, during the construction process, between party C as a data party and party B. 
The construction process for node 2 will be described below with reference to FIG. 2 and FIG. 4, where, similarly to the above, party C in FIG. 4 and in the following description refers to the party-C device. The m features also include, for example, feature q2 (feature q2 is, for example, "payment amount"), and the feature data of this feature is owned by party C; party C can therefore determine in advance that the feature identifier corresponding to q2 is f2, record the correspondence between q2 and f2 locally, and send f2 to party B, so that party B records the correspondence between f2 and party C. After construction starts, referring to FIG. 4, in step 402, party B acquires the u sample identifiers corresponding to node 2; that is, party B receives from party A the u sample identifiers assigned to node 2. In step 404, party B randomly selects one feature identifier, such as f2, from the m feature identifiers. In step 406, party B determines, based on the locally stored correspondence, that f2 corresponds to party C. In step 408, party B sends "node 2", the u sample identifiers and "f2" to party C. In step 410, party B records that node 2 corresponds to party C. In step 412, after receiving "node 2", the u sample identifiers and "f2" from party B, party C determines, based on the locally stored correspondence, that the feature q2 corresponding to f2 is the split feature of node 2. In step 414, party C randomly selects one feature value, such as p2, from the feature values of q2 of the u samples as the split value of node 2. In step 416, party C records the split feature q2 and split value p2 of node 2. In step 418, party C divides the u samples into node 4 and node 5 based on p2. In step 420, party C determines whether node 4 and node 5 are leaf nodes. For steps 404 to 420, reference may be made to the description of steps 304 to 320 above, and details will not be repeated here. 
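A data party's share of one node construction (compare steps 414 to 420) can be sketched as follows; the leaf rule mirrors the predetermined rule described earlier (maximum depth reached, a single sample, or indistinguishable samples), and all names are illustrative assumptions:

```python
import random

def data_party_node_step(rng, feature_values, sample_ids, depth, max_depth):
    """One data-party step: pick a random split value among the received
    samples' values of the chosen feature, partition the sample ids, and
    apply the predetermined leaf rule to each child."""
    p = rng.choice([feature_values[s] for s in sample_ids])   # split value
    s_l = [s for s in sample_ids if feature_values[s] < p]    # left child
    s_r = [s for s in sample_ids if feature_values[s] >= p]   # right child

    def is_leaf(ids):
        # Leaf if: max depth reached, <=1 sample, or all samples identical.
        return (depth + 1 >= max_depth or len(ids) <= 1
                or len({feature_values[s] for s in ids}) == 1)

    return p, (s_l, is_leaf(s_l)), (s_r, is_leaf(s_r))
```

Only the sample identifiers and the leaf flags would be sent back to party B; the split value p and the feature itself stay on the data party's device.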
In step 422, party C, based on the determination in step 420, determines, for example, that node 4 is not a leaf node and that node 5 is a leaf node; party C therefore sends the g sample identifiers assigned to node 4 to party B, and at the same time notifies party B that "node 5 is a leaf node". In step 424, after receiving "node 5 is a leaf node", party B can mark node 5 as a leaf node, so that no further sample splitting is performed on node 5; at the same time, party B locally records that node 5 corresponds to party C. In step 426, after determining that node 5 is a leaf node, party C calculates and stores the node depth of node 5. In one embodiment, the node depth of node 5 can be calculated by the following formula (1): H = e + c(n)  (1), where c(n) is as shown in formula (2):
c(n) = 2H(n-1) - 2(n-1)/n,  (2) where e is the number of edges between node 5 and the root node (node 1), i.e. 2, n is the number of training samples of the tree, and H(n) is the harmonic number, which can be estimated by ln(n) + 0.5772156649 (Euler's constant). In the isolated forest model, the smaller the node depth of a leaf node, the greater the possibility that the samples assigned to that leaf node are abnormal samples. After node 2 is constructed as described above, the remaining non-leaf nodes in tree 1, namely node 3, node 4 and node 6, can be constructed in the same manner, so as to construct the structure of tree 1 as shown in FIG. 2. For example, through the random determination described above, it can be determined that node 1, node 3 and node 4 correspond to party A, and node 2 and node 6 correspond to party C; it can accordingly be determined that leaf nodes 7, 8 and 9 correspond to party A, and leaf nodes 5, 10 and 11 correspond to party C, as shown in FIG. 2. Party A and party C each record their corresponding nodes together with each such node's split feature and split value. That is to say, party A, party B and party C each own part of the data of the isolated forest model; therefore, when predicting objects through this model, the three parties need to cooperate. FIG. 5 schematically shows a sequence diagram of a method for predicting the abnormality of an object through an isolated forest model based on federated learning according to an embodiment of the present specification. As shown in Figure 5, first, in step 502, party B obtains the object identifier x of the object to be predicted. Similarly to the sample identifiers above, the object identifier is, for example, a transaction number, and the object to be predicted is a transaction to be predicted; likewise, the transaction feature data of transaction x is composed of the data of party A and party C together. 
Party B can actively initiate the prediction of the abnormality of transaction x, or party B, as a server, receives a request from a client to predict the abnormality of transaction x, and thereby starts executing the method. In step 504, party B sends the object identifier x to party A and party C respectively; although the figure shows party B sending to party A and party C at the same time, this embodiment is not limited in this respect. In step 506, party A and party C each divide object x at their at least one corresponding node. From the above, party A corresponds, for example, to node 1, node 3 and node 4; it holds feature q1 and split value p1 of node 1, feature q3 and split value p3 of node 3, and feature q4 and split value p4 of node 4, and party A holds the object's value v1 of feature q1, value v3 of feature q3, and value v4 of feature q4. Party A can therefore divide object x at node 1 based on v1 and p1: for example, v1 < p1, so object x is divided into the left child node of node 1. Similarly, party A divides object x into the left child node of node 3 based on v3 and p3, and divides object x into the right child node of node 4 based on v4 and p4. Likewise, party C corresponds to node 2 and node 6; it divides object x into the left child node at node 2 and into the right child node at node 6. In step 508, party A and party C send their division results for object x at the individual nodes to party B. It can be understood that, although the figure shows party A and party C performing this step at the same time, this embodiment is not limited in this respect. In step 510, party B determines, based on the received division results, the leaf node into which object x falls, namely node 9. Fig. 6 shows a schematic diagram of party B combining the division results of party A and party C to determine the leaf node into which object x falls. 
As shown in Figure 6, party B merges the divisions of object x by party A and party C at the individual nodes, so that the division path of object x can be traced starting from node 1, namely node 1 → node 2 → node 4 → node 9, and it can thus be determined that object x finally falls into leaf node 9. In step 512, party B determines, based on the local correspondences, that node 9 corresponds to party A, and sends "node 9" to party A. In step 514, party A sends the node depth of node 9 to party B. In step 516, party B predicts the abnormality of object x based on the node depth of node 9. In one embodiment, the abnormality of object x can be predicted through the average node depth. After obtaining the node depths of object x in the individual trees according to the same method, party B can calculate the average node depth E(h(x)) of object x. The larger the average node depth, the farther the leaf nodes into which object x falls are from the root nodes, and thus the smaller the abnormality of object x; conversely, the smaller the average node depth, the greater the abnormality of object x. In one embodiment, the abnormality of object x can be predicted by the abnormality score shown in formula (3):
s(x, n) = 2^(-E(h(x))/c(n)),  (3) where c(n) is as shown in formula (2) above. It can be verified that the value of s is between 0 and 1; the smaller s is, the smaller the abnormality of the object, and the larger s is, the greater the abnormality of the object. After the abnormality of an object is obtained, various kinds of business processing can be performed. For example, if the object is a transaction, then after the transaction is determined to be an abnormal transaction, manual verification of the transaction can be performed to prevent fraud. Alternatively, the data and label value of the transaction can be used as a training sample for training a multi-party supervised learning model, such as a multi-party supervised learning model for fraud prevention. Fig. 7 schematically shows the mutual optimization process between the multi-party unsupervised learning model and the multi-party supervised learning model according to an embodiment of the present specification. As shown in Figure 7, by combining samples labeled manually (for example, by experts) and samples labeled by the isolated forest according to this specification, a training sample set can be obtained semi-automatically, so as to train the multi-party supervised learning model; by combining manually determined features and features determined based on the parameters of the multi-party supervised learning model, the sample features used for training the isolated forest model can be determined semi-automatically, thereby optimizing the training of the isolated forest model. Specifically, after the multiple features of the samples used for training the isolated forest model are determined, the multiple feature identifiers corresponding to those features can be sent to party B, so that when party B trains the multi-party isolated forest model again, it executes the method shown in FIG. 3 or FIG. 4 based on those multiple feature identifiers. 
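Putting formulas (1) to (3) and the path merging of Fig. 6 together, the computing party's prediction step can be sketched as follows; this is a sketch under the formulas as stated, and the tree layout and function names are illustrative assumptions:

```python
import math

GAMMA = 0.5772156649  # Euler's constant, used to estimate the harmonic number

def c(n):
    """Average-path-length adjustment c(n) of formula (2)."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + GAMMA) - 2.0 * (n - 1) / n

def find_leaf(children, divisions, root=1):
    """Walk one tree using the per-node 'left'/'right' divisions returned by
    the data parties (Fig. 6); children maps node -> (left child, right child)."""
    node = root
    while node in children:
        left, right = children[node]
        node = left if divisions[node] == "left" else right
    return node

def anomaly_score(avg_depth, n):
    """Formula (3): s(x, n) = 2 ** (-E(h(x)) / c(n)); closer to 1 = more anomalous."""
    return 2.0 ** (-avg_depth / c(n))
```

For the example path of Fig. 6 (node 1 → node 2 → node 4 → node 9), `find_leaf({1: (2, 3), 2: (4, 5), 4: (8, 9)}, {1: "left", 2: "left", 4: "right"})` returns 9, and an average depth exactly equal to c(n) yields a score of 0.5.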
At the same time, the abnormality of objects can be predicted automatically through the trained multi-party supervised learning model, for example for risk identification based on the abnormality of the object to be predicted. Figure 8 shows an apparatus 800 for constructing an isolated forest model based on federated learning according to an embodiment of this specification. The participants of the federated learning include a computing party and at least two data parties; the apparatus is configured, with respect to a first node in a first tree of the model, in the equipment of the computing party; the at least two data parties include a first data party; and the correspondence between m feature identifiers and the individual data parties is pre-stored in the computing-party equipment, the m feature identifiers being the predetermined identifiers of the m features, respectively. The apparatus includes: an acquisition unit 81, configured to acquire a plurality of sample identifiers corresponding to the first node, where the plurality of sample identifiers correspond to a plurality of samples, respectively, and each sample includes the feature values of the m features; a selection unit 82, configured to randomly select one feature identifier from the m feature identifiers; a sending unit 83, configured to, in the case that the selected feature identifier is a first feature identifier, send the identifier of the first node, the plurality of sample identifiers and the first feature identifier to the first data party based on the locally stored correspondence between the first feature identifier and the first data party; a first recording unit 84, configured to record the correspondence between the first node and the first data party; and a receiving unit 85, configured to receive, from the first data party, information corresponding to the two child nodes of the first node, respectively, so as to construct the isolated forest model for business processing. In one embodiment, the first node is a root node, and the acquisition unit 81 is further configured to acquire N sample identifiers and randomly acquire n sample identifiers from the N sample identifiers, where N > n. In one embodiment, the two child nodes include a second node, the information corresponding to the second node includes that the second node is a leaf node, and the apparatus further includes a second recording unit 86, configured to record the correspondence between the second node identifier and the first data party. FIG. 9 shows an apparatus 900 for constructing an isolated forest model based on federated learning according to an embodiment of this specification. The participants of the federated learning include a computing party and at least two data parties, the first tree of the model includes a first node, and the apparatus is configured in the equipment of a first data party among the at least two data parties; the equipment of the first data party holds the feature values of the first feature of each sample and stores the correspondence between the first feature and a predetermined first feature identifier. The apparatus includes: a receiving unit 91, configured to receive the identifier of the first node, a plurality of sample identifiers and the first feature identifier from the equipment of the computing party, where the plurality of sample identifiers correspond to a plurality of samples, respectively; a selection unit 92, configured to randomly select, based on the locally stored correspondence between the first feature identifier and the first feature, one feature value from the feature values of the first feature of the plurality of samples as the split value of the first node; a recording unit 93, configured to record the correspondence between the first node and the first feature and the split value; a grouping unit 94, configured to group the plurality of samples based on the split value so as to construct the two child nodes of the first node; a determination unit 95, configured to determine whether each of the two child nodes is a leaf node; and a sending unit 96, configured to send, based on the results of the grouping and the determination, the information corresponding to the two child nodes, respectively, to the equipment of the computing party, so as to construct the isolated forest model for business processing. In one embodiment, the two child nodes include a second node, the information corresponding to the second node includes that the second node is a leaf node, and the apparatus further includes a calculation unit 97, configured to calculate and store the node depth of the second node. Fig. 10 shows an apparatus 1000 for predicting the abnormality of an object through an isolated forest model based on federated learning according to an embodiment of this specification. The participants of the federated learning include a computing party and at least two data parties; the equipment of the computing party stores the tree structure of the first tree in the model and the data party corresponding to each node in the first tree; and the apparatus, configured in the equipment of the computing party, includes: a first acquisition unit 101, configured to acquire the object identifier of a first object; a first sending unit 102, configured to send the object identifier to each data party; a first receiving unit 103, configured to receive, from the equipment of each data party, the result of at least one division of the first object performed by that data party at its corresponding at least one non-leaf node; a first determination unit 104, configured to determine, based on the tree structure of the first tree and the division results for the first object at the individual non-leaf nodes received from the equipment of the at least two data parties, the first leaf node into which the first object falls; and a second sending unit 105, configured to, based on the respective data parties
corresponding to the leaf nodes in the first tree , sending the identifier of the first leaf node to the first data party corresponding to the first leaf node; the second receiving unit 106 is configured to receive the ID of the first leaf node from the first data party Node depth; the predicting unit 107 is configured to predict the abnormality of the first object based on the node depth for business processing. In one embodiment, the apparatus further includes a second acquiring unit 108 configured to acquire training samples based on the prediction result of the first object, so as to train a supervised learning model. In one embodiment, the apparatus further includes a second determination unit 109 configured to determine the features included in the samples of the isolated forest model based on the parameters of the trained supervised learning model. Fig. 11 shows an apparatus 1100 for predicting abnormality of an object through an isolated forest model based on federated learning according to an embodiment of the present specification. 
The participants in the federated learning include a computing party and at least two data parties. The device of a first data party among the at least two data parties records the first feature and split value corresponding to the first node in the first tree, and stores the feature value of the first feature of each object. The apparatus is configured in the first data party's device and includes: a first receiving unit 111 configured to receive the object identifier of a first object from the computing party's device; an acquisition unit 112 configured to acquire locally, based on the locally stored first feature of the first node, the feature value of the first feature of the first object; a dividing unit 113 configured to divide the first object at the first node based on the locally stored feature value of the first feature of the first object and the split value of the first node; and a first sending unit 114 configured to send the result of the division to the computing party's device, for use in predicting the abnormality of the first object for business processing.

In one embodiment, the first data party's device records the node depth of a second node in the first tree, and the apparatus further includes a second receiving unit 115 configured to receive, from the computing party's device, the identifier of the second node into which the first object falls, and a second sending unit 116 configured to send the node depth of the second node to the computing party's device.

Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is instructed to perform any one of the above methods.
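As a concrete illustration of the prediction-side exchange described for apparatuses 1000 and 1100 above, the following sketch shows the computing party collecting per-node division booleans from the data parties, walking the tree to a leaf, fetching the leaf's depth from its owning party, and scoring. This is a minimal sketch under stated assumptions, not the patented protocol itself: class and method names (`ComputingParty`, `DataParty`, `divide`, `depth_of`) are hypothetical, network messages are replaced by direct calls, the code handles a single tree, and the final score uses the standard isolation-forest formula, which the specification's depth-based prediction step does not spell out.

```python
import math

def c(n):
    # Average path length of an unsuccessful BST search over n samples:
    # the usual isolation-forest normalizer (an assumption; the patent
    # only says the prediction is "based on the node depth").
    if n <= 1:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

class DataParty:
    """Holds, per node it owns, the split feature and split value, plus the
    feature values of its own objects. Feature values never leave the device."""
    def __init__(self, splits, features, leaf_depths):
        self.splits = splits            # node_id -> (feature_name, split_value)
        self.features = features        # object_id -> {feature_name: value}
        self.leaf_depths = leaf_depths  # leaf node_id -> depth

    def divide(self, object_id):
        # For each non-leaf node this party owns, report only a boolean:
        # does the object go to the left child?
        return {node_id: self.features[object_id][feat] < split
                for node_id, (feat, split) in self.splits.items()}

    def depth_of(self, leaf_id):
        return self.leaf_depths[leaf_id]

class ComputingParty:
    """Knows only the tree shape and which party owns each node;
    it never learns the features or split values themselves."""
    def __init__(self, children, owner, n_samples):
        self.children = children    # node_id -> (left_id, right_id); absent => leaf
        self.owner = owner          # node_id -> DataParty
        self.n_samples = n_samples

    def predict(self, object_id):
        # 1. Broadcast the object id; collect per-node division results.
        divisions = {}
        for party in set(self.owner.values()):
            divisions.update(party.divide(object_id))
        # 2. Walk from the root to a leaf using only the booleans.
        node = 0
        while node in self.children:
            left, right = self.children[node]
            node = left if divisions[node] else right
        # 3. Ask the leaf's owner for the node depth, then score.
        depth = self.owner[node].depth_of(node)
        return 2.0 ** (-depth / c(self.n_samples))
```

For example, with one party owning a single split node `0` (feature "amount", split value 100.0) and an object whose "amount" is 5.0, `predict` walks to the left leaf at depth 1 and returns a score in (0, 1), with values near 1 indicating higher abnormality.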
Another aspect of this specification provides a computing device including a memory and a processor, wherein the memory stores executable code, and any one of the above methods is implemented when the processor executes the executable code.

Through the scheme of constructing an isolated forest model based on federated learning and using the model for abnormality prediction according to the embodiments of this specification, an isolated forest model can be jointly constructed from the data of multiple data parties, and the abnormality of an object can be jointly predicted using the model and the data of each data party, while protecting each data party's data from being leaked to the other parties. The scheme thus expands the amount of data used to build the isolated forest model and improves the model's prediction accuracy while preserving each data party's data security.

It should be understood that descriptions such as "first" and "second" in this document are only used to distinguish similar concepts for simplicity of description and have no other limiting function.

The embodiments in this specification are described in a progressive manner; for the parts that are the same or similar, the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are basically similar to the method embodiments, their description is relatively brief, and for the relevant parts, reference may be made to the description of the method embodiments.

The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve desirable results.
In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Multitasking and parallel processing are also possible, or may be advantageous, in certain embodiments.

Those of ordinary skill in the art should further realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementations should not be regarded as exceeding the scope of the present invention.

The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented by hardware, by a software module executed by a processor, or by a combination of the two. A software module can reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The specific embodiments described above further describe the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above descriptions are only specific embodiments of the present invention and are not intended to limit the protection scope of the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
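As a concrete illustration of the node-construction exchange described above for apparatuses 800 and 900, the following sketch shows the computing party randomly picking a feature identifier and routing the node to the owning data party, which then picks a random observed value as the split value and reports only the resulting sample groupings. This is a minimal sketch under stated assumptions, not the patented implementation: the names (`computing_party_build_node`, `DataParty.split_node`, `feature_owner`) are hypothetical, a single node of a single tree is built, and network messages are replaced by a direct call.

```python
import random

def computing_party_build_node(node_id, sample_ids, feature_ids,
                               feature_owner, node_owner):
    """Computing-party side of one node-construction step. The computing
    party sees only feature *identifiers*, never feature values."""
    feat_id = random.choice(feature_ids)   # randomly select a feature identifier
    party = feature_owner[feat_id]         # pre-stored identifier -> data party map
    node_owner[node_id] = party            # record the node -> data party correspondence
    # "Send" the node id, sample ids and feature id to the owning data party,
    # and receive information corresponding to the two child nodes.
    return party.split_node(node_id, sample_ids, feat_id)

class DataParty:
    def __init__(self, feature_values):
        self.feature_values = feature_values  # feature_id -> {sample_id: value}
        self.node_splits = {}                 # node_id -> (feature_id, split_value)

    def split_node(self, node_id, sample_ids, feat_id):
        """Data-party side: pick a random observed value as the split value,
        group the samples, and report only the resulting groupings."""
        values = self.feature_values[feat_id]
        split = values[random.choice(sample_ids)]
        self.node_splits[node_id] = (feat_id, split)  # kept locally, never sent
        left = [s for s in sample_ids if values[s] < split]
        right = [s for s in sample_ids if values[s] >= split]
        # A child that receives at most one sample would be reported as a leaf.
        return left, right
```

The design point this illustrates is the division of knowledge: the computing party records only which data party owns each node, while the feature name and split value stay in the data party's local `node_splits`, so neither side learns the other's private data.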

302-322: steps
402-426: steps
502-516: steps
800: apparatus
81: acquisition unit
82: selection unit
83: sending unit
84: first recording unit
85: receiving unit
86: second recording unit
900: apparatus
91: receiving unit
92: selection unit
93: recording unit
94: grouping unit
95: determining unit
96: sending unit
97: calculation unit
1000: apparatus
101: first acquisition unit
102: first sending unit
103: first receiving unit
104: first determining unit
105: second sending unit
106: second receiving unit
107: prediction unit
108: second acquisition unit
109: second determination unit
1100: apparatus
111: first receiving unit
112: acquisition unit
113: dividing unit
114: first sending unit
115: second receiving unit
116: second sending unit

The embodiments of this specification can be made clearer by describing them in conjunction with the accompanying drawings:
[Fig. 1] is a schematic diagram of a scenario of constructing and using an isolated forest model according to an embodiment of this specification;
[Fig. 2] schematically shows the structure of tree 1 in the constructed model obtained by party B through the above construction process;
[Fig. 3] is a schematic sequence diagram of a method for constructing node 1 in Fig. 2 based on federated learning according to an embodiment of this specification;
[Fig. 4] is a schematic sequence diagram of a method for constructing node 2 in Fig. 2 based on federated learning according to an embodiment of this specification;
[Fig. 5] is a schematic sequence diagram of a method for predicting the abnormality of an object through an isolated forest model based on federated learning according to an embodiment of this specification;
[Fig. 6] is a schematic diagram of party B combining the division results of parties A and C to determine the leaf node into which object x falls;
[Fig. 7] schematically shows the mutual optimization process between the multi-party unsupervised learning model and the supervised learning model according to an embodiment of this specification;
[Fig. 8] shows an apparatus 800 for constructing an isolated forest model based on federated learning according to an embodiment of this specification;
[Fig. 9] shows an apparatus 900 for constructing an isolated forest model based on federated learning according to an embodiment of this specification;
[Fig. 10] shows an apparatus 1000 for predicting the abnormality of an object through an isolated forest model based on federated learning according to an embodiment of this specification;
[Fig. 11] shows an apparatus 1100 for predicting the abnormality of an object through an isolated forest model based on federated learning according to an embodiment of this specification.

Claims (28)

1. A method for constructing an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, the method is performed by a device of the computing party, the at least two data parties include a first data party, the computing party's device pre-stores correspondences between m feature identifiers and the respective data parties, and the m feature identifiers are predetermined identifiers of m respective features, the method comprising: acquiring a plurality of sample identifiers corresponding to a first node of a first tree in the model, the plurality of sample identifiers respectively corresponding to a plurality of samples, each sample including feature values of the m features; randomly selecting one feature identifier from the m feature identifiers; in a case where the selected feature identifier is a first feature identifier, sending the identifier of the first node, the plurality of sample identifiers, and the first feature identifier to the first data party based on the locally stored correspondence between the first feature identifier and the first data party; recording the correspondence between the first node and the first data party; and receiving, from the first data party, information respectively corresponding to two child nodes of the first node, thereby constructing the isolated forest model for business processing while protecting each data party's private data, wherein the computing party's device does not learn the features of any of the at least two data parties.

2. The method according to claim 1, wherein the first node is a root node, and acquiring the plurality of sample identifiers corresponding to the first node comprises acquiring N sample identifiers and randomly acquiring n sample identifiers from the N sample identifiers, where N > n.

3. The method according to claim 1, wherein the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the method further comprises recording the correspondence between the identifier of the second node and the first data party.

4. The method according to claim 3, wherein the two child nodes include a third node, and the information corresponding to the third node includes u sample identifiers assigned to the third node, the u sample identifiers being a subset of the plurality of sample identifiers.

5. The method according to claim 1, wherein the at least one data party is at least one network platform, and the plurality of samples respectively correspond to a plurality of objects on the network platform.

6. The method according to claim 5, wherein the object is any one of: a consumer, a transaction, a merchant, or a commodity.
7. A method for constructing an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, a first tree of the model includes a first node, the method is performed by a device of a first data party among the at least two data parties, and the first data party's device holds feature values of a first feature of respective samples and stores a correspondence between the first feature and a predetermined first feature identifier, the method comprising: receiving an identifier of the first node, a plurality of sample identifiers, and the first feature identifier from the computing party's device, wherein the plurality of sample identifiers respectively correspond to a plurality of samples; randomly selecting, based on the locally stored correspondence between the first feature identifier and the first feature, one feature value from the feature values of the first feature of the plurality of samples as a split value of the first node; recording the correspondence between the first node and the first feature and the split value; grouping the plurality of samples based on the split value to construct two child nodes of the first node; respectively determining whether the two child nodes are leaf nodes; and sending, based on results of the grouping and the determining, information respectively corresponding to the two child nodes to the computing party's device, thereby constructing the isolated forest model for business processing while protecting each data party's private data, wherein the computing party's device does not learn the features of any of the at least two data parties.

8. The method according to claim 7, wherein the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the method further comprises calculating and storing a node depth of the second node.

9. A method for predicting the abnormality of an object through an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, and the computing party's device stores a tree structure of a first tree in the model and the data party corresponding to each node in the first tree, the method being performed by the computing party's device and comprising: acquiring an object identifier of a first object; sending the object identifier to each data party; receiving, from each data party's device, results of at least one division of the first object performed by that data party at its corresponding at least one non-leaf node; determining, based on the tree structure of the first tree and the division results of the first object at the respective non-leaf nodes from the devices of the at least two data parties, a first leaf node into which the first object falls; sending, based on the data parties respectively corresponding to the leaf nodes in the first tree, an identifier of the first leaf node to a first data party corresponding to the first leaf node; receiving a node depth of the first leaf node from the first data party; and predicting the abnormality of the first object based on the node depth, for business processing, wherein the computing party's device does not learn the features of any of the at least two data parties.

10. The method according to claim 9, further comprising acquiring training samples based on the prediction result for the first object, for training a supervised learning model.

11. The method according to claim 10, further comprising optimizing the sample features of the isolated forest model based on parameters of the trained supervised learning model.

12. A method for predicting the abnormality of an object through an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, a device of a first data party among the at least two data parties records the first feature and split value of a first node in its corresponding first tree, and the first data party's device stores feature values of the first feature of respective objects, the method being performed by the first data party's device and comprising: receiving an object identifier of a first object from the computing party's device; acquiring locally, based on the locally stored first feature of the first node, the feature value of the first feature of the first object; dividing the first object at the first node based on the locally stored feature value of the first feature of the first object and the split value of the first node; and sending a result of the division to the computing party's device, for use in predicting the abnormality of the first object for business processing, wherein the computing party's device does not learn the features of any of the at least two data parties.

13. The method according to claim 12, wherein the first data party's device records a node depth of a second node in the first tree, and the method further comprises receiving, from the computing party's device, an identifier of the second node into which the first object falls, and sending the node depth of the second node to the computing party's device.

14. An apparatus for constructing an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, the apparatus is configured in a device of the computing party, the at least two data parties include a first data party, the computing party's device pre-stores correspondences between m feature identifiers and the respective data parties, the m feature identifiers are predetermined identifiers of m respective features, and the computing party's device does not learn the features of any of the at least two data parties, the apparatus comprising: an acquisition unit configured to acquire a plurality of sample identifiers corresponding to a first node of a first tree in the model, the plurality of sample identifiers respectively corresponding to a plurality of samples, each sample including feature values of the m features; a selection unit configured to randomly select one feature identifier from the m feature identifiers; a sending unit configured to, in a case where the selected feature identifier is a first feature identifier, send the identifier of the first node, the plurality of sample identifiers, and the first feature identifier to the first data party based on the locally stored correspondence between the first feature identifier and the first data party; a first recording unit configured to record the correspondence between the first node and the first data party; and a receiving unit configured to receive, from the first data party, information respectively corresponding to two child nodes of the first node, thereby constructing the isolated forest model for business processing.

15. The apparatus according to claim 14, wherein the first node is a root node, and the acquisition unit is further configured to acquire N sample identifiers and randomly acquire n sample identifiers from the N sample identifiers, where N > n.

16. The apparatus according to claim 14, wherein the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the apparatus further comprises a second recording unit configured to record the correspondence between the identifier of the second node and the first data party.

17. The apparatus according to claim 16, wherein the two child nodes include a third node, and the information corresponding to the third node includes u sample identifiers assigned to the third node, the u sample identifiers being a subset of the plurality of sample identifiers.

18. The apparatus according to claim 14, wherein the at least one data party is at least one network platform, and the plurality of samples respectively correspond to a plurality of objects on the network platform.

19. The apparatus according to claim 18, wherein the object is any one of: a consumer, a transaction, a merchant, or a commodity.

20. An apparatus for constructing an isolated forest model based on federated learning, wherein the participants in the federated learning include a computing party and at least two data parties, a first tree of the model includes a first node, the apparatus is configured in the device of a first data party among the at least two data parties, the first data party's device holds feature values of a first feature of respective samples and stores a correspondence between the first feature and a predetermined first feature identifier, and the computing party's device does not learn the features of any of the at least two data parties, the apparatus comprising: a receiving unit configured to receive an identifier of the first node, a plurality of sample identifiers, and the first feature identifier from the computing party's device, wherein the plurality of sample identifiers respectively correspond to a plurality of samples; a selection unit configured to randomly select, based on the locally stored correspondence between the first feature identifier and the first feature, one feature value from the feature values of the first feature of the plurality of samples as a split value of the first node; a recording unit configured to record the correspondence between the first node and the first feature and the split value; a grouping unit configured to group the plurality of samples based on the split value to construct two child nodes of the first node; a determining unit configured to respectively determine whether the two child nodes are leaf nodes; and a sending unit configured to send, based on results of the grouping and the determining, information respectively corresponding to the two child nodes to the computing party's device, thereby constructing the isolated forest model for business processing.

21. The apparatus according to claim 20, wherein the two child nodes include a second node, the information corresponding to the second node indicates that the second node is a leaf node, and the apparatus further comprises a calculation unit configured to calculate and store a node depth of the second node.
A device for predicting the abnormality of an object through an isolated forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties; the computing party's equipment stores the tree structure of a first tree in the model and the data party corresponding to each node of the first tree, and the computing party's equipment does not learn the features held by any of the at least two data parties. The device is configured in the computing party's equipment and includes: a first acquiring unit configured to acquire an object identifier of a first object; a first sending unit configured to send the object identifier to each data party; a first receiving unit configured to receive, from each data party's equipment, the result of at least one division of the first object performed by that data party at its corresponding at least one non-leaf node; a first determining unit configured to determine, based on the tree structure of the first tree and the division results for the first object at the non-leaf nodes received from the at least two data parties' equipment, the first leaf node into which the first object falls; a second sending unit configured to send, based on the data party corresponding to each leaf node of the first tree, the identifier of the first leaf node to the first data party corresponding to the first leaf node; a second receiving unit configured to receive the node depth of the first leaf node from the first data party; and a prediction unit configured to predict the abnormality of the first object based on the node depth, for use in business processing.

The device according to claim 22, further comprising a second acquiring unit configured to acquire, based on the prediction result for the first object, training samples for training a supervised learning model.

The device according to claim 23, further comprising a second determining unit configured to determine, based on the parameters of the trained supervised learning model, the features included in the samples of the isolated forest model.

A device for predicting the abnormality of an object through an isolated forest model based on federated learning. The participants in the federated learning include a computing party and at least two data parties; the equipment of a first data party among the at least two data parties records the first feature and the split value of a first node in its corresponding first tree, and stores the feature value of the first feature for each object; the computing party's equipment does not learn the features held by any of the at least two data parties. The device is configured in the first data party's equipment and includes: a first receiving unit configured to receive the object identifier of a first object from the computing party's equipment; an acquiring unit configured to acquire locally, based on the locally stored first feature of the first node, the feature value of the first feature of the first object; a division unit configured to divide the first object at the first node based on the locally stored feature value of the first feature of the first object and the split value of the first node; and a first sending unit configured to send the result of the division to the computing party's equipment, for use in predicting the abnormality of the first object for business processing.

The device according to claim 25, wherein the first data party's equipment records the node depth of a second node in the first tree, the device further comprising: a second receiving unit configured to receive, from the computing party's equipment, the identifier of the second node into which the first object falls; and a second sending unit configured to send the node depth of the second node to the computing party's equipment.

A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1 to 13.

A computing device comprising a memory and a processor, wherein the memory stores executable code, and when the processor executes the executable code, the method of any one of claims 1 to 13 is implemented.
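The interaction the claims describe — the computing party broadcasts an object identifier, each data party evaluates the splits at its own non-leaf nodes and returns only branch directions, and the computing party routes the object to a leaf and asks that leaf's owner for the node depth — can be sketched as a single-process simulation. This is a minimal illustrative sketch, not the patented implementation: all class and variable names are invented, secure-channel and multi-tree details are omitted, and the final anomaly score uses the standard isolation-forest formula s(x, n) = 2^(−depth / c(n)) from Liu et al.'s original isolation-forest work rather than anything specified in the claims.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler–Mascheroni constant

def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n samples,
    the standard isolation-forest depth normalizer."""
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

class DataParty:
    """Holds private feature values, the (feature, split value) pairs for the
    non-leaf nodes it owns, and the depths of the leaf nodes it owns."""
    def __init__(self, features, splits, leaf_depths):
        self.features = features        # object_id -> {feature: value}
        self.splits = splits            # node_id -> (feature, threshold)
        self.leaf_depths = leaf_depths  # leaf node_id -> depth

    def split_results(self, object_id):
        # Evaluate every owned non-leaf node for this object and return
        # only the branch directions, never the raw feature values.
        return {node: ("L" if self.features[object_id][feat] < thr else "R")
                for node, (feat, thr) in self.splits.items()}

    def depth_of(self, leaf_id):
        return self.leaf_depths[leaf_id]

class ComputingParty:
    """Knows only the tree topology and which party owns each node."""
    def __init__(self, children, leaf_owner, parties, n_train):
        self.children = children    # node_id -> (left, right); leaves absent
        self.leaf_owner = leaf_owner  # leaf node_id -> index into parties
        self.parties = parties
        self.n_train = n_train      # subsample size used to build the tree

    def anomaly_score(self, object_id):
        # 1. Broadcast the object id; collect split results from every party.
        results = {}
        for party in self.parties:
            results.update(party.split_results(object_id))
        # 2. Walk the tree locally using only the branch directions.
        node = 0
        while node in self.children:
            left, right = self.children[node]
            node = left if results[node] == "L" else right
        # 3. Ask the leaf's owner for the node depth, then score.
        depth = self.parties[self.leaf_owner[node]].depth_of(node)
        return 2.0 ** (-depth / c(self.n_train))

# Toy vertical partition: party A holds "amount", party B holds "age".
party_a = DataParty(features={"u1": {"amount": 250.0}, "u2": {"amount": 40.0}},
                    splits={0: ("amount", 100.0)},
                    leaf_depths={2: 1})
party_b = DataParty(features={"u1": {"age": 23}, "u2": {"age": 45}},
                    splits={1: ("age", 30)},
                    leaf_depths={3: 2, 4: 2})
computing = ComputingParty(children={0: (1, 2), 1: (3, 4)},
                           leaf_owner={2: 0, 3: 1, 4: 1},
                           parties=[party_a, party_b],
                           n_train=256)
score = computing.anomaly_score("u1")  # isolated at depth 1 -> relatively high score
```

Note that each `DataParty` returns only branch directions and leaf depths, so the computing party never observes a raw feature value, which mirrors the claims' constraint that the computing party's equipment does not learn any data party's features.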
TW109115727A 2019-12-12 2020-05-12 A method and device for constructing and predicting an isolated forest model based on federated learning TWI780433B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911288850.5 2019-12-12
CN201911288850.5A CN110991552B (en) 2019-12-12 2019-12-12 Isolated forest model construction and prediction method and device based on federal learning

Publications (2)

Publication Number Publication Date
TW202123050A TW202123050A (en) 2021-06-16
TWI780433B true TWI780433B (en) 2022-10-11

Family

ID=70093746

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109115727A TWI780433B (en) 2019-12-12 2020-05-12 A method and device for constructing and predicting an isolated forest model based on federated learning

Country Status (3)

Country Link
CN (2) CN110991552B (en)
TW (1) TWI780433B (en)
WO (1) WO2021114821A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991552B (en) * 2019-12-12 2021-03-12 支付宝(杭州)信息技术有限公司 Isolated forest model construction and prediction method and device based on federal learning
CN111695675B (en) * 2020-05-14 2024-05-07 平安科技(深圳)有限公司 Federal learning model training method and related equipment
CN112231768B (en) * 2020-10-27 2021-06-18 腾讯科技(深圳)有限公司 Data processing method and device, computer equipment and storage medium
CN112529102B (en) * 2020-12-24 2024-03-12 深圳前海微众银行股份有限公司 Feature expansion method, device, medium and computer program product
CN113807544B (en) * 2020-12-31 2023-09-26 京东科技控股股份有限公司 Training method and device of federal learning model and electronic equipment
CN112862057B (en) * 2021-04-07 2023-11-03 京东科技控股股份有限公司 Modeling method, modeling device, electronic equipment and readable medium
CN113420072B (en) * 2021-06-24 2024-04-05 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium
CN113537361B (en) * 2021-07-20 2024-04-02 同盾科技有限公司 Cross-sample feature selection method in federal learning system and federal learning system
CN113554182B (en) * 2021-07-27 2023-09-19 西安电子科技大学 Detection method and system for Bayesian court node in transverse federal learning system
CN113723477B (en) * 2021-08-16 2024-04-30 同盾科技有限公司 Cross-feature federal abnormal data detection method based on isolated forest
CN113506163B (en) * 2021-09-07 2021-11-23 百融云创科技股份有限公司 Isolated forest training and predicting method and system based on longitudinal federation
CN114611616B (en) * 2022-03-16 2023-02-07 吕少岚 Unmanned aerial vehicle intelligent fault detection method and system based on integrated isolated forest
CN114785810B (en) * 2022-03-31 2023-05-16 海南师范大学 Tree-like broadcast data synchronization method suitable for federal learning
TWI812293B (en) * 2022-06-20 2023-08-11 英業達股份有限公司 Fedrated learning system and method using data digest
CN114996749B (en) * 2022-08-05 2022-11-25 蓝象智联(杭州)科技有限公司 Feature filtering method for federal learning
TWI807961B (en) * 2022-08-11 2023-07-01 財團法人亞洲大學 Multi-layer federated learning system and methodology based on distributed clustering
CN115907029B (en) * 2022-11-08 2023-07-21 北京交通大学 Method and system for defending against federal learning poisoning attack
TWI829558B (en) * 2023-03-17 2024-01-11 英業達股份有限公司 Fedrated learning system and method protecting data digest
CN117077067B (en) * 2023-10-18 2023-12-22 北京亚康万玮信息技术股份有限公司 Information system automatic deployment planning method based on intelligent matching

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109684311A (en) * 2018-12-06 2019-04-26 中科恒运股份有限公司 Abnormal deviation data examination method and device
TW201917618A (en) * 2017-10-24 2019-05-01 香港商阿里巴巴集團服務有限公司 Model training method and device and method and device of detecting URL
US10430727B1 (en) * 2019-04-03 2019-10-01 NFL Enterprises LLC Systems and methods for privacy-preserving generation of models for estimating consumer behavior
CN110309587A (en) * 2019-06-28 2019-10-08 京东城市(北京)数字科技有限公司 Decision model construction method, decision-making technique and decision model
CN110363305A (en) * 2019-07-17 2019-10-22 深圳前海微众银行股份有限公司 Federal learning method, system, terminal device and storage medium
CN110414555A (en) * 2019-06-20 2019-11-05 阿里巴巴集团控股有限公司 Detect the method and device of exceptional sample

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US10346981B2 (en) * 2016-11-04 2019-07-09 Eric Kenneth Anderson System and method for non-invasive tissue characterization and classification
JP6782679B2 (en) * 2016-12-06 2020-11-11 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Information processing equipment, information processing methods and programs
US10638411B2 (en) * 2017-10-27 2020-04-28 LGS Innovations LLC Rogue base station router detection with machine learning algorithms
US11494667B2 (en) * 2018-01-18 2022-11-08 Google Llc Systems and methods for improved adversarial training of machine-learned models
JP6879239B2 (en) * 2018-03-14 2021-06-02 オムロン株式会社 Anomaly detection system, support device and model generation method
US10685159B2 (en) * 2018-06-27 2020-06-16 Intel Corporation Analog functional safety with anomaly detection
CN109002861B (en) * 2018-08-10 2021-11-09 深圳前海微众银行股份有限公司 Federal modeling method, device and storage medium
CN109299728B (en) * 2018-08-10 2023-06-27 深圳前海微众银行股份有限公司 Sample joint prediction method, system and medium based on construction of gradient tree model
CN109859029A (en) * 2019-01-04 2019-06-07 深圳壹账通智能科技有限公司 Abnormal application detection method, device, computer equipment and storage medium
CN109902721A (en) * 2019-01-28 2019-06-18 平安科技(深圳)有限公司 Outlier detection model verification method, device, computer equipment and storage medium
CN110084377B (en) * 2019-04-30 2023-09-29 京东城市(南京)科技有限公司 Method and device for constructing decision tree
CN110191110B (en) * 2019-05-20 2020-05-19 山西大学 Social network abnormal account detection method and system based on network representation learning
CN110517154A (en) * 2019-07-23 2019-11-29 平安科技(深圳)有限公司 Data model training method, system and computer equipment
CN110991552B (en) * 2019-12-12 2021-03-12 支付宝(杭州)信息技术有限公司 Isolated forest model construction and prediction method and device based on federal learning

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
TW201917618A (en) * 2017-10-24 2019-05-01 香港商阿里巴巴集團服務有限公司 Model training method and device and method and device of detecting URL
CN109684311A (en) * 2018-12-06 2019-04-26 中科恒运股份有限公司 Abnormal deviation data examination method and device
US10430727B1 (en) * 2019-04-03 2019-10-01 NFL Enterprises LLC Systems and methods for privacy-preserving generation of models for estimating consumer behavior
CN110414555A (en) * 2019-06-20 2019-11-05 阿里巴巴集团控股有限公司 Detect the method and device of exceptional sample
CN110309587A (en) * 2019-06-28 2019-10-08 京东城市(北京)数字科技有限公司 Decision model construction method, decision-making technique and decision model
CN110363305A (en) * 2019-07-17 2019-10-22 深圳前海微众银行股份有限公司 Federal learning method, system, terminal device and storage medium

Also Published As

Publication number Publication date
CN113065610A (en) 2021-07-02
WO2021114821A1 (en) 2021-06-17
CN110991552A (en) 2020-04-10
TW202123050A (en) 2021-06-16
CN110991552B (en) 2021-03-12
CN113065610B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
TWI780433B (en) A method and device for constructing and predicting an isolated forest model based on federated learning
Garrard et al. Blockchain for trustworthy provenances: A case study in the Australian aquaculture industry
CN106779975B (en) Tamper-proof method of reputation information based on block chain
Dou et al. Robust spammer detection by nash reinforcement learning
WO2019100084A1 (en) Decentralized autonomous evaluation engine for intellectual property assets
CN109074562A (en) Block chain-based combined data transmission control method and system
CN108932348A (en) Merging treatment method, apparatus, block chain node and the storage medium of block chain
CN111401700B (en) Data analysis method, device, computer system and readable storage medium
Da Rocha et al. Identifying bank frauds using CRISP-DM and decision trees
CN109255056A (en) Data referencing processing method, device, equipment and the storage medium of block chain
Vashistha et al. eChain: A blockchain-enabled ecosystem for electronic device authenticity verification
CN114492605A (en) Federal learning feature selection method, device and system and electronic equipment
CN112200382A (en) Training method and device of risk prediction model
CN111178912A (en) Grain quality tracing method and system based on block chain
CN114118681A (en) Expert selection method and device
JP7257172B2 (en) COMMUNICATION PROGRAM, COMMUNICATION DEVICE, AND COMMUNICATION METHOD
Lopez-Rojas et al. Using the RetSim simulator for fraud detection research
Gopalakrishnan et al. A conceptual framework for using videogrammetry in blockchain platforms for food supply chain traceability
CN111881147B (en) Processing method and device of computing task, storage medium and processor
KR102279258B1 (en) Blockchain based project evaluation method and system
Lee et al. Preserving liberty and fairness in combinatorial double auction games based on blockchain
CN115455457B (en) Chain data management method, system and storage medium based on intelligent big data
JP7046970B2 (en) Systems and methods for identifying leaked data and assigning guilty to suspicious leakers
US9235616B2 (en) Systems and methods for partial workflow matching
CN114417394A (en) Block chain-based data storage method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent