TW202022716A - Clustering result interpretation method and device - Google Patents

Clustering result interpretation method and device

Info

Publication number
TW202022716A
Authority
TW
Taiwan
Prior art keywords
feature
category
embedded object
model
interpretation
Prior art date
Application number
TW108133385A
Other languages
Chinese (zh)
Other versions
TWI726420B (en)
Inventor
王力
向彪
周俊
Original Assignee
香港商阿里巴巴集團服務有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 香港商阿里巴巴集團服務有限公司
Publication of TW202022716A
Application granted
Publication of TWI726420B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A clustering result interpretation method and device. The method comprises: embedding objects with an embedding algorithm to obtain an embedding result for each embedded object; clustering the embedding results with a clustering model to obtain a category label for each embedded object; training an interpretation model on the features and category labels of the embedded objects; extracting several embedded objects from each category; determining, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregating the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

Description

Clustering result interpretation method and device

This specification relates to the field of machine learning, and in particular to a method and device for interpreting clustering results.

Mathematically, an embedding is a mapping from one space to another that preserves basic properties. An embedding algorithm can convert complex, hard-to-express features into easily computable forms, such as vectors or matrices, that are convenient for machine learning models to process. However, embedding algorithms are not interpretable, so a clustering model that clusters embedding results is not interpretable either and cannot meet the needs of business scenarios.
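To make the mapping concrete, here is a minimal Python sketch, not part of the original specification, that illustrates an embedding as a lookup table mapping discrete tokens to dense vectors. The vocabulary and the random table are illustrative assumptions; a real embedding algorithm would learn the table so that basic properties such as similarity are preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary; a learned embedding would replace the random table.
vocabulary = {"computer": 0, "football": 1, "swimming": 2}
embedding_table = rng.normal(size=(len(vocabulary), 8))  # one 8-dim vector per token

def embed(token: str) -> np.ndarray:
    # Map a hard-to-compute symbol to an easy-to-compute vector.
    return embedding_table[vocabulary[token]]

print(embed("computer").shape)  # (8,)
```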

In view of this, this specification provides a method and device for interpreting clustering results. Specifically, this specification is implemented through the following technical solutions.

A method for interpreting clustering results, comprising: embedding objects with an embedding algorithm to obtain an embedding result for each embedded object; clustering the embedding results with a clustering model to obtain a category label for each embedded object; training an interpretation model on the features and category labels of the embedded objects; for each category, extracting several embedded objects from the category; determining, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregating the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

A method for interpreting the identification results of a risk group identification model, comprising: embedding user nodes with an embedding algorithm to obtain an embedding result for each user node; identifying the embedding results with a risk group identification model to obtain the risk group label of each user node; training an interpretation model on the features of the user nodes and the risk group labels; for each risk group, extracting several user nodes from the risk group; determining, based on the features of each extracted user node and the trained interpretation model, the explanation features by which the user node belongs to the risk group; and aggregating the explanation features of the user nodes extracted from the same risk group to obtain the explanation features of the risk group identification model for that risk group.

A method for interpreting the clustering results of a text clustering model, comprising: embedding the texts to be clustered with an embedding algorithm to obtain an embedding result for each text; clustering the embedding results with a text clustering model to obtain a category label for each text; training an interpretation model on the features of the texts and the category labels; for each category, extracting several texts from the category; determining, based on the features of each extracted text and the trained interpretation model, the explanation features by which the text belongs to the category; and aggregating the explanation features of the texts extracted from the same category to obtain the explanation features of the text clustering model for that category.

A device for interpreting clustering results, comprising: an embedding processing unit that embeds objects with an embedding algorithm to obtain an embedding result for each embedded object; an object clustering unit that clusters the embedding results with a clustering model to obtain a category label for each embedded object; a model training unit that trains an interpretation model on the features and category labels of the embedded objects; an object extraction unit that, for each category, extracts several embedded objects from the category; a feature determination unit that determines, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and a feature aggregation unit that aggregates the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

A device for interpreting clustering results, comprising: a processor; and a memory for storing machine-executable instructions; wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the clustering result interpretation logic, the processor is caused to: embed objects with an embedding algorithm to obtain an embedding result for each embedded object; cluster the embedding results with a clustering model to obtain a category label for each embedded object; train an interpretation model on the features and category labels of the embedded objects; for each category, extract several embedded objects from the category; determine, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregate the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

As can be seen from the above, this specification trains an interpretable interpretation model on the features and category labels of embedded objects, uses the trained model to determine the explanation features behind each embedded object's category assignment, and then aggregates the explanation features of the embedded objects in the same category to obtain the explanation features of the clustering model for that category. This makes the clustering results interpretable, gives developers a basis for fixing biases in the clustering model, helps improve the model's generalization ability and performance, and helps avoid legal and moral risks.

Exemplary embodiments are described in detail here, with examples shown in the drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the appended claims.

The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "the", and "said" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various information, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another.
For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".

This specification provides a solution for interpreting clustering results. On the one hand, a clustering model clusters the embedding results of embedded objects to obtain a category label for each embedded object; on the other hand, the features and category labels of the embedded objects are used to train an interpretable interpretation model, and the trained interpretation model is used to determine the explanation features by which the embedded objects extracted from each category belong to that category. The explanation features of the embedded objects extracted from the same category are then aggregated to obtain the explanation features of the clustering model for that category, thereby making the clustering model interpretable.

FIG. 1 and FIG. 2 are schematic flowcharts of a clustering result interpretation method shown in an exemplary embodiment of this specification. Referring to FIG. 1 and FIG. 2, the method may include the following steps.

Step 102: embed the objects with an embedding algorithm to obtain an embedding result for each embedded object.

Step 104: cluster the embedding results with a clustering model to obtain a category label for each embedded object.

In one example, the embedded objects may be graph nodes in a graph structure. For example, an embedded object may be a user node in a user network graph, which can be built from users' payment data, friend relationship data, and so on. After the user nodes in the user network graph are embedded with the embedding algorithm, a vector is obtained for each user node. Feeding each user node's vector into the clustering model as input yields the category label of each user node.

In another example, the embedded objects may be texts to be clustered, such as news articles. Embedding the words of each text yields a vector for each word, and hence a vector set for each text. Feeding each text's vector set into the clustering model as input yields the category label of each text. For example, text 1 corresponding to technology category label 1 and text 2 corresponding to sports category label 2 indicates that text 1 is a technology text and text 2 is a sports text.

In this embodiment, for ease of description, the vectors, matrices, and the like obtained by processing embedded objects with the embedding algorithm are collectively referred to as embedding results. Using embedding results as inputs for machine learning computations can effectively improve processing efficiency.
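As a rough sketch of steps 102 and 104, and not part of the patent itself, the snippet below stubs the embedding algorithm with random vectors and uses k-means as a stand-in clustering model; embed_object, the object count, and the cluster count are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def embed_object(obj_id: int, dim: int = 16) -> np.ndarray:
    # Placeholder for a real embedding algorithm (step 102).
    return rng.normal(size=dim)

object_ids = range(1000)
embeddings = np.stack([embed_object(i) for i in object_ids])

# Step 104: cluster the embedding results to get a category label per object.
cluster_model = KMeans(n_clusters=5, n_init=10, random_state=0)
category_labels = cluster_model.fit_predict(embeddings)
print(category_labels[:10])
```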
In other examples, the computation of the embedding results and the clustering by the clustering model may be performed at the same time. For example, the embedding algorithm and the clustering model can be combined, the embedded objects can be fed into the combined model as input, and the combined model can both compute the embedding results and cluster the embedded objects; this specification places no special restriction on this.

Step 106: train an interpretation model on the features and category labels of the embedded objects.

In this embodiment, an interpretable multi-class model, such as a linear model or a decision tree, may be used as the interpretation model; this specification places no special restriction on this. The features of an embedded object may include its original features and its topological features.

The original features are usually features that the embedded object already has. For example, the original features of a user node may include the user's age, gender, occupation, and income. As another example, the original features of a text may include the part of speech and frequency of its words.

The topological features represent the topological structure of the embedded object. Taking graph nodes as the embedded objects, the topological features may include the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and statistics of the first-order neighbors over a specified original feature dimension. Still taking risk group identification as an example, the statistics of the first-order neighbors over a specified original feature dimension may be the average age of the first-order neighbors, the maximum age of the first-order neighbors, the average annual income of the first-order neighbors, the minimum annual income of the first-order neighbors, and so on. Taking the words of a text as the embedded objects, the topological features may include the word that most often precedes a given word, the number of words that frequently co-occur with it, and so on.

In this embodiment, the topological features supplement the original features. On the one hand, this solves the problem that some embedded objects have no original features; on the other hand, it adds the topological structure of the embedded objects to the features, thereby improving the accuracy of the interpretation model's training results.

Step 108: for each category, extract several embedded objects from the category.

In this embodiment, for each category output by the clustering model, several embedded objects can be extracted from that category. The number of extracted objects can be preset, for example 5000 or 3000; it can also be a percentage of the total number of embedded objects in the category, for example 50% or 30%. This specification places no special restriction on this.
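The sketch below illustrates step 106 for graph-node objects, assuming networkx holds the user network and that the cluster labels from step 104 are available; the attribute names ("age", "income"), the demo graph, the dummy labels, and the shallow decision tree are illustrative choices, not mandated by the specification.

```python
import networkx as nx
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def topological_features(graph: nx.Graph, node) -> list:
    first = list(graph.neighbors(node))  # first-order neighbors
    second = {m for n in first for m in graph.neighbors(n)} - {node, *first}
    avg_nbr = float(np.mean([graph.degree(n) for n in first])) if first else 0.0
    # Statistic of the first-order neighbors over an original feature dimension.
    avg_nbr_age = float(np.mean([graph.nodes[n].get("age", 0) for n in first])) if first else 0.0
    return [len(first), len(second), avg_nbr, avg_nbr_age]

def node_features(graph: nx.Graph, node) -> list:
    # Original features (assumed node attributes) joined with topological ones.
    original = [graph.nodes[node].get("age", 0), graph.nodes[node].get("income", 0)]
    return original + topological_features(graph, node)

graph = nx.karate_club_graph()                 # stand-in for the user network
cluster_label = {n: n % 3 for n in graph.nodes}  # dummy output of step 104

X = np.array([node_features(graph, n) for n in graph.nodes])
y = np.array([cluster_label[n] for n in graph.nodes])

# Step 106: an interpretable multi-class model trained on features + labels.
interpretation_model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
```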
Step 110: determine, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category.

In this embodiment, for each extracted embedded object, the contribution of each of its features to the category assignment can be computed with the trained interpretation model, and the features whose contributions satisfy a predetermined condition can be taken as the explanation features by which the embedded object belongs to the category. For example, the features of the embedded object can be sorted by contribution in descending order, and the top 5 or top 8 features can be taken as the explanation features; this specification places no special restriction on this.

Step 112: aggregate the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

In one example, when aggregating within a category, the total number of occurrences of each explanation feature can be counted, and the explanation features with the most occurrences can be selected as the explanation features of the clustering model for that category.
[Table 1: explanation features determined for each embedded object extracted from the category]
Referring to the example in Table 1, suppose a category contains 5 embedded objects, embedded object 1 through embedded object 5. The explanation features behind embedded object 1's category assignment are features 1-5, and those behind embedded object 2's are features 2-6. The number of occurrences of each feature in the category can then be tallied, giving the statistics shown in Table 2.
[Table 2: number of occurrences of each explanation feature in the category]
Referring to the example in Table 2, the tally shows that feature 1 and feature 4 each occur 3 times, feature 2 and feature 3 each occur 4 times, and so on. In this example, if the 5 most frequent explanation features are selected, features 1-5 are chosen and taken as the explanation features of the clustering model for this category.

In another example, when aggregating within a category, the sum of the contributions of each explanation feature can be computed, and the explanation features with the largest contribution sums can be selected as the explanation features of the clustering model for that category. Continuing with the examples of Table 1 and Table 2, the contribution sum of feature 1 equals its contribution in embedded object 1 plus its contribution in embedded object 4 plus its contribution in embedded object 5. The contribution sum of each feature in Table 2 can be computed in the same way, and the 5 explanation features with the largest sums can be selected as the explanation features of the clustering model for this category.

In this embodiment, by aggregating the explanation features of the embedded objects extracted from each category, the explanation features of the clustering model for that category are obtained, realizing the interpretation of the clustering results.

As can be seen from the above, this specification trains an interpretable interpretation model on the features and category labels of embedded objects, uses the trained model to determine the explanation features behind each embedded object's category assignment, and then aggregates the explanation features of the embedded objects in the same category to obtain the explanation features of the clustering model for that category. This makes the clustering results interpretable, gives developers a basis for fixing biases in the clustering model, helps improve the model's generalization ability and performance, and helps avoid legal and moral risks.
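A minimal sketch of the two aggregation strategies just described, assuming each sampled object's explanation features and contributions are already available as (feature, contribution) pairs; the data below is made up purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical per-object explanation features within one category:
# {object_id: [(feature_name, contribution), ...]}
per_object = {
    1: [("f1", 0.9), ("f2", 0.8), ("f3", 0.7), ("f4", 0.5), ("f5", 0.4)],
    2: [("f2", 0.9), ("f3", 0.8), ("f4", 0.6), ("f5", 0.5), ("f6", 0.3)],
}

# Strategy 1: count occurrences and keep the most frequent features.
counts = Counter(name for feats in per_object.values() for name, _ in feats)
by_count = [name for name, _ in counts.most_common(5)]

# Strategy 2: sum contribution values and keep the largest sums.
sums = defaultdict(float)
for feats in per_object.values():
    for name, value in feats:
        sums[name] += value
by_sum = sorted(sums, key=sums.get, reverse=True)[:5]

print(by_count, by_sum)
```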
The following describes in detail how feature contributions are computed, taking a linear model and then a decision tree as the interpretation model.

1. Linear model

In this embodiment, when the interpretation model is a linear model, training it on the features and category labels of the embedded objects yields a weight for each feature under each category, as illustrated in Table 3.

[Table 3: weight of each feature under each category in the trained linear model]
Referring to the example in Table 3, suppose that under category 1 the weight of feature 1 is W1, the weight of feature 2 is W2, and so on. To compute the contribution of each feature of an embedded object to its category assignment, first obtain the weight of each feature under the category to which the object belongs, then multiply the object's feature value by the corresponding weight and take the product as the contribution. For example, the contribution of feature 1 to embedded object 1's category assignment equals embedded object 1's value of feature 1 multiplied by W1, the contribution of feature 2 equals its value of feature 2 multiplied by W2, and so on; this specification does not repeat each case here.

2. Decision tree

In this embodiment, when the interpretation model is a decision tree, training it on the features and category labels of the embedded objects yields a split point for each feature in the tree.

Referring to the decision tree shown in FIG. 3, each tree node represents a unique feature; for example, tree node 1 represents the user's age and tree node 2 the user's annual income. The split point of a feature usually refers to the feature's threshold. For example, the split point of the age node is 50: when the user's age is at most 50, branch path 12 is taken; when it is greater than 50, branch path 13 is taken.

In this embodiment, to determine the feature contributions of an embedded object, the object is first fed into the trained decision tree. While the tree classifies the object, the path the object takes through the tree is recorded, and the features on that path and their split points are obtained. Still taking FIG. 3 as an example, suppose an embedded object's path through the tree is tree node 1 -> tree node 2 -> tree node 4; then the features represented by these three tree nodes and their split points are obtained.

For each obtained feature and its split point, the distance between the object's feature value and the split point is computed and taken as the feature's contribution to the object's category assignment. Still taking tree node 1 (the user's age, with split point 50) as an example, if an embedded object's user age is 20, the contribution of the age feature is the difference between 50 and 20, namely 30. Of course, in practice the distance can also be normalized after it is computed, with the normalized result used as the contribution; this specification places no special restriction on this.
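The following sketch shows one way the two contribution computations could look in code, assuming scikit-learn as the modeling library: the linear case multiplies feature values by per-category weights, and the tree case walks decision_path to recover the split points on the object's path, using the absolute distance to each split as the contribution. This is an assumed implementation, not the patent's own; the tiny tree trained on random data exists only to exercise the functions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def linear_contributions(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # weights: the trained linear model's weights for the object's category.
    return x * weights

def tree_contributions(tree: DecisionTreeClassifier, x: np.ndarray) -> dict:
    # Node ids on the path the object takes through the trained tree.
    node_ids = tree.decision_path(x.reshape(1, -1)).indices
    contrib = {}
    for node in node_ids:
        feat = tree.tree_.feature[node]
        if feat >= 0:  # skip leaves (feature == -2 at leaf nodes)
            threshold = tree.tree_.threshold[node]
            contrib[feat] = abs(x[feat] - threshold)  # distance to split point
    return contrib

# Illustrative only: a tiny tree fit on random data.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 3, size=200)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print(tree_contributions(tree, X[0]))
print(linear_contributions(X[0], np.array([0.5, -0.2, 0.1, 0.3])))
```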
This specification also provides a method for interpreting the identification results of a risk group identification model.

On the one hand, an embedding algorithm embeds the user nodes in a user network graph to obtain an embedding result for each user node, and a risk group identification model then identifies the embedding results to obtain the risk group label of each user node. On the other hand, the features of the user nodes and the risk group labels are used to train an interpretable interpretation model. After training, for each risk group, several user nodes are extracted from the group, the explanation features by which each extracted user node belongs to the group are determined from its features and the trained interpretation model, and the explanation features of the user nodes extracted from the same risk group are aggregated to obtain the explanation features of the risk group identification model for that group.

In this embodiment, explanation features are obtained for each risk group identified by the risk group identification model. For example, the explanation features of risk group 1 may include: no fixed occupation, annual income below 80,000, habitual residence in Guangxi, and age 18-25, indicating that the model identified risk group 1 through these user features. As another example, the explanation features of risk group 2 may include: no fixed occupation, annual income below 100,000, habitual residence in Yunnan, age 20-28, and a Wi-Fi network SSID of 12345, indicating that the model identified risk group 2 through these user features.

This specification also provides a method for interpreting the clustering results of a text clustering model.

On the one hand, an embedding algorithm embeds the words of each text to be clustered to obtain an embedding result for each text, and a text clustering model then clusters the embedding results to obtain the category label of each text. On the other hand, the features of the texts and the category labels are used to train an interpretable interpretation model. After training, for each category, several texts are extracted from the category, the explanation features by which each extracted text belongs to the category are determined from its features and the trained interpretation model, and the explanation features of the texts extracted from the same category are aggregated to obtain the explanation features of the text clustering model for that category.

In this embodiment, explanation features are obtained for each text category produced by the text clustering model. For example, the explanation features of the technology category may include: "computer", "artificial intelligence", "innovation", and a word frequency of "technology" greater than 0.01, indicating that the model identified technology texts through these features. As another example, the explanation features of the sports category may include: "football", "basketball", "sports", "swimming", and "records", indicating that the model identified sports texts through these features.
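To show how the pieces compose for the risk-group variant, here is a hypothetical end-to-end driver that reuses node_features and tree_contributions from the sketches above; every function name and parameter here is illustrative, and the snippet assumes the earlier sketches (and their imports) are in scope.

```python
from collections import Counter
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def explain_risk_groups(graph, embed, identify_groups, top_n=5, sample_size=100):
    embeddings = {n: embed(n) for n in graph.nodes}   # embed user nodes
    group_of = identify_groups(embeddings)            # risk group label per node
    X = np.array([node_features(graph, n) for n in graph.nodes])
    y = np.array([group_of[n] for n in graph.nodes])
    model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
    explanations = {}
    for group in set(y):
        # Extract several user nodes from the group and explain each of them.
        members = [n for n in graph.nodes if group_of[n] == group][:sample_size]
        tally = Counter()
        for n in members:
            c = tree_contributions(model, np.array(node_features(graph, n)))
            tally.update(sorted(c, key=c.get, reverse=True)[:top_n])
        # Aggregate per-node explanation features into group-level ones.
        explanations[group] = [f for f, _ in tally.most_common(top_n)]
    return explanations
```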
Corresponding to the foregoing embodiments of the clustering result interpretation method, this specification also provides embodiments of a clustering result interpretation device.

The embodiments of the interpretation device can be applied on a server. A device embodiment may be implemented in software, in hardware, or in a combination of the two. Taking software implementation as an example, the device in a logical sense is formed by the processor of the server on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, FIG. 4 is a hardware structure diagram of the server on which the interpretation device resides; besides the processor, memory, network interface, and non-volatile storage shown in FIG. 4, the server may also include other hardware according to its actual functions, which is not described further here.

FIG. 5 is a block diagram of a clustering result interpretation device shown in an exemplary embodiment of this specification. Referring to FIG. 5, the clustering result interpretation device 400 can be applied on the server shown in FIG. 4 and includes an embedding processing unit 401, an object clustering unit 402, a model training unit 403, an object extraction unit 404, a feature determination unit 405, and a feature aggregation unit 406.

The embedding processing unit 401 embeds the objects with an embedding algorithm to obtain an embedding result for each embedded object. The object clustering unit 402 clusters the embedding results with a clustering model to obtain a category label for each embedded object. The model training unit 403 trains an interpretation model on the features and category labels of the embedded objects. The object extraction unit 404, for each category, extracts several embedded objects from the category. The feature determination unit 405 determines, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category. The feature aggregation unit 406 aggregates the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

Optionally, the feature determination unit 405: for each embedded object, computes the contribution of each of its features to the category assignment with the trained interpretation model; and extracts the features whose contributions satisfy a predetermined condition as the explanation features by which the embedded object belongs to the category.

Optionally, when the interpretation model is a linear model, the feature determination unit 405: obtains the weight of each feature of the trained linear model under the category to which the embedded object belongs; and computes the product of the embedded object's feature value and the corresponding weight as the feature's contribution to the category assignment.
Optionally, when the interpretation model is a decision tree, the feature determination unit 405: while the trained decision tree classifies the embedded object, obtains the split points of the features on the path the object takes; and computes the distance between each feature's split point and the object's corresponding feature value as the feature's contribution to the category assignment.

Optionally, the feature determination unit 405: sorts the features by contribution in descending order; and extracts the top N features as the explanation features by which the embedded object belongs to the category, N being a natural number greater than or equal to 1.

Optionally, the features include original features and topological features. Optionally, the topological features include one or more of the following: the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and statistics of the first-order neighbors over a specified original feature dimension.

For the implementation of the functions and roles of the units of the above device, see the implementation of the corresponding steps of the above method; this is not repeated here. Since the device embodiments basically correspond to the method embodiments, the relevant parts of the description of the method embodiments may be consulted. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this specification, which a person of ordinary skill in the art can understand and implement without creative effort.

The systems, devices, modules, or units set forth in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.

Corresponding to the foregoing embodiments of the clustering result interpretation method, this specification also provides a clustering result interpretation device comprising a processor and a memory for storing machine-executable instructions. The processor and the memory are usually connected to each other by an internal bus. In other possible implementations, the device may also include an external interface for communicating with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the clustering result interpretation logic, the processor is caused to: embed the objects with an embedding algorithm to obtain an embedding result for each embedded object; cluster the embedding results with a clustering model to obtain a category label for each embedded object; train an interpretation model on the features and category labels of the embedded objects; for each category, extract several embedded objects from the category; determine, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregate the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

Optionally, when determining the explanation features, the processor is caused to: for each embedded object, compute the contribution of each of its features to the category assignment with the trained interpretation model; and extract the features whose contributions satisfy a predetermined condition as the explanation features of the embedded object.

Optionally, when the interpretation model is a linear model, when computing the contributions the processor is caused to: obtain the weight of each feature of the trained linear model under the category to which the embedded object belongs; and compute the product of the embedded object's feature value and the corresponding weight as the feature's contribution to the category assignment.

Optionally, when the interpretation model is a decision tree, when computing the contributions the processor is caused to: while the trained decision tree classifies the embedded object, obtain the split points of the features on the path the object takes; and compute the distance between each feature's split point and the object's corresponding feature value as the feature's contribution to the category assignment.

Optionally, when extracting the features whose contributions satisfy a predetermined condition, the processor is caused to: sort the features by contribution in descending order; and extract the top N features as the explanation features by which the embedded object belongs to the category, N being a natural number greater than or equal to 1.

Optionally, the features include original features and topological features.
Optionally, the topological features include one or more of the following: the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and statistics of the first-order neighbors over a specified original feature dimension.

Corresponding to the foregoing embodiments of the clustering result interpretation method, this specification also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps: embedding the objects with an embedding algorithm to obtain an embedding result for each embedded object; clustering the embedding results with a clustering model to obtain a category label for each embedded object; training an interpretation model on the features and category labels of the embedded objects; for each category, extracting several embedded objects from the category; determining, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregating the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

Optionally, determining the explanation features includes: for each embedded object, computing the contribution of each of its features to the category assignment with the trained interpretation model; and extracting the features whose contributions satisfy a predetermined condition as the explanation features by which the embedded object belongs to the category.

Optionally, when the interpretation model is a linear model, computing the contributions includes: obtaining the weight of each feature of the trained linear model under the category to which the embedded object belongs; and computing the product of the embedded object's feature value and the corresponding weight as the feature's contribution to the category assignment.

Optionally, when the interpretation model is a decision tree, computing the contributions includes: while the trained decision tree classifies the embedded object, obtaining the split points of the features on the path the object takes; and computing the distance between each feature's split point and the object's corresponding feature value as the feature's contribution to the category assignment.

Optionally, extracting the features whose contributions satisfy a predetermined condition includes: sorting the features by contribution in descending order; and extracting the top N features as the explanation features by which the embedded object belongs to the category, N being a natural number greater than or equal to 1.

Optionally, the features include original features and topological features.
Optionally, the topological features include one or more of the following: the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and statistics of the first-order neighbors over a specified original feature dimension.

Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that of the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results; in some implementations, multitasking and parallel processing are also possible or may be advantageous.

The above are merely preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this specification shall fall within its scope of protection.

102: step; 104: step; 106: step; 108: step; 110: step; 112: step; 400: interpretation device; 401: embedding processing unit; 402: object clustering unit; 403: model training unit; 404: object extraction unit; 405: feature determination unit; 406: feature aggregation unit

FIG. 1 is a schematic flowchart of a clustering result interpretation method shown in an exemplary embodiment of this specification.
FIG. 2 is a schematic flowchart of another clustering result interpretation method shown in an exemplary embodiment of this specification.
FIG. 3 is a schematic diagram of a decision tree shown in an exemplary embodiment of this specification.
FIG. 4 is a schematic structural diagram of a server for a clustering result interpretation device shown in an exemplary embodiment of this specification.
FIG. 5 is a block diagram of a clustering result interpretation device shown in an exemplary embodiment of this specification.

Claims (17)

1. A method for interpreting clustering results, comprising: embedding objects with an embedding algorithm to obtain an embedding result for each embedded object; clustering the embedding results with a clustering model to obtain a category label for each embedded object; training an interpretation model on the features and category labels of the embedded objects; for each category, extracting several embedded objects from the category; determining, based on the features of each extracted embedded object and the trained interpretation model, the explanation features by which the embedded object belongs to the category; and aggregating the explanation features of the embedded objects extracted from the same category to obtain the explanation features of the clustering model for that category.

2. The method according to claim 1, wherein determining the explanation features by which the embedded object belongs to the category based on the features of each extracted embedded object and the trained interpretation model comprises: for each embedded object, computing the contribution of each of its features to the category assignment with the trained interpretation model; and extracting the features whose contributions satisfy a predetermined condition as the explanation features by which the embedded object belongs to the category.

3. The method according to claim 2, wherein, when the interpretation model is a linear model, computing the contribution of each feature of the embedded object to the category assignment with the trained interpretation model comprises: obtaining the weight of each feature of the trained linear model under the category to which the embedded object belongs; and computing the product of the embedded object's feature value and the corresponding weight as the feature's contribution to the category assignment.

4. The method according to claim 2, wherein, when the interpretation model is a decision tree, computing the contribution of each feature of the embedded object to the category assignment with the trained interpretation model comprises: while the trained decision tree classifies the embedded object, obtaining the split points of the features on the path the embedded object takes; and computing the distance between each feature's split point and the embedded object's corresponding feature value as the feature's contribution to the category assignment.
5. The method of claim 2, wherein extracting the features whose contributions satisfy a predetermined condition as the interpretation features by which the embedded object belongs to the category comprises:
sorting the features by contribution in descending order; and
extracting the features ranked in the top N positions as the interpretation features by which the embedded object belongs to the category, N being a natural number greater than or equal to 1.

6. The method of claim 1, wherein the features comprise raw features and topological features.

7. The method of claim 6, wherein the topological features comprise one or more of: the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and a statistic of the first-order neighbors over a specified raw-feature dimension.

8. A method for interpreting the identification results of a risk-group identification model, comprising:
embedding user nodes with an embedding algorithm to obtain an embedding result for each user node;
identifying the embedding results with a risk-group identification model to obtain the risk-group label to which each user node belongs;
training an interpretation model with the features and risk-group labels of the user nodes;
for each risk group, extracting a number of user nodes from that risk group;
determining, based on the features of each extracted user node and the trained interpretation model, the interpretation features by which the user node belongs to the risk group; and
aggregating the interpretation features of every user node extracted from the same risk group to obtain the interpretation features of the risk-group identification model corresponding to that risk group.

9. A method for interpreting the clustering results of a text clustering model, comprising:
embedding the texts to be clustered with an embedding algorithm to obtain an embedding result for each text;
clustering the embedding results with a text clustering model to obtain a category label for each text;
training an interpretation model with the features and category labels of the texts;
for each category, extracting a number of texts from that category;
determining, based on the features of each extracted text and the trained interpretation model, the interpretation features by which the text belongs to the category; and
aggregating the interpretation features of every text extracted under the same category to obtain the interpretation features of the text clustering model for that category.
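The topological features enumerated in claim 7 are straightforward to compute on a graph. The following is a minimal sketch assuming networkx, with the "statistic over a specified raw-feature dimension" taken to be the mean, which is one of several plausible choices; the function name `topology_features` is illustrative, not from the patent.

```python
import networkx as nx
import numpy as np

def topology_features(g: nx.Graph, node, raw_feature: dict) -> dict:
    """The topological features of claim 7 for one node. `raw_feature`
    maps each node to its value on one chosen raw-feature dimension."""
    first = set(g.neighbors(node))            # first-order neighbors
    second = set()
    for n in first:
        second |= set(g.neighbors(n))
    second -= first | {node}                  # strictly second-order neighbors
    return {
        "first_order_neighbors": len(first),
        "second_order_neighbors": len(second),
        "avg_neighbors_of_first_order": (
            float(np.mean([g.degree(n) for n in first])) if first else 0.0),
        "first_order_raw_feature_stat": (
            float(np.mean([raw_feature[n] for n in first])) if first else 0.0),
    }

# Toy usage on a built-in graph, with node degree standing in for a raw feature.
g = nx.karate_club_graph()
raw = {n: float(g.degree(n)) for n in g.nodes}
print(topology_features(g, 0, raw))
```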
10. An apparatus for interpreting clustering results, comprising:
an embedding processing unit that embeds the objects to be clustered with an embedding algorithm to obtain an embedding result for each embedded object;
an object clustering unit that clusters the embedding results with a clustering model to obtain a category label for each embedded object;
a model training unit that trains an interpretation model with the features and category labels of the embedded objects;
an object extraction unit that, for each category, extracts a number of embedded objects from that category;
a feature determination unit that determines, based on the features of each extracted embedded object and the trained interpretation model, the interpretation features by which the embedded object belongs to the category; and
a feature aggregation unit that aggregates the interpretation features of every embedded object extracted under the same category to obtain the interpretation features of the clustering model for that category.

11. The apparatus of claim 10, wherein the feature determination unit:
for each embedded object, computes, based on the trained interpretation model, the contribution of each feature of the embedded object to the categorization result; and
extracts the features whose contributions satisfy a predetermined condition as the interpretation features by which the embedded object belongs to the category.

12. The apparatus of claim 11, wherein, when the interpretation model is a linear model, the feature determination unit:
obtains the weight of each feature of the trained linear model under the category to which the embedded object belongs; and
computes the product of the embedded object's feature value and the corresponding weight as that feature's contribution to the categorization result of the embedded object.

13. The apparatus of claim 11, wherein, when the interpretation model is a decision tree, the feature determination unit:
while categorizing the embedded object with the trained decision tree, obtains the split point of each feature on the path traversed by the embedded object; and
computes the distance between the feature's split point and the embedded object's corresponding feature value as that feature's contribution to the categorization result of the embedded object.

14. The apparatus of claim 11, wherein the feature determination unit:
sorts the features by contribution in descending order; and
extracts the features ranked in the top N positions as the interpretation features by which the embedded object belongs to the category, N being a natural number greater than or equal to 1.
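As a companion to claims 4 and 13, the split-point distance can be read off a fitted scikit-learn tree via its decision path. This is again a sketch under stated assumptions (scikit-learn, random stand-in data); the helper name `split_distance_contributions` is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 20))   # stand-in object features
labels = rng.integers(0, 5, size=1000)   # stand-in category labels

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(features, labels)

def split_distance_contributions(tree: DecisionTreeClassifier,
                                 x: np.ndarray) -> dict:
    """Per-feature contribution = |feature value - split point|, summed over
    the internal nodes on the path the sample traverses through the tree."""
    node_ids = tree.decision_path(x.reshape(1, -1)).indices  # visited nodes
    contrib: dict = {}
    for node_id in node_ids:
        feat = tree.tree_.feature[node_id]
        if feat < 0:                      # leaf node: no split point
            continue
        dist = abs(x[feat] - tree.tree_.threshold[node_id])
        contrib[feat] = contrib.get(feat, 0.0) + dist
    return contrib

print(split_distance_contributions(tree, features[0]))
```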
15. The apparatus of claim 10, wherein the features comprise raw features and topological features.

16. The apparatus of claim 15, wherein the topological features comprise one or more of: the number of first-order neighbors, the number of second-order neighbors, the average number of neighbors of the first-order neighbors, and a statistic of the first-order neighbors over a specified raw-feature dimension.

17. An apparatus for interpreting clustering results, comprising:
a processor; and
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the logic for interpreting clustering results, the processor is caused to:
embed the objects to be clustered with an embedding algorithm to obtain an embedding result for each embedded object;
cluster the embedding results with a clustering model to obtain a category label for each embedded object;
train an interpretation model with the features and category labels of the embedded objects;
for each category, extract a number of embedded objects from that category;
determine, based on the features of each extracted embedded object and the trained interpretation model, the interpretation features by which the embedded object belongs to the category; and
aggregate the interpretation features of every embedded object extracted under the same category to obtain the interpretation features of the clustering model for that category.
TW108133385A 2018-12-04 2019-09-17 Interpretation method and device of cluster result TWI726420B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811471749.9 2018-12-04
CN201811471749.9A CN110046634B (en) 2018-12-04 2018-12-04 Interpretation method and device of clustering result

Publications (2)

Publication Number Publication Date
TW202022716A true TW202022716A (en) 2020-06-16
TWI726420B TWI726420B (en) 2021-05-01

Family

ID=67273278

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108133385A TWI726420B (en) 2018-12-04 2019-09-17 Interpretation method and device of cluster result

Country Status (3)

Country Link
CN (1) CN110046634B (en)
TW (1) TWI726420B (en)
WO (1) WO2020114108A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046634B (en) * 2018-12-04 2021-04-27 创新先进技术有限公司 Interpretation method and device of clustering result
CN110766040B (en) * 2019-09-03 2024-02-06 创新先进技术有限公司 Method and device for risk clustering of transaction risk data
CN111126442B (en) * 2019-11-26 2021-04-30 北京京邦达贸易有限公司 Method for generating key attribute of article, method and device for classifying article
CN111401570B (en) * 2020-04-10 2022-04-12 支付宝(杭州)信息技术有限公司 Interpretation method and device for privacy tree model
CN111784181B (en) * 2020-07-13 2023-09-19 南京大学 Evaluation result interpretation method for criminal reconstruction quality evaluation system
CN112116028B (en) * 2020-09-29 2024-04-26 联想(北京)有限公司 Model decision interpretation realization method and device and computer equipment
CN112395500B (en) * 2020-11-17 2023-09-05 平安科技(深圳)有限公司 Content data recommendation method, device, computer equipment and storage medium
CN113284027B (en) * 2021-06-10 2023-05-09 支付宝(杭州)信息技术有限公司 Training method of partner recognition model, abnormal partner recognition method and device

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004054680A (en) * 2002-07-22 2004-02-19 Fujitsu Ltd Parallel efficiency calculation method
US9507858B1 (en) * 2007-02-28 2016-11-29 Google Inc. Selectively merging clusters of conceptually related words in a generative model for text
CN102081627B (en) * 2009-11-27 2014-09-17 北京金山办公软件有限公司 Method and system for determining contribution degree of word in text
US20130091150A1 (en) 2010-06-30 2013-04-11 Determining similarity between elements of an electronic document
CN103164713B (en) * 2011-12-12 2016-04-06 阿里巴巴集团控股有限公司 Image classification method and device
CN104239338A (en) * 2013-06-19 2014-12-24 阿里巴巴集团控股有限公司 Information recommendation method and information recommendation device
CN104346336A (en) * 2013-07-23 2015-02-11 广州华久信息科技有限公司 Machine text mutual-curse based emotional venting method and system
CN105022754B (en) * 2014-04-29 2020-05-12 腾讯科技(深圳)有限公司 Object classification method and device based on social network
JP6371870B2 (en) * 2014-06-30 2018-08-08 アマゾン・テクノロジーズ・インコーポレーテッド Machine learning service
CN104346459B (en) * 2014-11-10 2017-10-27 南京信息工程大学 A kind of text classification feature selection approach based on term frequency and chi
US9697236B2 (en) * 2014-12-05 2017-07-04 Microsoft Technology Licensing, Llc Image annotation using aggregated page information from active and inactive indices
US9788796B2 (en) * 2015-10-16 2017-10-17 General Electric Company System and method of adaptive interpretation of ECG waveforms
WO2018017467A1 (en) * 2016-07-18 2018-01-25 NantOmics, Inc. Distributed machine learning systems, apparatus, and methods
CN106682095B (en) * 2016-12-01 2019-11-08 浙江大学 The prediction of subject description word and sort method based on figure
CN108268554A (en) * 2017-01-03 2018-07-10 中国移动通信有限公司研究院 A kind of method and apparatus for generating filtering junk short messages strategy
US11621969B2 (en) * 2017-04-26 2023-04-04 Elasticsearch B.V. Clustering and outlier detection in anomaly and causation detection for computing environments
CN107203787B (en) * 2017-06-14 2021-01-08 江西师范大学 Unsupervised regularization matrix decomposition feature selection method
CN108153899B (en) * 2018-01-12 2021-11-02 安徽大学 Intelligent text classification method
CN108090048B (en) * 2018-01-12 2021-05-25 安徽大学 College evaluation system based on multivariate data analysis
CN108319682B (en) * 2018-01-31 2021-12-28 天闻数媒科技(北京)有限公司 Method, device, equipment and medium for correcting classifier and constructing classification corpus
CN108280755A (en) * 2018-02-28 2018-07-13 阿里巴巴集团控股有限公司 The recognition methods of suspicious money laundering clique and identification device
CN108875816A (en) * 2018-06-05 2018-11-23 南京邮电大学 Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion
CN110046634B (en) * 2018-12-04 2021-04-27 创新先进技术有限公司 Interpretation method and device of clustering result

Also Published As

Publication number Publication date
CN110046634B (en) 2021-04-27
TWI726420B (en) 2021-05-01
WO2020114108A1 (en) 2020-06-11
CN110046634A (en) 2019-07-23
