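WO2020057283A1 - Unsupervised model evaluation method, device, server, and readable storage medium - Google Patents

Unsupervised model evaluation method, device, server, and readable storage medium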

Info

Publication number
WO2020057283A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
neighbor
nodes
sample
unsupervised model
Prior art date
Application number
PCT/CN2019/099668
Other languages
English (en)
French (fr)
Inventor
林建滨
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to SG11202010227TA
Publication of WO2020057283A1 (zh)
Priority to US17/086,120 (US10997528B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • the present invention relates to the field of computer technology, and in particular, to an unsupervised model evaluation method, device, server, and readable storage medium.
  • An unsupervised learning model can perform computation on graph data, mapping each node in the graph data into a vector space to obtain a computed node vector.
  • When the computed node vectors are evaluated, a supervised algorithm is usually used, which means training a further machine learning model. Training a machine learning model, however, consumes more data and time.
  • The embodiments of the present specification provide an unsupervised model evaluation method, device, server, and computer-readable storage medium.
  • an embodiment of the present specification provides an unsupervised model evaluation method, including:
  • S node vectors corresponding to S nodes are determined from the N node vectors obtained through the unsupervised model
  • the unsupervised model is evaluated based on the predicted values of the positive samples and the predicted values of the negative samples.
  • an unsupervised model evaluation device including:
  • a first determining module configured to determine S node vectors corresponding to S nodes among the N node vectors obtained through the unsupervised model, where N and S are positive integers, and N is greater than or equal to S;
  • a second determining module configured to determine a neighbor node of each of the S nodes and a non-neighbor node of each node
  • a positive sample prediction value determining module configured to determine, according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
  • a negative sample prediction value determining module configured to determine, according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
  • An evaluation module is configured to evaluate the unsupervised model according to the predicted value of the positive sample and the predicted value of the negative sample.
  • a server including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the steps of the method described in any one of the foregoing.
  • an embodiment of the present specification provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the method described in any one of the foregoing are implemented.
  • In the unsupervised model evaluation method provided by the embodiments of the present specification, S node vectors corresponding to S nodes are determined from the N node vectors obtained through the unsupervised model, where N and S are positive integers; a neighbor node and a non-neighbor node of each of the S nodes are determined; according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node and the node is determined as the predicted value of a positive sample; according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node and the node is determined as the predicted value of a negative sample; and the unsupervised model is evaluated according to the predicted values of the positive samples and the predicted values of the negative samples.
  • In this scheme, the similarity between each node and a neighbor node is used as a positive sample, and the similarity between each node and a non-neighbor node is used as a negative sample. The effect of the unsupervised model is determined by evaluating the positive and negative samples, so no additional model training is required, which makes the evaluation process of the unsupervised model easier to implement.
  • FIG. 1 is a flowchart of an unsupervised model evaluation method provided by the first aspect of the embodiment of the specification
  • FIG. 2 is a flowchart of a method for evaluating an unsupervised model based on a two-tuple sample according to an embodiment of the present specification
  • FIG. 3 is a schematic diagram of an unsupervised model evaluation device provided by the second aspect of the embodiment of the present specification.
  • FIG. 4 is a schematic diagram of a server provided by a third aspect of the embodiment of the present specification.
  • In a first aspect, an embodiment of the present specification provides an unsupervised model evaluation method. FIG. 1 is a flowchart of the method, which includes the following steps:
  • Step S11: Determine S node vectors corresponding to S nodes from the N node vectors obtained through the unsupervised model, where N and S are positive integers;
  • The N node vectors may be obtained by the unsupervised model computing over N input data.
  • The N input data can be regarded as N nodes.
  • The input data may be data determined according to the actual situation.
  • In one embodiment, the input data may be a certain number of vectors used to represent words.
  • Each word may be regarded as a node and represented by an initial vector; the initial vector of the word is used as the input of the unsupervised model.
  • Through the computation of the unsupervised model, an embedding (embedding vector) corresponding to each node is output, that is, the output node vector.
  • Embedding can be regarded as mapping nodes from the original space into another space while preserving in that space the structure information and distance information the nodes had in the original space.
  • the unsupervised model can be selected according to actual needs.
  • the unsupervised model can be a Word2Vec model.
  • When words are processed, each word is mapped into a low-dimensional vector space through the Word2Vec model, yielding a node vector corresponding to each word node.
  • In the computation of the unsupervised model, the final output node vector can be determined by the loss function.
  • One example is the NCE (Noise Contrastive Estimation) loss function: node vectors obtained with the NCE loss place a node closer to its neighbor nodes in the vector space and farther from its non-neighbor nodes.
  • S different nodes may be selected from N nodes, or S nodes may be randomly selected from N nodes.
  • the S nodes may be selected at one time or multiple times, which is not limited here.
  • Step S12: Determine a neighbor node of each of the S nodes and a non-neighbor node of each node;
  • For each of the S nodes, the node may have multiple neighbor nodes and multiple non-neighbor nodes. An arbitrary number of neighbor nodes may be randomly selected from the neighbor nodes, and an arbitrary number of non-neighbor nodes may be randomly selected from the non-neighbor nodes.
  • Alternatively, the number of neighbor nodes to select and the number of non-neighbor nodes to select can be set in advance. This is not limited here.
  • The neighbor nodes of a node can be generated according to a preset neighbor node generation rule, or the N nodes can be formed into a graph according to the relationships between the nodes, and the neighbor nodes are selected in the formed graph.
  • In one embodiment, edges may be connected according to the relationships between the nodes to form a graph structure containing the connection relationships between nodes.
  • In that graph structure, the neighbor nodes and non-neighbor nodes of each of the S nodes are looked up.
  • In another embodiment, the neighbor nodes of each node may be generated according to a preset neighbor generation algorithm.
  • other methods may also be used to determine the neighbor nodes, which are not limited in the embodiments of the present specification. The determination of non-neighbor nodes is similar to the determination of neighbor nodes, and will not be repeated here.
  • Step S13: According to the node vector of the neighbor node of each node and the node vector of each node, determine the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
  • Step S14: According to the node vector of the non-neighbor node of each node and the node vector of each node, determine the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
  • the neighbor node may be used as a positive sample, and the non-neighbor node may be used as a negative sample.
  • the calculation of the similarity can be selected according to actual needs. In one embodiment, it can be obtained by calculating the inner product between vectors and normalizing the inner product.
  • the similarity may be a number between 0 and 1. If the similarity between two nodes is closer to 1, it indicates that the probability that the two nodes are neighbor nodes to each other is greater.
  • Step S15: Evaluate the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
  • The closer the predicted values of neighbor nodes are to 1 and the closer the predicted values of non-neighbor nodes are to 0, the better the effect of the unsupervised model. Conversely, the closer the predicted values of neighbor nodes (positive samples) are to 0 and the closer the predicted values of non-neighbor nodes (negative samples) are to 1, the worse the effect of the unsupervised model. The predicted values of the positive samples and of the negative samples can therefore be used to evaluate the quality of the unsupervised model. This approach is more general and easier to implement.
  • Optionally, determining the S node vectors corresponding to the S nodes from the N node vectors obtained through the unsupervised model includes: determining a node set, where the node set contains N nodes and the N nodes correspond to the N node vectors; and randomly selecting one node from the node set as a sampling node, S times in total, to obtain the S nodes.
  • The N nodes corresponding to the N node vectors calculated by the unsupervised model may be used to form the node set.
  • When the S nodes are selected, S draws may be made from the node set with replacement, each draw randomly selecting one of the N nodes as a sampling node.
  • Optionally, determining a neighbor node of each of the S nodes and a non-neighbor node of each node includes, for each of the S nodes, performing the following steps: determining, among the N nodes and according to a preset neighbor generation algorithm, the neighbor node set of the node and the non-neighbor node set of the node; determining one or more neighbor nodes in the neighbor node set; and determining one or more non-neighbor nodes in the non-neighbor node set.
  • For each of the S nodes, a preset neighbor generation algorithm can be used to obtain the neighbor nodes of the node.
  • The preset neighbor generation algorithm can be selected according to actual needs, for example a first-order neighbor algorithm or a random walk neighbor algorithm.
  • In one embodiment, to keep the generated neighbor nodes consistent with the node relationships used in the computation of the unsupervised model, the preset neighbor generation algorithm is the same as the neighbor generation algorithm used in the computation of the unsupervised model.
  • In one embodiment, when a preset neighbor generation algorithm is used to generate the neighbor node set of a node, the non-neighbor nodes can be obtained by subtracting the neighbor node set from the node set. It should be understood that the neighbor node sets of different nodes may contain different numbers of elements. For example, the first node has 5 neighbor nodes and the second node has 8 neighbor nodes. When neighbor nodes are selected, the number selected may be fixed or random.
  • Continuing with the first and second nodes above, in one embodiment the same number of neighbor nodes is selected for each node. For example, if one is selected for each, one neighbor node is randomly selected from the 5 neighbor nodes of the first node and one from the 8 neighbor nodes of the second node. In another embodiment, the number of neighbor nodes is determined randomly for each node. For example, 2 neighbor nodes are randomly selected from the 5 neighbor nodes of the first node, and 3 neighbor nodes are randomly selected from the 8 neighbor nodes of the second node. For each node, whichever method is used, the number of neighbor nodes determined is less than or equal to the total number of neighbor nodes the node actually has.
  • For the non-neighbor nodes of each node, the determination method is similar to that of the neighbor nodes described above.
  • The number of non-neighbor nodes can be set according to actual needs. In one embodiment, 4 to 20 nodes can be randomly selected from each non-neighbor node set as the non-neighbor nodes of the node.
  • For the method of determining non-neighbor nodes, refer to the description of determining neighbor nodes above; it is not repeated here.
  • In an optional implementation, evaluating the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples includes: constructing a first-class two-tuple sample according to the predicted value of the positive sample and a positive sample label value; constructing a second-class two-tuple sample according to the predicted value of the negative sample and a negative sample label value; and evaluating the unsupervised model according to the first-class two-tuple samples and the second-class two-tuple samples.
  • A validation set may be constructed, and the effect of the unsupervised model is determined through the validation set.
  • The validation set may initially be an empty set. After two-tuple samples are constructed from the positive samples and the negative samples, they are added to the initially empty set to obtain the final validation set.
  • To distinguish neighbor nodes from non-neighbor nodes, a positive sample label value and a negative sample label value can be set. In one embodiment, the positive sample label value is 1 and the negative sample label value is 0. For a neighbor node of a node, a first-class two-tuple sample can then be determined from the predicted value of the positive sample of the neighbor node and the positive sample label value. For example, if the predicted value of the positive sample of a neighbor node is 0.9, the corresponding first-class two-tuple sample is (1, 0.9).
  • For a non-neighbor node of a node, a second-class two-tuple sample can be determined from the predicted value of the negative sample of the non-neighbor node and the negative sample label value. For example, if the predicted value of the negative sample of a non-neighbor node is 0.3, the corresponding second-class two-tuple sample is (0, 0.3).
  • Based on the validation set constructed above, the unsupervised model can be evaluated.
  • FIG. 2 is a flowchart of a method, in the embodiment of the present specification, for evaluating the unsupervised model according to the first-class two-tuple samples and the second-class two-tuple samples. The method includes the following steps:
  • Step S21: Determine a target receiver operating characteristic curve according to the first-class two-tuple samples and the second-class two-tuple samples;
  • Step S22: Obtain the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
  • In this embodiment, binary classification may be performed on the validation set composed of the first-class and second-class two-tuple samples, and the quality of the binary classification is evaluated by determining the area under the target receiver operating characteristic curve.
  • The predicted value may be used to represent the probability that two nodes are neighbor nodes of each other.
  • ROC: receiver operating characteristic
  • When the target ROC curve is drawn, the positive and negative samples can be sorted by predicted value. The sorted samples are then classified into two classes by selecting different thresholds: two-tuple samples with predicted values greater than or equal to the threshold are predicted as positive samples, and two-tuple samples with predicted values less than the threshold are predicted as negative samples. The true positive rate (TPR) and false positive rate (FPR) under the threshold are then calculated from the actual label values in the two-tuple samples.
  • TPR: true positive rate
  • FPR: false positive rate
  • Calculating the true positive rate and the false positive rate involves the following four cases: true positive (TP), false positive (FP), false negative (FN), and true negative (TN).
  • TP means that the prediction is a positive sample and the prediction is correct
  • FP means that the prediction is a positive sample and the prediction is wrong
  • FN means that the prediction is a negative sample and the prediction is wrong
  • TN means that the prediction is a negative sample and the prediction is correct.
  • Taking the two-tuple samples above as an example, if the predicted value of a two-tuple sample is greater than or equal to the threshold, the sample is predicted to be a positive sample, and its label value is then checked. If the label value is the positive sample label value, the two-tuple sample is a TP; if the label value is the negative sample label value, the two-tuple sample is an FP. If the predicted value of a two-tuple sample is less than the threshold, the sample is predicted to be a negative sample, and its label value is then checked. If the label value is the positive sample label value, the two-tuple sample is an FN; if the label value is the negative sample label value, the two-tuple sample is a TN.
  • The true positive rate is calculated as TPR = TP/(TP+FN) and the false positive rate as FPR = FP/(TN+FP). The true positive rate represents the likelihood that a sample is predicted positive and the prediction is correct
  • The false positive rate represents the likelihood that a sample is predicted positive but the prediction is wrong.
  • The true positive rate is taken as the vertical axis and the false positive rate as the horizontal axis, giving a coordinate on the ROC curve. In one embodiment, the predicted value of each two-tuple sample can be used as a threshold, and the true positive rate and false positive rate under each threshold are calculated. For example, if the number of two-tuple samples is M, there are M predicted values; the M predicted values are arranged in order of size, and each predicted value is used as a threshold.
  • When the predicted value of a two-tuple sample is greater than or equal to the threshold, the two-tuple sample is predicted to be a positive sample; when the predicted value is less than the threshold, the two-tuple sample is predicted to be a negative sample.
  • Each time a threshold is selected, one pair of TPR and FPR is obtained, that is, one point on the ROC curve. In this embodiment, M pairs of TPR and FPR are obtained in total, and the ROC curve can be drawn from these M points.
  • AUC (Area Under ROC Curve) is the area under the ROC curve and reflects the classification performance expressed by the ROC curve. Generally speaking, the larger the AUC value, the better the classification effect, so the effect of the unsupervised model can be evaluated by AUC. It should be understood that, besides AUC, other metrics such as the F1 value can also be used to evaluate the effect of the unsupervised model, which is not limited here.
  • To better understand the unsupervised model evaluation method provided in the embodiment of the present specification, the method is described below taking node vectors obtained through an unsupervised model as an example.
  • In this embodiment, the node set calculated by the unsupervised model is V
  • the node vector set corresponding to the node set is E
  • the number of sampled nodes is S
  • the validation set T is initially an empty set.
  • For each of the S nodes, 1 neighbor node and N non-neighbor nodes are determined to construct the target validation set.
  • Step 1: Obtain the node set V, the node vector set E, the target validation set, and the number of sampled nodes S;
  • Step 2: Let the loop counter i run from 1 to S. In each iteration, perform the following steps:
  • according to the preset neighbor generation algorithm, generate the neighbor node set of v_i and the non-neighbor node set of v_i, and randomly sample 1 neighbor node from the neighbor node set, denoted p_i;
  • query the set E for the vector of node v_i and the vector of each node in N_i, then calculate the similarity between the vector of v_i and each node vector in N_i; the resulting N similarity values are recorded as the set sam_vn;
  • Step 3: Each two-tuple in the set T consists of a label value and a predicted value, so binary classification metrics such as AUC and the F1 value can be used to evaluate the set T.
  • The evaluation result can serve as an indicator of the learning effect of the unsupervised model.
  • In a second aspect, based on the same inventive concept, an embodiment of the present specification provides an unsupervised model evaluation device. Referring to FIG. 3, the device includes:
  • the first determining module 31 is configured to determine S node vectors corresponding to the S nodes among the N node vectors obtained through the unsupervised model, where N and S are positive integers;
  • a second determining module 32 configured to determine a neighbor node of each of the S nodes and a non-neighbor node of each node;
  • the positive sample prediction value determining module 33 is configured to determine, according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
  • the negative sample prediction value determining module 34 is configured to determine, according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
  • An evaluation module 35 is configured to evaluate the unsupervised model according to the predicted value of the positive sample and the predicted value of the negative sample.
  • the evaluation module 35 is configured to:
  • evaluate the unsupervised model based on the first-class two-tuple samples and the second-class two-tuple samples.
  • the evaluation module 35 is configured to:
  • obtain the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
  • the second determining module 32 is configured to:
  • One or more non-neighbor nodes are determined in the non-neighbor node set.
  • the first determining module 31 is configured to:
  • a node set acquisition module configured to determine a node set, where the node set includes N nodes, and the N nodes correspond to the N node vectors;
  • a sampling node acquisition module is configured to randomly select a node in the node set as a sampling node, and select a total of S times to obtain the S nodes.
  • The present invention also provides a server, as shown in FIG. 4, including a memory 404, a processor 402, and a computer program stored in the memory 404 and executable on the processor 402; when the processor 402 executes the program, the steps of any of the unsupervised model evaluation methods described above are implemented.
  • The bus architecture is represented by the bus 400.
  • The bus 400 may include any number of interconnected buses and bridges.
  • The bus 400 links together various circuits, including one or more processors represented by the processor 402 and memory represented by the memory 404.
  • the bus 400 can also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art, and therefore, they will not be further described herein.
  • the bus interface 406 provides an interface between the bus 400 and the receiver 401 and the transmitter 403.
  • the receiver 401 and the transmitter 403 may be the same element, that is, a transceiver, providing a unit for communicating with various other devices on a transmission medium.
  • the processor 402 is responsible for managing the bus 400 and general processing, and the memory 404 may be used to store data used by the processor 402 when performing operations.
  • The present invention also provides a computer-readable storage medium having a computer program stored thereon; when the program is executed by a processor, the steps of any of the unsupervised model evaluation methods described above are implemented.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

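The present invention discloses an unsupervised model evaluation method, device, server, and readable storage medium. In the unsupervised model evaluation method, S node vectors corresponding to S nodes are determined from N node vectors obtained through an unsupervised model; a neighbor node and a non-neighbor node of each of the S nodes are determined; the vector similarity between each node and its neighbor node is used as a positive sample, and the vector similarity between each node and its non-neighbor node is used as a negative sample; and the effect of the unsupervised model is determined by evaluating the positive and negative samples. No additional model training is required, which makes the evaluation process of the unsupervised model easier to implement.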

Description

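Unsupervised model evaluation method, device, server, and readable storage medium
Technical Field
The present invention relates to the field of computer technology, and in particular to an unsupervised model evaluation method, device, server, and readable storage medium.
Background
With the continuous development of science and technology, unsupervised learning algorithms have been widely applied. An unsupervised learning model can perform computation on graph data, mapping each node in the graph data into a vector space to obtain computed node vectors. When the computed node vectors are evaluated, a supervised algorithm is usually used, which means training a further machine learning model. Training a machine learning model, however, consumes more data and time.
Summary
The embodiments of the present specification provide an unsupervised model evaluation method, device, server, and computer-readable storage medium.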
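In a first aspect, an embodiment of the present specification provides an unsupervised model evaluation method, including:
determining, from N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes;
determining a neighbor node of each of the S nodes and a non-neighbor node of each node;
determining, according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
determining, according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
evaluating the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.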
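In a second aspect, an embodiment of the present specification provides an unsupervised model evaluation device, including:
a first determining module, configured to determine, from N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes, where N and S are positive integers and N is greater than or equal to S;
a second determining module, configured to determine a neighbor node of each of the S nodes and a non-neighbor node of each node;
a positive sample prediction value determining module, configured to determine, according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
a negative sample prediction value determining module, configured to determine, according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
an evaluation module, configured to evaluate the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
In a third aspect, an embodiment of the present specification provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the steps of the method described in any one of the foregoing.
In a fourth aspect, an embodiment of the present specification provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method described in any one of the foregoing are implemented.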
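The beneficial effects of the embodiments of the present specification are as follows:
In the unsupervised model evaluation method provided by the embodiments of the present specification, S node vectors corresponding to S nodes are determined from the N node vectors obtained through the unsupervised model, where N and S are positive integers; a neighbor node and a non-neighbor node of each of the S nodes are determined; according to the node vector of the neighbor node of each node and the node vector of each node, the similarity between the neighbor node and the node is determined as the predicted value of a positive sample; according to the node vector of the non-neighbor node of each node and the node vector of each node, the similarity between the non-neighbor node and the node is determined as the predicted value of a negative sample; and the unsupervised model is evaluated according to the predicted values of the positive samples and the predicted values of the negative samples. In this scheme, the similarity between each node and a neighbor node is used as a positive sample, and the similarity between each node and a non-neighbor node is used as a negative sample. The effect of the unsupervised model is determined by evaluating the positive and negative samples, so no additional model training is required, which makes the evaluation process of the unsupervised model easier to implement.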
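Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a flowchart of an unsupervised model evaluation method provided by the first aspect of the embodiments of the present specification;
FIG. 2 is a flowchart of a method for evaluating an unsupervised model according to two-tuple samples, as shown in an embodiment of the present specification;
FIG. 3 is a schematic diagram of an unsupervised model evaluation device provided by the second aspect of the embodiments of the present specification;
FIG. 4 is a schematic diagram of a server provided by the third aspect of the embodiments of the present specification.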
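Detailed Description
For a better understanding of the above technical solutions, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present specification and the specific features in the embodiments are detailed descriptions of the technical solutions of the embodiments, not limitations on the technical solutions of the present specification. Where no conflict arises, the embodiments and the technical features in the embodiments may be combined with one another.
In a first aspect, an embodiment of the present specification provides an unsupervised model evaluation method. FIG. 1 is a flowchart of the method, which includes the following steps:
Step S11: determine, from N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes, where N and S are positive integers;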
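In the embodiments of the present specification, the N node vectors may be obtained by the unsupervised model computing over N input data, where the N input data can be regarded as N nodes. The input data may be determined according to the actual situation. In one embodiment, the input data may be a certain number of vectors representing words: each word can be regarded as a node and represented by an initial vector, the initial vectors of the words are used as the input of the unsupervised model, and through the computation of the unsupervised model an embedding (embedding vector) corresponding to each node is output, that is, the output node vector. It should be understood that embedding can be regarded as mapping nodes from the original space into another space while preserving in that space the structure information and distance information the nodes had in the original space.
The unsupervised model can be selected according to actual needs. In one embodiment, the unsupervised model may be a Word2Vec model; when words are processed, each word is mapped into a low-dimensional vector space by the Word2Vec model, yielding a node vector corresponding to each word node.
It should be understood that, in the computation of the unsupervised model, the final output node vectors can be determined by the loss function. One example is the NCE (Noise Contrastive Estimation) loss function: node vectors obtained with the NCE loss place a node closer to its neighbor nodes in the vector space and farther from its non-neighbor nodes.
When the S nodes are determined, S distinct nodes may be selected from the N nodes, or S nodes may be randomly selected from the N nodes. The S nodes may be selected all at once or over multiple selections, which is not limited here.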
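Step S12: determine a neighbor node of each of the S nodes and a non-neighbor node of each node;
In the embodiments of the present specification, each of the S nodes may have multiple neighbor nodes and multiple non-neighbor nodes. An arbitrary number of neighbor nodes may be randomly selected from the neighbor nodes, and an arbitrary number of non-neighbor nodes may be randomly selected from the non-neighbor nodes. Alternatively, the number of neighbor nodes to select and the number of non-neighbor nodes to select can be set in advance. This is not limited here.
The neighbor nodes of a node can be generated according to a preset neighbor node generation rule, or the N nodes can be formed into a graph according to the relationships between the nodes, with the neighbor nodes selected in that graph. In one embodiment, edges may be connected according to the relationships between the nodes to form a graph structure containing the connection relationships between nodes, and the neighbor nodes and non-neighbor nodes of each of the S nodes are looked up in that graph structure. In another embodiment, the neighbor nodes of each node may be generated according to a preset neighbor generation algorithm. Other methods may also be used to determine the neighbor nodes, which is not limited in the embodiments of the present specification. Non-neighbor nodes are determined in a similar way to neighbor nodes, so the details are not repeated here.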
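Step S13: according to the node vector of the neighbor node of each node and the node vector of each node, determine the similarity between the neighbor node of each node and that node as the predicted value of a positive sample;
Step S14: according to the node vector of the non-neighbor node of each node and the node vector of each node, determine the similarity between the non-neighbor node of each node and that node as the predicted value of a negative sample;
In the embodiments of the present specification, once the neighbor nodes and non-neighbor nodes of each node have been determined, the neighbor nodes can be used as positive samples and the non-neighbor nodes as negative samples. For each node, the similarity between the node and each of its neighbor nodes is calculated as the predicted value of that neighbor node, and the similarity between the node and each of its non-neighbor nodes is calculated as the predicted value of that non-neighbor node. The similarity calculation can be selected according to actual needs. In one embodiment, it can be obtained by calculating the inner product between the vectors and normalizing the inner product.
It should be understood that the similarity may be a number between 0 and 1. The closer the similarity between two nodes is to 1, the greater the probability that the two nodes are neighbor nodes of each other.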
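As a minimal sketch of the similarity computation described above, the following Python snippet takes the inner product of two node vectors and maps it into (0, 1). The logistic function is an assumed choice of normalization, since the source only states that the inner product is normalized, and the function name `similarity` is hypothetical.

```python
import math


def similarity(u, v):
    """Inner product of two node vectors mapped into (0, 1).

    Assumption: the normalization is a logistic (sigmoid) function; the
    source text does not say which normalization is used.
    """
    dot = sum(a * b for a, b in zip(u, v))
    # Numerically stable logistic: never call exp() on a large positive number.
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    e = math.exp(dot)
    return e / (1.0 + e)


aligned = similarity([1.0, 2.0], [1.0, 2.0])    # positive inner product
opposed = similarity([1.0, 2.0], [-1.0, -2.0])  # negative inner product
```

Vectors pointing the same way score above 0.5 and vectors pointing in opposite directions score below 0.5, which matches the reading of the score as a neighbor probability.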
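Step S15: evaluate the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
In the embodiments of the present specification, the closer the predicted values of neighbor nodes are to 1 and the closer the predicted values of non-neighbor nodes are to 0, the better the effect of the unsupervised model. Conversely, the closer the predicted values of neighbor nodes (positive samples) are to 0 and the closer the predicted values of non-neighbor nodes (negative samples) are to 1, the worse the effect of the unsupervised model. The predicted values of the positive samples and of the negative samples can therefore be used to evaluate the quality of the unsupervised model. This approach is more general and easier to implement.
Optionally, determining the S node vectors corresponding to the S nodes from the N node vectors obtained through the unsupervised model includes: determining a node set, where the node set contains N nodes and the N nodes correspond to the N node vectors; and randomly selecting one node from the node set as a sampling node, S times in total, to obtain the S nodes.
In the embodiments of the present specification, the N nodes corresponding to the N node vectors calculated by the unsupervised model can be used to form the node set. When the S nodes are selected, S draws may be made from the node set with replacement, each draw randomly selecting one of the N nodes as a sampling node.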
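The sampling with replacement described above can be sketched as follows. The function name `sample_nodes` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random


def sample_nodes(node_set, s, seed=None):
    """Draw s sampling nodes uniformly at random, with replacement."""
    rng = random.Random(seed)
    nodes = list(node_set)
    # Each draw is independent, so the same node may be picked more than once.
    return [rng.choice(nodes) for _ in range(s)]


sampled = sample_nodes(["a", "b", "c", "d"], s=6, seed=0)
```

Because the draws are with replacement, S may exceed the number of distinct nodes, as in this example where 6 draws are made from 4 nodes.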
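Optionally, determining a neighbor node of each of the S nodes and a non-neighbor node of each node includes, for each of the S nodes, performing the following steps: determining, among the N nodes and according to a preset neighbor generation algorithm, the neighbor node set of the node and the non-neighbor node set of the node; determining one or more neighbor nodes in the neighbor node set; and determining one or more non-neighbor nodes in the non-neighbor node set.
In the embodiments of the present specification, for each of the S nodes, a preset neighbor generation algorithm can be used to obtain the neighbor nodes of the node. The preset neighbor generation algorithm can be selected according to actual needs, for example a first-order neighbor algorithm or a random walk neighbor algorithm. In one embodiment, to keep the generated neighbor nodes consistent with the node relationships used in the computation of the unsupervised model, the preset neighbor generation algorithm is the same as the neighbor generation algorithm used in the computation of the unsupervised model.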
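The source names random walk as one possible neighbor generation algorithm without specifying it. The sketch below is one plausible reading, offered as an assumption: every node visited by a few short walks from the start node counts as a neighbor. The function name, the adjacency-dict graph representation, and the `walk_length` and `num_walks` parameters are all hypothetical.

```python
import random


def random_walk_neighbors(adjacency, start, walk_length, num_walks, seed=None):
    """Collect the nodes visited by short random walks that begin at start."""
    rng = random.Random(seed)
    visited = set()
    for _ in range(num_walks):
        current = start
        for _ in range(walk_length):
            successors = list(adjacency.get(current, []))
            if not successors:
                break  # dead end: stop this walk early
            current = rng.choice(successors)
            visited.add(current)
    visited.discard(start)  # a node is not treated as its own neighbor
    return visited


path_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
one_hop = random_walk_neighbors(path_graph, "a", walk_length=1, num_walks=5, seed=0)
two_hop = random_walk_neighbors(path_graph, "a", walk_length=2, num_walks=5, seed=0)
```

With `walk_length=1` this reduces to sampling first-order neighbors; longer walks can also reach nodes that are several edges away.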
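In one embodiment, when a preset neighbor generation algorithm is used to generate the neighbor node set of a node, the non-neighbor nodes can be obtained by subtracting the neighbor node set from the node set. It should be understood that the neighbor node sets of different nodes may contain different numbers of elements. For example, the first node has 5 neighbor nodes and the second node has 8 neighbor nodes. When neighbor nodes are selected, the number selected may be fixed or random.
Continuing with the first and second nodes above, in one embodiment the same number of neighbor nodes is selected for each node. For example, if one is selected for each, one neighbor node is randomly selected from the 5 neighbor nodes of the first node and one from the 8 neighbor nodes of the second node. In another embodiment, the number of neighbor nodes is determined randomly for each node. For example, 2 neighbor nodes are randomly selected from the 5 neighbor nodes of the first node, and 3 neighbor nodes are randomly selected from the 8 neighbor nodes of the second node. For each node, whichever method is used, the number of neighbor nodes determined is less than or equal to the total number of neighbor nodes the node actually has.
For the non-neighbor nodes of each node, the determination method is similar to that of the neighbor nodes described above. The number of non-neighbor nodes can be set according to actual needs. In one embodiment, 4 to 20 nodes can be randomly selected from each non-neighbor node set as the non-neighbor nodes of the node. For the method of determining non-neighbor nodes, refer to the description of determining neighbor nodes above; it is not repeated here.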
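A minimal sketch of the set subtraction and random selection described above, assuming first-order neighbors read from an adjacency dict. The function names are hypothetical, and excluding the node itself from its non-neighbor set is an assumption; the source only says the neighbor set is subtracted from the node set.

```python
import random


def split_neighbors(adjacency, node):
    """Return (neighbor set, non-neighbor set) of node."""
    neighbors = set(adjacency[node])
    # Assumption: the node itself is excluded from its own non-neighbor set.
    non_neighbors = set(adjacency) - neighbors - {node}
    return neighbors, non_neighbors


def pick(candidates, k, rng):
    """Randomly select up to k distinct nodes, never more than are available."""
    pool = sorted(candidates)  # fixed order so a seeded rng is reproducible
    return rng.sample(pool, min(k, len(pool)))


adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": [], "e": []}
rng = random.Random(0)
neighbors, non_neighbors = split_neighbors(adjacency, "a")
chosen_neighbors = pick(neighbors, 1, rng)
chosen_non_neighbors = pick(non_neighbors, 4, rng)
```

Capping `k` at the pool size keeps the number of selected nodes less than or equal to the number actually available, as the text requires for neighbor nodes.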
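In an optional implementation, evaluating the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples includes: constructing a first-class two-tuple sample according to the predicted value of the positive sample and a positive sample label value; constructing a second-class two-tuple sample according to the predicted value of the negative sample and a negative sample label value; and evaluating the unsupervised model according to the first-class two-tuple samples and the second-class two-tuple samples.
In the embodiments of the present specification, a validation set can be constructed, and the effect of the unsupervised model is determined through the validation set. In one embodiment, the validation set is initially an empty set; after two-tuple samples are constructed from the positive samples and the negative samples, they are added to the initially empty set to obtain the final validation set.
To distinguish neighbor nodes from non-neighbor nodes, a positive sample label value and a negative sample label value can be set. In one embodiment, the positive sample label value is 1 and the negative sample label value is 0. For a neighbor node of a node, a first-class two-tuple sample can then be determined from the predicted value of the positive sample of the neighbor node and the positive sample label value. For example, if the predicted value of the positive sample of a neighbor node is 0.9, the corresponding first-class two-tuple sample is (1, 0.9). For a non-neighbor node of a node, a second-class two-tuple sample can be determined from the predicted value of the negative sample of the non-neighbor node and the negative sample label value. For example, if the predicted value of the negative sample of a non-neighbor node is 0.3, the corresponding second-class two-tuple sample is (0, 0.3).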
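The construction of the validation set from two-tuple samples can be sketched as follows, using the label values 1 and 0 and the example predicted values 0.9 and 0.3 from the text. The function and constant names are hypothetical.

```python
POSITIVE_LABEL = 1
NEGATIVE_LABEL = 0


def build_validation_set(positive_scores, negative_scores):
    """Pair every predicted value with its label value as (label, prediction)."""
    validation = []  # the validation set starts out empty
    validation.extend((POSITIVE_LABEL, score) for score in positive_scores)
    validation.extend((NEGATIVE_LABEL, score) for score in negative_scores)
    return validation


validation = build_validation_set([0.9], [0.3])
```

Here `validation` holds the first-class two-tuple (1, 0.9) followed by the second-class two-tuple (0, 0.3).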
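Based on the validation set constructed above, the unsupervised model can be evaluated. FIG. 2 is a flowchart of a method, in the embodiments of the present specification, for evaluating the unsupervised model according to the first-class two-tuple samples and the second-class two-tuple samples. The method includes the following steps:
Step S21: determine a target receiver operating characteristic curve according to the first-class two-tuple samples and the second-class two-tuple samples;
Step S22: obtain the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
In this embodiment, binary classification can be performed on the validation set composed of the first-class and second-class two-tuple samples, and the quality of the binary classification is evaluated by determining the area under the target receiver operating characteristic curve.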
应理解的是,在该实施例中,预测值可以用来表示两个结点互为邻居结点的概率。在绘制目标受试者工作特征(Receiver Operating Characteristic,ROC)曲线时,可以根据正样本的预测值以及负样本的预测值按照预测值的大小进行排序。通过选取不同的阈值来对排序好的样本进行二分类,即,预测值大于或等于阈值的二元组样本预测为正样本,预测值小于该阈值的二元组样本预测为负样本。再根据二元组样本中实际对应的标签值来计算该阈值下的真正例率(True Positive Rate,TPR)和假正例率(False Positive Rate,FPR)。
在计算真正例率,假正例率时会涉及到以下四种情况:真正例(True positive,TP),假正例(False positive,FP),假反例(False negative,FN),真反例(True negative,TN)。其中,TP是指预测是正样本,且预测对了,FP是指预测是正样本,且预测错了,FN是指预测是负样本,且预测错了,TN是指预测是负样本,且预测对了。以上述二元组样本为例,如果一个二元组样本,预测值大于或等于阈值,则表明该二元组样本被预测是正样本,再查看该二元组样本的标签值,如果是正样本标签值,则表明该二元组样本为TP,如果标签值是负样本标签,则表明该二元组样本为FP。如果一个二元组样本,预测值小于阈值,则表明该二元组样本被预测是负样本,再查看该二元组样本的标签值,如果标签值是正样本标签值,则该二元组样本为FN,如果标签值是负样本标签值,则该二元组样本为TN。
计算真正例率TPR=TP/(TP+FN),以及假正例率FPR=FP/(TN+FP)。其中,真正例率代表预测为正样本且预测对了的可能性,假正例率代表预测为正样本但是预测错了的可能性。
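The four cases and the two rates at a single threshold can be worked through with a short Python sketch. The function names and the four example samples are hypothetical; returning 0.0 when a denominator is zero is an assumption, since the text does not discuss that case.

```python
def confusion_counts(samples, threshold):
    """Count TP, FP, FN, TN for (label, predicted value) tuples at one threshold."""
    tp = fp = fn = tn = 0
    for label, score in samples:
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1          # predicted positive, actually positive
        elif predicted_positive:
            fp += 1          # predicted positive, actually negative
        elif label == 1:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def rates(samples, threshold):
    """Return (TPR, FPR) at one threshold."""
    tp, fp, fn, tn = confusion_counts(samples, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # zero-denominator fallback is an assumption
    fpr = fp / (tn + fp) if (tn + fp) else 0.0
    return tpr, fpr


samples = [(1, 0.9), (1, 0.4), (0, 0.6), (0, 0.3)]
counts = confusion_counts(samples, 0.5)   # (1, 1, 1, 1)
tpr, fpr = rates(samples, 0.5)            # (0.5, 0.5)
```

At a threshold of 0.5, the sample (1, 0.9) is a TP, (1, 0.4) is an FN, (0, 0.6) is an FP, and (0, 0.3) is a TN, so both rates equal 0.5.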
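Taking the true positive rate as the vertical axis and the false positive rate as the horizontal axis, each (FPR, TPR) pair gives the coordinates of one point on the ROC curve. In one embodiment, the predicted value of each two-tuple sample is used as a threshold, and the true positive rate and false positive rate are computed at each threshold. For example, if there are M two-tuple samples, there are M predicted values. These are sorted by size and each is used in turn as a threshold: a two-tuple sample whose predicted value is greater than or equal to the threshold is predicted positive, and one whose predicted value is less than the threshold is predicted negative. Each threshold yields one pair of TPR and FPR, that is, one point on the ROC curve. In this embodiment, M pairs of TPR and FPR are obtained in total, and the ROC curve can be drawn from these M points.
AUC (Area Under the ROC Curve) is the area under the ROC curve and reflects the classification ability expressed by the ROC curve. Generally, the larger the AUC, the better the classification, so AUC can be used to evaluate the effect of the unsupervised model. It should be understood that metrics other than AUC, such as the F1 score, can also be used to evaluate the unsupervised model; no limitation is imposed here.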
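A minimal sketch of the threshold sweep and the area computation follows. It assumes the validation set contains at least one positive and one negative sample, uses each distinct predicted value as a threshold, prepends the origin (0, 0) so that the curve starts at the lower-left corner, and integrates with the trapezoidal rule. The function names and the origin point are illustrative choices, not details taken from the specification.

```python
def roc_points(samples):
    """Return ROC points (FPR, TPR), one per distinct predicted value used as threshold."""
    positives = sum(1 for label, _ in samples if label == 1)
    negatives = len(samples) - positives
    # Sweep thresholds from the largest predicted value to the smallest.
    thresholds = sorted({score for _, score in samples}, reverse=True)
    points = [(0.0, 0.0)]  # assumed starting point: nothing predicted positive
    for t in thresholds:
        tp = sum(1 for label, score in samples if score >= t and label == 1)
        fp = sum(1 for label, score in samples if score >= t and label == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


samples = [(1, 0.9), (1, 0.4), (0, 0.6), (0, 0.3)]
points = roc_points(samples)
area = auc(points)  # 0.75 for these four samples
```

For these samples the curve passes through (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), giving an area of 0.75. This matches the fraction of positive-negative pairs in which the positive sample has the higher predicted value (3 of 4).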
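To better explain the unsupervised model evaluation method provided by the embodiments of the present specification, the method is described below using node vectors obtained from an unsupervised model as an example. In this embodiment, the node set computed by the unsupervised model is V, the node vector set corresponding to the node set is E, the number of sampled nodes is S, and the validation set T is initially empty. For each of the S nodes, 1 neighbor node and N non-neighbor nodes are determined in order to build the target validation set. (In this example, N denotes the number of non-neighbor nodes sampled per node, not the number of node vectors.)
Step 1: obtain the node set V, the node vector set E, the target validation set $T = \varnothing$, and the number of nodes to sample S;
Step 2: set a loop counter i running from 1 to S, and perform the following steps in each iteration:
randomly sample one node from the set V, denoted $v_i$;
according to the preset neighbor generation algorithm, generate the neighbor node set and the non-neighbor node set of $v_i$, and randomly sample 1 neighbor node from the neighbor node set, denoted $p_i$;
look up the vector of $v_i$ and the vector of $p_i$ in the set E, and compute the similarity between the two vectors, denoted $sam_{vp}$;
construct the two-tuple sample $(1, sam_{vp})$ and add it to the validation set T;
randomly sample N non-neighbor nodes from the non-neighbor node set, denoted as the set $N_i$;
look up the vector of $v_i$ and the vectors of the nodes in $N_i$ in the set E, then compute the similarity between the vector of $v_i$ and the vector of each node in $N_i$, and denote the resulting N similarity values as the set $sam_{vn}$;
for each element $sam_{vnj}$ of the set $sam_{vn}$, construct a two-tuple $(0, sam_{vnj})$, N in total, and add them to the validation set T, where j ranges from 1 to N.
Step 3: each two-tuple in the set T consists of a label value and a predicted value, so a binary classification metric such as AUC or the F1 score can be computed over T, and the result can serve as a measure of how well the unsupervised model has learned.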
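Steps 1 to 3 can be put together in a short end-to-end sketch. Several choices here are assumptions made only for illustration: cosine similarity as the similarity measure, first-order neighbors as the neighbor generation algorithm, excluding $v_i$ from its own non-neighbor set, skipping a sampled node that has no neighbors or no non-neighbors, and computing AUC by pairwise comparison (the fraction of positive-negative pairs ranked correctly, with ties counted as one half), which equals the area under the ROC curve. All names and data are hypothetical.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_validation_set(adjacency, embeddings, s, num_neg, seed=None):
    """Steps 1 and 2: sample s nodes and collect (label, similarity) tuples."""
    rng = random.Random(seed)
    nodes = sorted(embeddings)          # node set V; embeddings plays the role of E
    validation_set = []                 # validation set T, initially empty
    for _ in range(s):
        v = rng.choice(nodes)           # v_i, sampled with replacement
        neighbors = set(adjacency.get(v, ()))
        non_neighbors = set(nodes) - neighbors - {v}
        if not neighbors or not non_neighbors:
            continue                    # assumption: skip nodes that cannot be evaluated
        p = rng.choice(sorted(neighbors))                      # p_i
        validation_set.append((1, cosine_similarity(embeddings[v], embeddings[p])))
        k = min(num_neg, len(non_neighbors))
        for n in rng.sample(sorted(non_neighbors), k):         # N_i
            validation_set.append((0, cosine_similarity(embeddings[v], embeddings[n])))
    return validation_set


def pairwise_auc(validation_set):
    """Step 3: AUC as the fraction of positive-negative pairs ranked correctly."""
    pos = [score for label, score in validation_set if label == 1]
    neg = [score for label, score in validation_set if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical graph with two well-separated pairs of nodes.
adjacency = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
T = build_validation_set(adjacency, embeddings, s=10, num_neg=2, seed=0)
score = pairwise_auc(T)
```

In this toy graph every neighbor pair has a higher cosine similarity than any non-neighbor pair, so the AUC is 1.0 whichever nodes are sampled. Embeddings that placed neighbors far apart would push the value toward 0.5 or below.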
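In a second aspect, based on the same inventive concept, an embodiment of the present specification provides an unsupervised model evaluation apparatus. Referring to FIG. 3, the apparatus includes:
a first determining module 31, configured to determine, among N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes, N and S both being positive integers;
a second determining module 32, configured to determine the neighbor nodes of each of the S nodes and the non-neighbor nodes of each node;
a positive-sample predicted value determining module 33, configured to determine, according to the node vector of a neighbor node of each node and the node vector of that node, the similarity between the neighbor node and the node as the predicted value of a positive sample;
a negative-sample predicted value determining module 34, configured to determine, according to the node vector of a non-neighbor node of each node and the node vector of that node, the similarity between the non-neighbor node and the node as the predicted value of a negative sample;
an evaluation module 35, configured to evaluate the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
In an optional implementation, the evaluation module 35 is configured to:
construct first-type two-tuple samples according to the predicted values of the positive samples and a positive-sample label value;
construct second-type two-tuple samples according to the predicted values of the negative samples and a negative-sample label value;
evaluate the unsupervised model according to the first-type two-tuple samples and the second-type two-tuple samples.
In an optional implementation, the evaluation module 35 is configured to:
determine a target receiver operating characteristic curve according to the first-type two-tuple samples and the second-type two-tuple samples;
obtain the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
In an optional implementation, the second determining module 32 is configured to:
perform the following steps for each of the S nodes:
determine, among the N nodes and according to a preset neighbor generation algorithm, the neighbor node set of the node and the non-neighbor node set of the node;
determine one or more neighbor nodes in the neighbor node set;
determine one or more non-neighbor nodes in the non-neighbor node set.
In an optional implementation, the first determining module 31 includes:
a node set obtaining module, configured to determine a node set, the node set containing N nodes that correspond to the N node vectors;
a sampled node obtaining module, configured to randomly select one node from the node set as a sampled node, S times in total, to obtain the S nodes.
The specific functions of the modules of the above apparatus have been described in detail in the embodiments of the unsupervised model evaluation method provided by the present invention, and are not elaborated here.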
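In a third aspect, based on the same inventive concept as the unsupervised model evaluation method in the foregoing embodiments, the present invention further provides a server. As shown in FIG. 4, the server includes a memory 404, a processor 402, and a computer program stored in the memory 404 and executable on the processor 402. When executing the program, the processor 402 implements the steps of any one of the unsupervised model evaluation methods described above.
FIG. 4 shows a bus architecture, represented by a bus 400. The bus 400 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by the processor 402, and memory, represented by the memory 404. The bus 400 may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits. These are well known in the art and are therefore not described further here. A bus interface 406 provides an interface between the bus 400 and a receiver 401 and a transmitter 403. The receiver 401 and the transmitter 403 may be the same element, namely a transceiver, which provides a unit for communicating with various other apparatuses over a transmission medium. The processor 402 is responsible for managing the bus 400 and general processing, and the memory 404 may be used to store data used by the processor 402 when performing operations.
In a fourth aspect, based on the same inventive concept as the unsupervised model evaluation method in the foregoing embodiments, the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the steps of any one of the unsupervised model evaluation methods described above.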
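This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of this specification. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or another programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.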

Claims (12)

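  1. An unsupervised model evaluation method, the method comprising:
    determining, among N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes, N and S both being positive integers;
    determining neighbor nodes of each of the S nodes and non-neighbor nodes of each node;
    determining, according to the node vector of a neighbor node of each node and the node vector of that node, the similarity between the neighbor node and the node as the predicted value of a positive sample;
    determining, according to the node vector of a non-neighbor node of each node and the node vector of that node, the similarity between the non-neighbor node and the node as the predicted value of a negative sample;
    evaluating the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
  2. The unsupervised model evaluation method according to claim 1, wherein evaluating the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples comprises:
    constructing first-type two-tuple samples according to the predicted values of the positive samples and a positive-sample label value;
    constructing second-type two-tuple samples according to the predicted values of the negative samples and a negative-sample label value;
    evaluating the unsupervised model according to the first-type two-tuple samples and the second-type two-tuple samples.
  3. The unsupervised model evaluation method according to claim 2, wherein evaluating the unsupervised model according to the first-type two-tuple samples and the second-type two-tuple samples comprises:
    determining a target receiver operating characteristic curve according to the first-type two-tuple samples and the second-type two-tuple samples;
    obtaining the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
  4. The unsupervised model evaluation method according to claim 1, wherein determining the neighbor nodes of each of the S nodes and the non-neighbor nodes of each node comprises:
    performing the following steps for each of the S nodes:
    determining, among the N nodes and according to a preset neighbor generation algorithm, the neighbor node set of the node and the non-neighbor node set of the node;
    determining one or more neighbor nodes in the neighbor node set;
    determining one or more non-neighbor nodes in the non-neighbor node set.
  5. The unsupervised model evaluation method according to claim 1, wherein determining, among the N node vectors obtained through the unsupervised model, the S node vectors corresponding to the S nodes comprises:
    determining a node set, the node set containing N nodes that correspond to the N node vectors;
    randomly selecting one node from the node set as a sampled node, S times in total, to obtain the S nodes.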
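  6. An unsupervised model evaluation apparatus, the apparatus comprising:
    a first determining module, configured to determine, among N node vectors obtained through an unsupervised model, S node vectors corresponding to S nodes, N and S both being positive integers;
    a second determining module, configured to determine neighbor nodes of each of the S nodes and non-neighbor nodes of each node;
    a positive-sample predicted value determining module, configured to determine, according to the node vector of a neighbor node of each node and the node vector of that node, the similarity between the neighbor node and the node as the predicted value of a positive sample;
    a negative-sample predicted value determining module, configured to determine, according to the node vector of a non-neighbor node of each node and the node vector of that node, the similarity between the non-neighbor node and the node as the predicted value of a negative sample;
    an evaluation module, configured to evaluate the unsupervised model according to the predicted values of the positive samples and the predicted values of the negative samples.
  7. The unsupervised model evaluation apparatus according to claim 6, wherein the evaluation module is configured to:
    construct first-type two-tuple samples according to the predicted values of the positive samples and a positive-sample label value;
    construct second-type two-tuple samples according to the predicted values of the negative samples and a negative-sample label value;
    evaluate the unsupervised model according to the first-type two-tuple samples and the second-type two-tuple samples.
  8. The unsupervised model evaluation apparatus according to claim 7, wherein the evaluation module is configured to:
    determine a target receiver operating characteristic curve according to the first-type two-tuple samples and the second-type two-tuple samples;
    obtain the area under the target receiver operating characteristic curve to evaluate the unsupervised model.
  9. The unsupervised model evaluation apparatus according to claim 6, wherein the second determining module is configured to:
    perform the following steps for each of the S nodes:
    determine, among the N nodes and according to a preset neighbor generation algorithm, the neighbor node set of the node and the non-neighbor node set of the node;
    determine one or more neighbor nodes in the neighbor node set;
    determine one or more non-neighbor nodes in the non-neighbor node set.
  10. The unsupervised model evaluation apparatus according to claim 6, wherein the first determining module is configured to:
    determine a node set, the node set containing N nodes that correspond to the N node vectors;
    randomly select one node from the node set as a sampled node, S times in total, to obtain the S nodes.
  11. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 5.
  12. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.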
PCT/CN2019/099668 2018-09-20 2019-08-07 Unsupervised model evaluation method, apparatus, server, and readable storage medium WO2020057283A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SG11202010227TA SG11202010227TA (en) 2018-09-20 2019-08-07 Unsupervised model evaluation method, apparatus, server, and computer-readable storage medium
US17/086,120 US10997528B2 (en) 2018-09-20 2020-10-30 Unsupervised model evaluation method, apparatus, server, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811101769.7A CN109615080B (zh) 2018-09-20 2018-09-20 Unsupervised model evaluation method, apparatus, server, and readable storage medium
CN201811101769.7 2018-09-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/086,120 Continuation US10997528B2 (en) 2018-09-20 2020-10-30 Unsupervised model evaluation method, apparatus, server, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020057283A1 true WO2020057283A1 (zh) 2020-03-26

Family

ID=66002678

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099668 WO2020057283A1 (zh) 2018-09-20 2019-08-07 Unsupervised model evaluation method, apparatus, server, and readable storage medium

Country Status (5)

Country Link
US (1) US10997528B2 (zh)
CN (1) CN109615080B (zh)
SG (1) SG11202010227TA (zh)
TW (1) TWI710970B (zh)
WO (1) WO2020057283A1 (zh)

Also Published As

Publication number Publication date
TWI710970B (zh) 2020-11-21
CN109615080A (zh) 2019-04-12
TW202044110A (zh) 2020-12-01
US10997528B2 (en) 2021-05-04
CN109615080B (zh) 2020-05-26
US20210049513A1 (en) 2021-02-18
SG11202010227TA (en) 2020-11-27

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19861669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19861669

Country of ref document: EP

Kind code of ref document: A1