WO2019007417A1 - Privacy protection based training sample generation method and device - Google Patents

Privacy protection based training sample generation method and device

Info

Publication number
WO2019007417A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
original
samples
training
dimensional
Prior art date
Application number
PCT/CN2018/094786
Other languages
English (en)
French (fr)
Inventor
王力
赵沛霖
周俊
李小龙
Original Assignee
阿里巴巴集团控股有限公司
王力
赵沛霖
周俊
李小龙
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 王力, 赵沛霖, 周俊, 李小龙
Priority to EP18828486.3A (published as EP3644231A4)
Priority to SG11201912390YA
Publication of WO2019007417A1
Priority to US16/734,643 (granted as US10878125B2)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60: Protecting data
    • G06F21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218: Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Definitions

  • The present specification relates to the field of data processing technologies, and in particular to a privacy protection based training sample generation method and apparatus.
  • In some application scenarios, the data used for mining contains a great deal of sensitive information, such as data from the financial industry or from government departments. How to protect this sensitive information as private during data mining has become an issue of increasing concern.
  • In view of this, the present specification provides a privacy protection based training sample generation method.
  • The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers.
  • The method includes:
  • generating n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples; and
  • using the n transformation vectors π as training samples of a binary classification model.
  • A privacy protection based binary classification model training method provided by the present specification includes:
  • obtaining n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, where m and d are natural numbers; and
  • training a binary classification model based on the training samples to obtain a result model.
  • The present specification also provides a privacy protection based training sample generating apparatus.
  • The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers.
  • The apparatus includes:
  • a transformation vector generation unit, configured to generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples; and
  • a training sample generation unit, configured to use the n transformation vectors π as training samples of a binary classification model.
  • A privacy protection based binary classification model training apparatus provided by the present specification includes:
  • a training sample acquisition unit, configured to acquire n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, where m and d are natural numbers; and
  • a model training unit, configured to train a binary classification model based on the training samples to obtain a result model.
  • A computer device provided by the present specification includes a memory and a processor; the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of the above privacy protection based training sample generation method.
  • A computer device provided by the present specification includes a memory and a processor; the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of the above privacy protection based binary classification model training method.
  • The present specification provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the above privacy protection based training sample generation method.
  • The present specification provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the above privacy protection based model training method.
  • In the embodiments of the present specification, the original vectors x and output label values y of the m original samples are used, and the sum of yx over several randomly selected samples is taken as a transformation vector, so that the result model obtained by training a binary classification model on the n transformation vectors is consistent with the model obtained by training on the original data and is not affected by the random quantities. Moreover, since each transformation vector is generated from several original samples together with random quantities, it is extremely difficult to recover the original data from the transformation vectors. The embodiments of the present specification can therefore provide good protection for private information while obtaining mining results consistent with those obtained from the original data.
  • FIG. 1 is a flowchart of a privacy protection based training sample generation method in an embodiment of the present specification
  • FIG. 2 is a flowchart of a privacy protection based binary classification model training method in an embodiment of the present specification
  • FIG. 3 is a schematic flowchart of a data mining process in an application example of the present specification
  • FIG. 4 is a hardware structure diagram of a device running an embodiment of the present specification
  • FIG. 5 is a logical structure diagram of a privacy protection based training sample generating apparatus in an embodiment of the present specification
  • FIG. 6 is a logical structure diagram of a privacy protection based model training apparatus in an embodiment of the present specification.
  • The embodiments of the present specification propose a new privacy protection based training sample generation method and a new privacy protection based binary classification model training method: from m (m is a natural number) original vectors x of dimension d (d is a natural number) and their output label values y, n (n is a natural number) d-dimensional transformation vectors π are randomly generated. The transformation vectors are generated in such a way that the binary classification model minimizing the loss function based on the transformation vectors is exactly the model minimizing the loss function based on the original vectors and output label values, so that the result model obtained by training on the transformation vectors can be used as the data mining result for the original data.
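  • Stated compactly (an editorial restatement consistent with Equations 2, 4 and 5 derived in the description below, not additional patent text): with the logistic loss over the original samples and the exponential loss over the transformation vectors defined as $F_{\log}(S,\theta)=\frac{1}{m}\sum_{i=1}^{m}\log\left(1+e^{-y_i\theta^{\top}x_i}\right)$ and $F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)=\frac{1}{2^m}\sum_{\sigma\in\Sigma_m}e^{-\theta^{\top}\pi_\sigma}$, the generation scheme guarantees $\arg\min_\theta F_{\log}(S,\theta)=\arg\min_\theta F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$.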
  • The embodiments of the present specification can run on any device with computing and storage capabilities, such as a mobile phone, a tablet, a PC (Personal Computer), a notebook, or a server; the functions of the embodiments of the present specification can also be implemented by logical nodes running on two or more devices that cooperate with one another.
  • In the embodiments of the present specification, the original data is a set of training samples with output label values. The sample size is m, that is, m samples are included (the samples of the original data are referred to as original samples), and each original sample includes a d-dimensional original vector x and an output label value y. For the i-th original sample (i is a natural number from 1 to m), the original vector is denoted x_i and the output label value y_i.
  • In the embodiments of the present specification, the flow of the privacy protection based training sample generation method is shown in FIG. 1, and the flow of the privacy protection based model training method is shown in FIG. 2.
  • Step 110: generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over 0 to m randomly selected original samples.
  • Among the m original samples, 0 to m original samples are randomly selected, yx is computed for each selected original sample, and the sum of these yx values is taken as one transformation vector π.
  • The number of original samples selected each time may be fixed or random; it is not limited.
  • Since each yx is a d-dimensional vector, the generated transformation vector π is also a d-dimensional vector.
  • There are multiple specific ways to generate the transformation vectors π; the embodiments of the present specification impose no limitation. Two examples are given below.
  • First example: in some application scenarios, positive and negative signs are used as the binary classification output label values, that is, y takes the value -v or v (v is a real number). In this case a transformation vector can be generated as follows: generate an m-dimensional vector σ, randomly take -v or v as the value of each dimension of σ, and take $\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$ as one transformation vector π, where σ_i is the i-th dimension of the vector σ; repeating this process n times yields n transformation vectors π. Since $(\sigma_i+y_i)/2$ is either 0 or y_i, the transformation vector π can be the sum of yx over any 0 to m original samples.
  • Second example: generate an m-dimensional vector w, randomly take 0 or 1 as the value of each dimension of w, and take $\pi=\sum_{i=1}^{m}w_iy_ix_i$ as one transformation vector π, where w_i is the i-th dimension of the vector w; repeating this process n times yields n transformation vectors π. Since w_i is either 0 or 1, the transformation vector π can be the sum of yx over any 0 to m original samples; in the second example there is no restriction on the value of y.
  • On the data provider side, step 120: the n transformation vectors π are used as training samples for the binary classification model.
  • On the data mining side, step 210: obtain n d-dimensional transformation vectors π as training samples; each transformation vector π is determined by the sum of yx over a plurality of randomly selected original samples, the original samples being among the m samples of the original data, and each original sample includes a d-dimensional original vector x and an output label value y.
  • The data provider outputs the training samples generated in step 120 to the data mining party.
  • The data mining party can obtain the training samples from the data provider in any manner; the embodiments of the present specification impose no limitation.
  • On the data mining side, step 220: based on the training samples, the binary classification model is trained to obtain a result model.
  • After obtaining the training samples, the data mining party trains the binary classification model with them. Since the output label values of the original data are already reflected in the transformation vectors π, and the training samples composed of the n transformation vectors π carry no label values, an unsupervised learning algorithm can be used for training to obtain the result model.
  • The embodiments of the present specification impose no limitation on the binary classification model; for example, the Boosting algorithm, SGD (Stochastic Gradient Descent), SVRG (Stochastic Variance Reduced Gradient), Adagrad (Adaptive Gradient), and so on can be used.
  • The way a specific binary classification model is trained on the training samples composed of the n transformation vectors π is the same as in the prior art.
  • An example of training using the Boosting algorithm is given below.
  • Other algorithms can be implemented by analogy and are not described in detail.
  • The value of each dimension of the n-dimensional intermediate variable ω^(t+1) used in the next iteration round is then calculated according to Equation 8, in which j is each natural number from 1 to n; after the T rounds, the result model is obtained according to Equation 9, in which θ_Tk is the k-th dimension of the d-dimensional vector θ_T.
  • In the embodiments of the present specification, n d-dimensional transformation vectors π are randomly generated from the m d-dimensional original vectors x and output label values y, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples.
  • In an application example of the present specification, the data provider entrusts the data mining party with mining classification rules, and the data mining party constructs the data classification rules on the basis of a binary linear classification model.
  • The original data of the data provider is S = {(x_i, y_i) | i = 1, 2, ..., m}, where x_i ∈ R^d (i.e., x_i is a d-dimensional vector) and y_i ∈ {1, -1} (i.e., the output label value y_i is -1 or 1). Since the data provider's raw data contains sensitive information about users, privacy protection is required.
  • Step 310: obtain the m samples of the original data.
  • Step 320: calculate n Rados (Rademacher Observations) using the raw data.
  • Each Rado is a d-dimensional vector, denoted π_σ, and is the transformation vector in this application example.
  • Each Rado is computed as follows: an m-dimensional vector σ is generated, the value of each dimension of σ being -1 or 1, determined at random; the Rado corresponding to this σ is then determined according to Equation 10: $\pi_\sigma=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$.
  • By randomly generating n vectors σ, n Rados can be obtained.
  • Steps 310 and 320 run on a device or logical node controlled by the data provider.
  • The data provider provides the n generated Rados, as the data to be mined, to the data mining party.
  • Step 330: the Boosting algorithm is used, with the n Rados as training samples, to train the binary linear classification model and obtain a result model.
  • Step 330 runs on a device or logical node controlled by the data mining party.
  • The data mining party generates multi-classification rules from the binary linear result models obtained by training and delivers them to the data provider.
  • The manner of converting several binary linear result models into multi-classification rules can be implemented with reference to the prior art and is not described again.
  • Corresponding to the above flows, the embodiments of the present specification further provide a privacy protection based training sample generating device and a privacy protection based binary classification model training device.
  • Both of the above devices may be implemented by software, or by hardware, or by a combination of hardware and software.
  • Taking software implementation as an example, a device in the logical sense is formed by the CPU (Central Processing Unit) of the hosting equipment reading the corresponding computer program instructions into memory and running them.
  • At the hardware level, in addition to the CPU, memory, and storage shown in FIG. 4, the equipment in which the above device is located usually also includes other hardware such as chips for transmitting and receiving wireless signals, and/or other hardware such as boards for implementing network communication functions.
  • FIG. 5 is a schematic diagram of a training sample generating apparatus based on privacy protection according to an embodiment of the present disclosure.
  • The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers.
  • The apparatus comprises a transformation vector generation unit and a training sample generation unit, wherein: the transformation vector generation unit is configured to generate n d-dimensional transformation vectors π, each of which is determined by the sum of yx over a plurality of randomly selected original samples; and the training sample generation unit is configured to use the n transformation vectors π as training samples of a binary classification model.
  • Optionally, the value of y is -v or v, where v is a real number. The transformation vector generation unit is specifically configured to: generate an m-dimensional vector σ, randomly take -v or v as the value of each dimension of σ, and take $\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$ as one transformation vector π, where y_i is the output label value of the i-th original sample, x_i is the original vector of the i-th original sample, and σ_i is the i-th dimension of the vector σ; this process is repeated n times to obtain n transformation vectors π.
  • Optionally, the transformation vector generation unit is specifically configured to: generate an m-dimensional vector w, randomly take 0 or 1 as the value of each dimension of w, and take $\pi=\sum_{i=1}^{m}w_iy_ix_i$ as one transformation vector π, where w_i is the i-th dimension of the vector w, y_i is the output label value of the i-th original sample, and x_i is the original vector of the i-th original sample; this process is repeated n times to obtain n transformation vectors π.
  • FIG. 6 is a schematic diagram of a privacy protection based binary classification model training apparatus according to an embodiment of the present disclosure, comprising a training sample acquisition unit and a model training unit, wherein: the training sample acquisition unit is configured to acquire n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over a plurality of randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, with m and d natural numbers; and the model training unit is configured to train a binary classification model based on the training samples to obtain a result model.
  • Optionally, the binary classification model includes: the Boosting algorithm, the stochastic gradient descent (SGD) algorithm, the stochastic variance reduced gradient (SVRG) algorithm, or the adaptive gradient (Adagrad) algorithm.
  • Embodiments of the present specification provide a computer device including a memory and a processor. The memory stores a computer program executable by the processor; when running the stored computer program, the processor performs the steps of the privacy protection based training sample generation method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; they are not repeated here.
  • Embodiments of the present specification provide a computer device including a memory and a processor. The memory stores a computer program executable by the processor; when running the stored computer program, the processor performs the steps of the privacy protection based binary classification model training method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; they are not repeated here.
  • Embodiments of the present specification provide a computer readable storage medium on which computer programs are stored; when executed by a processor, these computer programs perform the steps of the privacy protection based training sample generation method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; they are not repeated here.
  • Embodiments of the present specification provide a computer readable storage medium on which computer programs are stored; when executed by a processor, these computer programs perform the steps of the privacy protection based binary classification model training method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; they are not repeated here.
  • In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • The memory may include non-persistent memory in computer readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which can be used to store information accessible by a computing device.
  • As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Those skilled in the art should understand that embodiments of the present specification can be provided as a method, a system, or a computer program product.
  • Therefore, embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • Moreover, embodiments of the present specification can take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present specification provides a privacy protection based training sample generation method. The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers. The method includes: generating n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and using the n transformation vectors π as training samples of a binary classification model.

Description

Privacy protection based training sample generation method and device
Technical Field
The present specification relates to the field of data processing technologies, and in particular to a privacy protection based training sample generation method and device.
Background
With the development and popularization of the Internet, all kinds of network-based activities continuously generate data, and many enterprises, government bodies, and even individuals hold large amounts of user data. Data mining technology can discover valuable knowledge, patterns, rules, and other information from massive data, providing support for scientific research, business decision-making, process control, and so on, and has become an important way of using data.
In some application scenarios, the data used for mining contains a great deal of sensitive information, such as data from the financial industry or from government departments. How to protect this sensitive information as private during data mining has become an issue of increasing concern.
Summary
In view of this, the present specification provides a privacy protection based training sample generation method. The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers. The method includes:
generating n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and
using the n transformation vectors π as training samples of a binary classification model.
A privacy protection based binary classification model training method provided by the present specification includes:
obtaining n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, where m and d are natural numbers; and
training a binary classification model based on the training samples to obtain a result model.
The present specification also provides a privacy protection based training sample generating apparatus. The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers. The apparatus includes:
a transformation vector generation unit, configured to generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and
a training sample generation unit, configured to use the n transformation vectors π as training samples of a binary classification model.
A privacy protection based binary classification model training apparatus provided by the present specification includes:
a training sample acquisition unit, configured to acquire n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, where m and d are natural numbers; and
a model training unit, configured to train a binary classification model based on the training samples to obtain a result model.
A computer device provided by the present specification includes a memory and a processor; the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of the above privacy protection based training sample generation method.
A computer device provided by the present specification includes a memory and a processor; the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of the above privacy protection based binary classification model training method.
The present specification provides a computer readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it performs the steps of the above privacy protection based training sample generation method.
The present specification provides a computer readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it performs the steps of the above privacy protection based model training method.
As can be seen from the above technical solutions, in the embodiments of the present specification, the original vectors x and output label values y of the m original samples are used, and the sum of several randomly selected yx is taken as a transformation vector, so that the result model obtained by training a binary classification model on the n transformation vectors is consistent with the model obtained by training on the original data and is not affected by the random quantities; moreover, since each transformation vector is generated from several original samples together with random quantities, it is extremely difficult to recover the original data from the transformation vectors. The embodiments of the present specification therefore provide good protection for private information while obtaining mining results consistent with those obtained from the original data.
Brief Description of the Drawings
FIG. 1 is a flowchart of a privacy protection based training sample generation method in an embodiment of the present specification;
FIG. 2 is a flowchart of a privacy protection based binary classification model training method in an embodiment of the present specification;
FIG. 3 is a schematic flowchart of a data mining process in an application example of the present specification;
FIG. 4 is a hardware structure diagram of a device running an embodiment of the present specification;
FIG. 5 is a logical structure diagram of a privacy protection based training sample generating apparatus in an embodiment of the present specification;
FIG. 6 is a logical structure diagram of a privacy protection based model training apparatus in an embodiment of the present specification.
Detailed Description
The embodiments of the present specification propose a new privacy protection based training sample generation method and a new privacy protection based binary classification model training method: from m (m is a natural number) original vectors x of dimension d (d is a natural number) and their output label values y, n (n is a natural number) d-dimensional transformation vectors π are randomly generated. The transformation vectors are generated in such a way that the binary classification model minimizing the loss function based on the transformation vectors is exactly the model minimizing the loss function based on the original vectors and output label values, so that the result model obtained by training on the transformation vectors can be used as the data mining result for the original data.
The embodiments of the present specification can run on any device with computing and storage capabilities, such as a mobile phone, a tablet, a PC (Personal Computer), a notebook, or a server; the functions of the embodiments of the present specification can also be implemented by logical nodes running on two or more devices that cooperate with one another.
In the embodiments of the present specification, the original data is a set of training samples with output label values. The sample size is m, that is, m samples are included (the samples of the original data are referred to as original samples), and each original sample includes a d-dimensional original vector x and an output label value y. For the i-th original sample (i is a natural number from 1 to m), the original vector is denoted x_i and the output label value y_i.
In the embodiments of the present specification, the flow of the privacy protection based training sample generation method is shown in FIG. 1, and the flow of the privacy protection based model training method is shown in FIG. 2.
Step 110: generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over 0 to m randomly selected original samples.
Among the m original samples, 0 to m original samples are randomly selected, yx is computed for each selected original sample, and the sum of these yx values is taken as one transformation vector π. The number of original samples selected each time may be fixed or random; it is not limited.
Since each yx is a d-dimensional vector, the generated transformation vector π is also a d-dimensional vector.
There are multiple specific ways to generate the transformation vectors π; the embodiments of the present specification impose no limitation. Two examples are given below.
First example: in some application scenarios, positive and negative signs are used as the binary classification output label values, that is, y takes the value -v or v (v is a real number). In this case a transformation vector can be generated as follows:
Generate an m-dimensional vector σ, randomly take -v or v as the value of each dimension of σ, and take
$\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$
as one transformation vector π, where σ_i is the i-th dimension of the vector σ.
Repeating the above process n times yields n transformation vectors π.
Since $(\sigma_i+y_i)/2$ is either 0 or y_i, the transformation vector π can be the sum of yx over any 0 to m original samples.
In the first example, let the linear model based on the original data be:
Y(x) = θ^T x   (Equation 1)
In Equation 1, θ is a d-dimensional weight vector and θ^T is its transpose; the loss function of the binary classification algorithm based on the original data is then given by Equation 2:
$F_{\log}(S,\theta)=\frac{1}{m}\sum_{i=1}^{m}\log\left(1+e^{-y_i\theta^{\top}x_i}\right)$   (Equation 2)
In Equation 2, S = {(x_i, y_i) | i = 1, 2, ..., m}.
Let the linear model based on the transformation vectors be:
Y(π) = θ^T π   (Equation 3)
Then the loss function of the binary classification algorithm based on the transformation vectors is given by Equation 4:
$F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)=\frac{1}{2^m}\sum_{\sigma\in\Sigma_m}e^{-\theta^{\top}\pi_\sigma}$   (Equation 4)
In Equation 4, π_σ is the transformation vector generated from σ, and Σ_m = {-v, +v}^m.
Taking v = 1 as an example, it is shown below that there is a linear relationship, independent of σ, between F_log(S, θ) and $\log F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$. The derivation is as follows.
Define:
$z_i=-y_i\theta^{\top}x_i,\quad i=1,2,\ldots,m$
The transformation vector π_σ can be expressed as:
$\pi_\sigma=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$
Then the following equalities hold:
$\sum_{\sigma\in\Sigma_m}e^{-\theta^{\top}\pi_\sigma}=\prod_{i=1}^{m}\left(1+e^{-y_i\theta^{\top}x_i}\right)=\prod_{i=1}^{m}\left(1+e^{z_i}\right)$
$F_{\log}(S,\theta)=\frac{1}{m}\sum_{i=1}^{m}\log\left(1+e^{z_i}\right)=\frac{1}{m}\log\sum_{\sigma\in\Sigma_m}e^{-\theta^{\top}\pi_\sigma}=\log 2+\frac{1}{m}\log F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$
It can be seen that F_log(S, θ) and $\log F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$ are in a linear relationship. When v takes other real values, that is, Σ_m = {-v, +v}^m, the linear relationship between F_log(S, θ) and $\log F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$ still holds and is independent of σ. In this way, the θ minimizing F_log(S, θ) is exactly the θ minimizing $F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$, that is, Equation 5 holds:
$\arg\min_\theta F_{\log}(S,\theta)=\arg\min_\theta F_{\exp}^{\mathrm{rado}}(\Sigma_m,\theta)$   (Equation 5)
From the above argument it can be seen that training the binary classification model with several transformation vectors π yields a result model consistent with the one obtained by training the binary classification model with the original data.
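The identity underlying this argument can be checked numerically. The following Python sketch is an editorial illustration only, not part of the patent text: for a small synthetic data set it enumerates all 2^m sign vectors σ and confirms that $\sum_{\sigma}e^{-\theta^{\top}\pi_\sigma}=\prod_{i}(1+e^{-y_i\theta^{\top}x_i})$, which is exactly why the minimizer of the transformation-vector loss coincides with the minimizer of the logistic loss on the original data; all names in the snippet are assumptions for the example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, d = 6, 3                      # keep m small so all 2^m sign vectors can be enumerated
X = rng.normal(size=(m, d))      # original vectors x_i
y = rng.choice([-1.0, 1.0], m)   # output label values y_i (here v = 1)
theta = rng.normal(size=d)       # an arbitrary linear model

# Left-hand side: sum over all 2^m transformation vectors pi_sigma of exp(-theta . pi_sigma)
lhs = 0.0
for sigma in itertools.product([-1.0, 1.0], repeat=m):
    pi = 0.5 * ((np.array(sigma) + y)[:, None] * X).sum(axis=0)
    lhs += np.exp(-theta @ pi)

# Right-hand side: prod_i (1 + exp(-y_i * theta . x_i)); its log equals m * F_log(S, theta)
rhs = np.prod(1.0 + np.exp(-y * (X @ theta)))

assert np.isclose(lhs, rhs)
```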
Second example: generate an m-dimensional vector w, randomly take 0 or 1 as the value of each dimension of w, and take
$\pi=\sum_{i=1}^{m}w_iy_ix_i$
as one transformation vector π, where w_i is the i-th dimension of the vector w. Repeating the above process n times yields n transformation vectors π.
Since w_i is either 0 or 1, the transformation vector π can be the sum of yx over any 0 to m original samples. In the second example there is no restriction on the value of y.
Based on an argument similar to that in the first example, the same conclusion can be drawn: training the binary classification model with several transformation vectors π yields a result model consistent with the one obtained by training the binary classification model with the original data; the detailed argument is not repeated.
On the data provider side, step 120: the n transformation vectors π are used as training samples for the binary classification model.
On the data mining side, step 210: obtain n d-dimensional transformation vectors π as training samples; each transformation vector π is determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, and each original sample includes a d-dimensional original vector x and an output label value y.
The data provider outputs the training samples generated in step 120 to the data mining party. The data mining party can obtain the training samples from the data provider in any manner; the embodiments of the present specification impose no limitation.
On the data mining side, step 220: based on the training samples, the binary classification model is trained to obtain a result model.
After obtaining the training samples, the data mining party trains the binary classification model with them. Since the output label values of the original data are already reflected in the transformation vectors π, and the training samples composed of the n transformation vectors π carry no label values, an unsupervised learning algorithm can be used for training to obtain the result model.
The embodiments of the present specification impose no limitation on the binary classification model; for example, the Boosting algorithm, SGD (Stochastic Gradient Descent), SVRG (Stochastic Variance Reduced Gradient), Adagrad (Adaptive Gradient), and so on can be used.
The way a specific binary classification model is trained on the training samples composed of the n transformation vectors π is the same as in the prior art. An example of training using the Boosting algorithm is given below; other algorithms can be implemented by analogy and are not described in detail.
Initialization of the Boosting algorithm: let the sample space formed by the n transformation vectors π be δ_r = {π_1, π_2, ..., π_n}; preset the number of iterations T of the Boosting algorithm (T is a natural number); set the initial value θ_0 of the linear model θ to the d-dimensional zero vector; set the initial value ω_1 of the n-dimensional intermediate variable ω so that every dimension equals 1/n; precompute π_*k for each natural number k from 1 to d, where π_*k is the maximum value of the n transformation vectors π in the k-th dimension.
The iteration of the Boosting algorithm from round 1 to round T is as follows.
Let the current iteration round be t. For each dimension k of π, compute:
$r_k=\frac{1}{\pi_{*k}}\sum_{j=1}^{n}\omega_{tj}\pi_{jk}$
Denote by ι(t) the k that maximizes |r_k| (the absolute value of r_k), and compute r_t and α_t according to Equations 6 and 7:
$r_t=r_{\iota(t)}$   (Equation 6)
$\alpha_t=\frac{1}{2\pi_{*\iota(t)}}\log\frac{1+r_t}{1-r_t}$   (Equation 7)
The value of each dimension of the n-dimensional intermediate variable ω^(t+1) for the next iteration round is then calculated according to Equation 8:
$\omega_{(t+1)j}=\omega_{tj}\cdot\frac{1-r_t\,\pi_{j\iota(t)}/\pi_{*\iota(t)}}{1-r_t^{2}}$   (Equation 8)
In Equation 8, j is each natural number from 1 to n.
After the T rounds of iteration are completed, the result model θ_T obtained by training can be derived according to Equation 9:
$\theta_{Tk}=\sum_{t:\,\iota(t)=k}\alpha_t$   (Equation 9)
In Equation 9, θ_Tk is the k-th dimension of the d-dimensional vector θ_T.
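For illustration only, the iteration just described can be sketched in Python as follows. This is an editorial sketch, not the patent's reference implementation: Equations 6 to 9 are assumed to take the standard Rademacher-observation boosting form shown above, π_*k is taken here as the per-dimension maximum absolute value so that the weights stay well defined, and the function and variable names are assumptions.

```python
import numpy as np

def rado_boost(rados, T):
    """Train a linear model theta on n d-dimensional transformation vectors (rados)."""
    n, d = rados.shape
    theta = np.zeros(d)                    # theta_0: d-dimensional zero vector
    omega = np.full(n, 1.0 / n)            # omega_1: each dimension equal to 1/n
    pi_star = np.abs(rados).max(axis=0)    # pi_*k (assumed max of |.| per dimension)
    pi_star[pi_star == 0.0] = 1.0          # guard against all-zero dimensions
    for _ in range(T):
        r = (omega @ rados) / pi_star                      # r_k for every dimension k
        k = int(np.argmax(np.abs(r)))                      # iota(t): dimension with largest |r_k|
        r_t = float(np.clip(r[k], -1 + 1e-12, 1 - 1e-12))  # Equation 6 (assumed): r_t = r_iota(t)
        alpha_t = np.log((1 + r_t) / (1 - r_t)) / (2 * pi_star[k])   # Equation 7 (assumed form)
        theta[k] += alpha_t                                 # Equation 9: theta_Tk accumulates alpha_t
        omega = omega * (1 - r_t * rados[:, k] / pi_star[k]) / (1 - r_t ** 2)  # Equation 8 (assumed form)
    return theta

# Usage sketch: `training_samples` holds the n transformation vectors from step 120 / step 210.
# theta_T = rado_boost(training_samples, T=50)
```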
As can be seen, in the embodiments of the present specification, n d-dimensional transformation vectors π are randomly generated from the m d-dimensional original vectors x and output label values y, each transformation vector π being determined by the sum of several randomly selected yx, and the binary classification model is trained with the n transformation vectors π as training samples, yielding a result model consistent with the one obtained by training on the original data. Because multiple original samples are used and random quantities are introduced in generating the transformation vectors, recovering the original data is extremely difficult, while mining results consistent with those obtained from the original data can still be obtained, avoiding information distortion.
Specific embodiments of the present specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
In an application example of the present specification, the data provider entrusts the data mining party with mining classification rules, and the data mining party constructs the data classification rules on the basis of a binary linear classification model. The original data of the data provider is S = {(x_i, y_i) | i = 1, 2, ..., m}, where x_i ∈ R^d (i.e., x_i is a d-dimensional vector) and y_i ∈ {1, -1} (i.e., the output label value y_i is -1 or 1). Since the data provider's raw data contains sensitive information about users, privacy protection is required.
The classification data mining flow that provides privacy protection is shown in FIG. 3.
Step 310: obtain the m samples of the original data.
Step 320: calculate n Rados (Rademacher Observations) using the raw data. Each Rado is a d-dimensional vector, denoted π_σ, and is the transformation vector in this application example.
Each Rado is calculated as follows: generate an m-dimensional vector σ, the value of each dimension of σ being -1 or 1, determined at random; the Rado corresponding to this σ is determined according to Equation 10:
$\pi_\sigma=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$   (Equation 10)
An example is given below. Suppose the original data consists of 5 samples and the dimension of the original vector x is d = 4; the original data is as shown in Table 1 (the sample values appear in the table image of the original publication and are not reproduced here).
Table 1
Suppose that, when generating one Rado, the random value of the vector σ is σ = {-1, 1, -1, 1, 1}. Each of the four dimension values of π_σ is computed according to Equation 10 from the sample values in Table 1 (the per-dimension arithmetic appears in equation images of the original publication and is not reproduced here), giving one Rado equal to {8, 11, 5, 7}.
By randomly generating n vectors σ, n Rados can be obtained.
Steps 310 and 320 run on a device or logical node controlled by the data provider. The data provider provides the n generated Rados, as the data to be mined, to the data mining party.
Step 330: the Boosting algorithm is used, with the n Rados as training samples, to train the binary linear classification model and obtain a result model.
Step 330 runs on a device or logical node controlled by the data mining party. The data mining party generates multi-classification rules from the binary linear result models obtained by training and delivers them to the data provider. The manner of converting several binary linear result models into multi-classification rules can be implemented with reference to the prior art and is not described again.
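As a purely illustrative sketch (not part of the patent text), the division of work in this application example could look as follows; it reuses the transform_sigma and rado_boost helpers sketched earlier in this document, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m, d, n = 200, 10, 64
X_private = rng.normal(size=(m, d))        # the provider's original vectors (sensitive)
y_private = rng.choice([-1.0, 1.0], m)     # the provider's output label values (sensitive)

# Data provider side (steps 310-320): compute n Rados from the private data.
rados = np.stack([transform_sigma(X_private, y_private) for _ in range(n)])

# Only `rados` crosses the trust boundary; X_private and y_private never leave the provider.

# Data mining side (step 330): train the binary linear model on the Rados alone.
theta_T = rado_boost(rados, T=50)
```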
Corresponding to the above flows, the embodiments of the present specification further provide a privacy protection based training sample generating apparatus and a privacy protection based binary classification model training apparatus. Both apparatuses can be implemented by software, or by hardware, or by a combination of hardware and software. Taking software implementation as an example, an apparatus in the logical sense is formed by the CPU (Central Processing Unit) of the hosting equipment reading the corresponding computer program instructions into memory and running them. At the hardware level, in addition to the CPU, memory, and storage shown in FIG. 4, the equipment in which the above apparatus is located usually also includes other hardware such as chips for transmitting and receiving wireless signals, and/or other hardware such as boards for implementing network communication functions.
FIG. 5 shows a privacy protection based training sample generating apparatus provided by an embodiment of the present specification. The original data to be mined includes m original samples; each original sample includes a d-dimensional original vector x and an output label value y, where m and d are natural numbers. The apparatus includes a transformation vector generation unit and a training sample generation unit, wherein: the transformation vector generation unit is configured to generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and the training sample generation unit is configured to use the n transformation vectors π as training samples of a binary classification model.
Optionally, the value of y is -v or v, where v is a real number. The transformation vector generation unit is specifically configured to: generate an m-dimensional vector σ, randomly take -v or v as the value of each dimension of σ, and take
$\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$
as one transformation vector π, where y_i is the output label value of the i-th original sample, x_i is the original vector of the i-th original sample, and σ_i is the i-th dimension of the vector σ; the above process is repeated n times to obtain n transformation vectors π.
Optionally, the transformation vector generation unit is specifically configured to: generate an m-dimensional vector w, randomly take 0 or 1 as the value of each dimension of w, and take
$\pi=\sum_{i=1}^{m}w_iy_ix_i$
as one transformation vector π, where w_i is the i-th dimension of the vector w, y_i is the output label value of the i-th original sample, and x_i is the original vector of the i-th original sample; the above process is repeated n times to obtain n transformation vectors π.
FIG. 6 shows a privacy protection based binary classification model training apparatus provided by an embodiment of the present specification, including a training sample acquisition unit and a model training unit, wherein: the training sample acquisition unit is configured to acquire n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, where m and d are natural numbers; and the model training unit is configured to train a binary classification model based on the training samples to obtain a result model.
Optionally, the binary classification model includes: the Boosting algorithm, the stochastic gradient descent (SGD) algorithm, the stochastic variance reduced gradient (SVRG) algorithm, or the adaptive gradient (Adagrad) algorithm.
An embodiment of the present specification provides a computer device that includes a memory and a processor. The memory stores a computer program executable by the processor; when running the stored computer program, the processor performs the steps of the privacy protection based training sample generation method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; it is not repeated here.
An embodiment of the present specification provides a computer device that includes a memory and a processor. The memory stores a computer program executable by the processor; when running the stored computer program, the processor performs the steps of the privacy protection based binary classification model training method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; it is not repeated here.
An embodiment of the present specification provides a computer readable storage medium on which computer programs are stored; when executed by a processor, these computer programs perform the steps of the privacy protection based training sample generation method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; it is not repeated here.
An embodiment of the present specification provides a computer readable storage medium on which computer programs are stored; when executed by a processor, these computer programs perform the steps of the privacy protection based binary classification model training method of the embodiments of the present specification. For a detailed description of these steps, please refer to the preceding content; it is not repeated here.
The above are only preferred embodiments of the present specification and are not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall be included within the scope of protection of the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in computer readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer readable medium.
Computer readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission media, which can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or also includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
Those skilled in the art should understand that embodiments of the present specification can be provided as a method, a system, or a computer program product. Therefore, embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present specification can take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer usable program code.

Claims (14)

  1. A privacy protection based training sample generation method, wherein the original data to be mined includes m original samples, each original sample includes a d-dimensional original vector x and an output label value y, and m and d are natural numbers, the method comprising:
    generating n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and
    using the n transformation vectors π as training samples of a binary classification model.
  2. The method according to claim 1, wherein the value of y is -v or v, and v is a real number;
    and generating the n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples, comprises: generating an m-dimensional vector σ, randomly taking -v or v as the value of each dimension of σ, and taking
    $\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$
    as one transformation vector π, where y_i is the output label value of the i-th original sample, x_i is the original vector of the i-th original sample, and σ_i is the i-th dimension of the vector σ; and repeating the above process n times to obtain n transformation vectors π.
  3. The method according to claim 1, wherein generating the n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples, comprises: generating an m-dimensional vector w, randomly taking 0 or 1 as the value of each dimension of w, and taking
    $\pi=\sum_{i=1}^{m}w_iy_ix_i$
    as one transformation vector π, where w_i is the i-th dimension of the vector w, y_i is the output label value of the i-th original sample, and x_i is the original vector of the i-th original sample; and repeating the above process n times to obtain n transformation vectors π.
  4. A privacy protection based binary classification model training method, comprising:
    obtaining n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, and m and d being natural numbers; and
    training a binary classification model based on the training samples to obtain a result model.
  5. The method according to claim 4, wherein the binary classification model comprises: the Boosting algorithm, the stochastic gradient descent (SGD) algorithm, the stochastic variance reduced gradient (SVRG) algorithm, or the adaptive gradient (Adagrad) algorithm.
  6. A privacy protection based training sample generating apparatus, wherein the original data to be mined includes m original samples, each original sample includes a d-dimensional original vector x and an output label value y, and m and d are natural numbers, the apparatus comprising:
    a transformation vector generation unit, configured to generate n d-dimensional transformation vectors π, each transformation vector π being determined by the sum of yx over several randomly selected original samples; and
    a training sample generation unit, configured to use the n transformation vectors π as training samples of a binary classification model.
  7. The apparatus according to claim 6, wherein the value of y is -v or v, and v is a real number;
    and the transformation vector generation unit is specifically configured to: generate an m-dimensional vector σ, randomly take -v or v as the value of each dimension of σ, and take
    $\pi=\frac{1}{2}\sum_{i=1}^{m}(\sigma_i+y_i)x_i$
    as one transformation vector π, where y_i is the output label value of the i-th original sample, x_i is the original vector of the i-th original sample, and σ_i is the i-th dimension of the vector σ; and repeat the above process n times to obtain n transformation vectors π.
  8. The apparatus according to claim 6, wherein the transformation vector generation unit is specifically configured to: generate an m-dimensional vector w, randomly take 0 or 1 as the value of each dimension of w, and take
    $\pi=\sum_{i=1}^{m}w_iy_ix_i$
    as one transformation vector π, where w_i is the i-th dimension of the vector w, y_i is the output label value of the i-th original sample, and x_i is the original vector of the i-th original sample; and repeat the above process n times to obtain n transformation vectors π.
  9. A privacy protection based binary classification model training apparatus, comprising:
    a training sample acquisition unit, configured to acquire n d-dimensional transformation vectors π as training samples, each transformation vector π being determined by the sum of yx over several randomly selected original samples, the original samples being among the m samples of the original data, each original sample including a d-dimensional original vector x and an output label value y, and m and d being natural numbers; and
    a model training unit, configured to train a binary classification model based on the training samples to obtain a result model.
  10. The apparatus according to claim 8, wherein the binary classification model comprises: the Boosting algorithm, the stochastic gradient descent (SGD) algorithm, the stochastic variance reduced gradient (SVRG) algorithm, or the adaptive gradient (Adagrad) algorithm.
  11. A computer device, comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of any one of claims 1 to 3.
  12. A computer device, comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and when running the computer program, the processor performs the steps of any one of claims 4 to 5.
  13. A computer readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of any one of claims 1 to 3 are performed.
  14. A computer readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of any one of claims 4 to 5 are performed.
PCT/CN2018/094786 2017-07-07 2018-07-06 Privacy protection based training sample generation method and device WO2019007417A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18828486.3A EP3644231A4 (en) 2017-07-07 2018-07-06 METHOD AND DEVICE FOR GENERATING LEARNING SAMPLES BASED ON CONFIDENTIALITY PROTECTION
SG11201912390YA SG11201912390YA (en) 2017-07-07 2018-07-06 Privacy protection based training sample generation method and device
US16/734,643 US10878125B2 (en) 2017-07-07 2020-01-06 Privacy protection based training sample generation method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710552377.1 2017-07-07
CN201710552377.1A CN109214404A (zh) Privacy protection based training sample generation method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/734,643 Continuation US10878125B2 (en) 2017-07-07 2020-01-06 Privacy protection based training sample generation method and device

Publications (1)

Publication Number Publication Date
WO2019007417A1 true WO2019007417A1 (zh) 2019-01-10

Family

ID=64950625

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/094786 WO2019007417A1 (zh) 2017-07-07 2018-07-06 基于隐私保护的训练样本生成方法和装置

Country Status (6)

Country Link
US (1) US10878125B2 (zh)
EP (1) EP3644231A4 (zh)
CN (1) CN109214404A (zh)
SG (1) SG11201912390YA (zh)
TW (1) TW201907318A (zh)
WO (1) WO2019007417A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097079A (zh) * 2019-03-29 2019-08-06 浙江工业大学 User privacy protection method based on classification boundaries

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918444A (zh) * 2019-02-01 2019-06-21 上海尚阵智能科技有限公司 Training/verification/management method/system, medium, and device for model results
US10956597B2 (en) 2019-05-23 2021-03-23 Advanced New Technologies Co., Ltd. Loss function value determination method and device and electronic equipment
CN110263294B (zh) * 2019-05-23 2020-08-04 阿里巴巴集团控股有限公司 Method, apparatus, and electronic device for determining the value of a loss function
CN112183757B (zh) * 2019-07-04 2023-10-27 创新先进技术有限公司 Model training method, apparatus, and system
CN111367960A (zh) * 2020-02-25 2020-07-03 北京明略软件系统有限公司 Method, apparatus, computer storage medium, and terminal for implementing data processing
CN111160573B (zh) * 2020-04-01 2020-06-30 支付宝(杭州)信息技术有限公司 Method and apparatus for two-party joint training of a business prediction model with data privacy protection
CN114936650A (zh) * 2020-12-06 2022-08-23 支付宝(杭州)信息技术有限公司 Method and apparatus for joint training of a business model based on privacy protection
CN113361658B (zh) * 2021-07-15 2022-06-14 支付宝(杭州)信息技术有限公司 Graph model training method, apparatus, and device based on privacy protection
CN114202673A (zh) * 2021-12-13 2022-03-18 深圳壹账通智能科技有限公司 Training method for a certificate classification model, certificate classification method, apparatus, and medium
CN115422574A (zh) * 2022-08-15 2022-12-02 中国银联股份有限公司 Data processing method and apparatus, electronic device, and storage medium
CN115719085B (zh) * 2023-01-10 2023-04-18 武汉大学 Method and device for defending against deep neural network model inversion attacks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090187522A1 (en) * 2008-01-18 2009-07-23 Siemens Medical Solutions Usa, Inc. System and Method for Privacy Preserving Predictive Models for Lung Cancer Survival Analysis
CN102955946A (zh) * 2011-08-18 2013-03-06 刘军 Two-stage fast classifier based on a linear classification tree and a neural network
CN106709447A (zh) * 2016-12-21 2017-05-24 华南理工大学 Method for detecting abnormal behavior in video based on object localization and feature fusion
CN106845510A (zh) * 2016-11-07 2017-06-13 中国传媒大学 Method for recognizing traditional Chinese visual culture symbols based on deep hierarchical feature fusion

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5297688B2 (ja) * 2008-05-09 2013-09-25 株式会社日立製作所 Vector-concealed inner product computation system, vector-concealed inner product computation method, and cryptographic key sharing system
US8909250B1 (en) * 2013-07-02 2014-12-09 Google Inc. Obscuring true location for location-based services
SG11201703247WA (en) * 2014-10-24 2017-05-30 Nat Ict Australia Ltd Learning with transformed data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090187522A1 (en) * 2008-01-18 2009-07-23 Siemens Medical Solutions Usa, Inc. System and Method for Privacy Preserving Predictive Models for Lung Cancer Survival Analysis
CN102955946A (zh) * 2011-08-18 2013-03-06 刘军 Two-stage fast classifier based on a linear classification tree and a neural network
CN106845510A (zh) * 2016-11-07 2017-06-13 中国传媒大学 Method for recognizing traditional Chinese visual culture symbols based on deep hierarchical feature fusion
CN106709447A (zh) * 2016-12-21 2017-05-24 华南理工大学 Method for detecting abnormal behavior in video based on object localization and feature fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU, HONGWEI ET AL.: "Privacy protection algorithm of support vector machine based on rotation disturbance", STATISTICS AND DECISION, no. 19, 10 October 2012 (2012-10-10), pages 94 - 96, XP009518577, DOI: 10.13546/j.cnki.tjyjc.2012.19.033 *
PENG, XIAOBING ET AL.: "Research Progress of Privacy-Preserving Support Vector Machines", JOURNAL OF JIANGSU UNIVERSITY , vol. 38, no. 1, 1 January 2017 (2017-01-01), pages 78 - 85, XP055662921, DOI: 10.3969/j.issn.1671-7775.2017.01.014 *
See also references of EP3644231A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097079A (zh) * 2019-03-29 2019-08-06 浙江工业大学 User privacy protection method based on classification boundaries
CN110097079B (zh) * 2019-03-29 2021-03-30 浙江工业大学 User privacy protection method based on classification boundaries

Also Published As

Publication number Publication date
US20200143080A1 (en) 2020-05-07
CN109214404A (zh) 2019-01-15
TW201907318A (zh) 2019-02-16
SG11201912390YA (en) 2020-01-30
US10878125B2 (en) 2020-12-29
EP3644231A4 (en) 2020-06-10
EP3644231A1 (en) 2020-04-29

Similar Documents

Publication Publication Date Title
WO2019007417A1 (zh) 基于隐私保护的训练样本生成方法和装置
US10248664B1 (en) Zero-shot sketch-based image retrieval techniques using neural networks for sketch-image recognition and retrieval
Radhakrishnan et al. Overparameterized neural networks implement associative memory
Raja et al. Cloud K-SVD: A collaborative dictionary learning algorithm for big, distributed data
CN113221183B (zh) 实现隐私保护的多方协同更新模型的方法、装置及系统
US11875253B2 (en) Low-resource entity resolution with transfer learning
CN110019793A (zh) 一种文本语义编码方法及装置
CN111400504B (zh) 企业关键人的识别方法和装置
Kaji et al. An adversarial approach to structural estimation
TW201224808A (en) Method for classification of objects in a graph data stream
US20240185080A1 (en) Self-supervised data obfuscation in foundation models
US9928214B2 (en) Sketching structured matrices in nonlinear regression problems
CN114330474A (zh) 一种数据处理方法、装置、计算机设备以及存储介质
CN111582284B (zh) 用于图像识别的隐私保护方法、装置和电子设备
CN113779380A (zh) 跨域推荐、内容推荐方法、装置及设备
CN116127925B (zh) 基于对文本进行破坏处理的文本数据增强方法及装置
US20160335053A1 (en) Generating compact representations of high-dimensional data
US10013644B2 (en) Statistical max pooling with deep learning
CN110633476B (zh) 用于获取知识标注信息的方法及装置
Duan et al. An efficient algorithm for solving the nonnegative tensor least squares problem
Bogoya et al. Systems with local and nonlocal diffusions, mixed boundary conditions, and reaction terms
Yang et al. Predictive approximate Bayesian computation via saddle points
KR102593137B1 (ko) 딥러닝 기술을 이용한 비도덕적인 이미지 분류 장치 및 방법
KR20190101551A (ko) 퍼지 범주 표현을 이용한 확률 레이블 부착 알고리즘을 사용한 분류 방법
US20230143721A1 (en) Teaching a machine classifier to recognize a new class

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18828486

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018828486

Country of ref document: EP

Effective date: 20200120