WO2015149497A1 - Method for distributed data statistics - Google Patents

Method for distributed data statistics

Info

Publication number
WO2015149497A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
data set
preset
attribute
Prior art date
Application number
PCT/CN2014/088170
Other languages
English (en)
French (fr)
Inventor
欧阳军
范伟
何诚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015149497A1 (in Chinese)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks

Definitions

  • the present invention relates to the field of Internet technologies, and in particular, to a method and apparatus based on distributed data statistics.
  • the method implements statistics on data in a preset data source, but cannot solve the security problem of data statistics of multiple nodes in a distributed computing environment.
  • the embodiments of the present invention provide a method and an apparatus based on distributed data statistics, which can solve the security problem of data statistics based on distributed multiple nodes.
  • a first aspect of an embodiment of the present invention discloses a method based on distributed data statistics, the method comprising:
  • the second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are nodes in a distributed network;
  • the second node constructs a decision tree from the encrypted data set according to a preset data attribute; the second node obtains a statistical result of the data according to the preset data attribute and the decision tree.
  • before the second node receives the encrypted data set sent by the at least two first nodes, the method further includes:
  • the second node sends a public key to the first node, so that the first node encrypts the data set with the public key to obtain an encrypted data set.
  • before the second node constructs the decision tree from the encrypted data set according to the preset data attribute, the method further includes:
  • the second node rearranges at least one column of the encrypted data set according to a preset arrangement rule to obtain a first data set;
  • the second node decrypts the first data set with a private key to obtain a second data set, where the private key corresponds to the public key;
  • constructing the decision tree from the data set according to the preset data attribute includes:
  • the second node constructs a decision tree from the second data set according to the preset data attribute.
  • the second node constructing the decision tree from the encrypted data set according to the preset data attribute includes:
  • the second node determines the value of the preset data attribute;
  • the second node acquires data one by one from the encrypted data set in a preset manner, and determines the key attribute value of the data;
  • the second node compares the value of the preset data attribute with the key attribute value of the data, and obtains a comparison result;
  • the second node inserts the acquired data as a leaf node into the decision tree according to the comparison result.
  • the second node obtaining the statistical result of the data according to the preset data attribute and the decision tree includes:
  • the second node determines, according to the preset data attribute and its value, the leaf nodes that need to be traversed in the decision tree;
  • the second node performs statistics on the leaf nodes that need to be traversed, and obtains the statistical result.
  • a second aspect of the embodiments of the present invention discloses a device based on distributed data statistics, the device comprising:
  • a receiving unit configured to receive an encrypted data set sent by at least two first nodes, where the first node and the second node are nodes in a distributed network;
  • a constructing unit configured to construct a decision tree by using the encrypted data set according to a preset data attribute
  • an obtaining unit configured to obtain a statistical result of the data according to the preset data attribute and the decision tree.
  • the device further includes a sending unit,
  • the sending unit is configured to send a public key to the first node, so that the first node encrypts the data set according to the public key to obtain an encrypted data set;
  • the receiving unit is configured to receive an encrypted data set sent by at least two first nodes.
  • the device further includes an arranging unit and a decrypting unit;
  • the arranging unit is configured to rearrange at least one column of the encrypted data set received by the receiving unit according to a preset arrangement rule to obtain a first data set;
  • the decrypting unit is configured to decrypt the first data set according to a private key, to obtain a second data set, where the private key corresponds to the public key;
  • the constructing unit is specifically configured to construct a decision tree by using the second data set according to a preset data attribute.
  • the generating unit specifically includes a first determining subunit, a second determining subunit, a comparing subunit, and an inserting subunit;
  • the first determining subunit is specifically configured to determine a value of the preset data attribute
  • the second determining subunit is specifically configured to acquire data one by one from the encrypted data set according to a preset manner, and determine a key attribute value of the data;
  • the comparing subunit is specifically configured to compare the value of the preset data attribute with a key attribute value of the data, and obtain a comparison result;
  • the inserting subunit is specifically configured to insert the acquired data as a leaf node into the decision tree according to the result of the comparison.
  • the acquiring unit includes a third determining subunit and a statistical subunit
  • the third determining subunit is specifically configured to determine, according to the preset data attribute and a value of a preset data attribute determined by the first determining subunit, a leaf node that needs to be traversed in the decision tree;
  • the statistics subunit is specifically configured to perform statistics on the leaf nodes that need to be traversed, and obtain statistical results.
  • with the distributed data statistics method and apparatus provided by the embodiments of the present invention, a decision tree is constructed from the encrypted data set according to preset data attributes, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
  • FIG. 1 is a flowchart of a method for distributed data statistics according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention.
  • FIG. 5 is a structural diagram of an apparatus based on distributed data statistics according to an embodiment of the present invention.
  • FIG. 6 is a structural diagram of an apparatus based on distributed data statistics according to another embodiment of the present invention.
  • FIG. 7 is a structural diagram of an apparatus based on distributed data statistics according to another embodiment of the present invention.
  • FIG. 8 is a structural diagram of an apparatus based on distributed data statistics according to another embodiment of the present invention.
  • FIG. 9 is a structural diagram of an apparatus based on distributed data statistics according to another embodiment of the present invention.
  • a method for distributed data statistics according to an embodiment of the present invention is described below, and the method specifically includes:
  • the method steps of the distributed data statistics based on the embodiment of the present invention as described in FIG. 1 include 101 to 103.
  • the second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are nodes in a distributed network.
  • the second node may be a type of trusted server or terminal that receives data and performs calculation in a distributed network, and may be one or more.
  • the first node may be a server or a terminal in a distributed network.
  • optionally, as shown in FIG. 4, step 104 is performed before step 101;
  • step 104 is specifically: the second node sends a public key to the first node, so that the first node encrypts the data set with the public key to obtain an encrypted data set.
  • the second node has a private key corresponding to the public key, and only the private key can decrypt the data encrypted by the public key.
  • the second node constructs a decision tree by using the encrypted data set according to a preset data attribute.
  • the encrypted data set received by the second node includes many attributes; the second node may select one or several of them as preset data attributes as needed, and then construct a decision tree from the received encrypted data set according to the preset data attributes.
  • data in a data set encrypted with the public key becomes ciphertext, and data in the ciphertext state can still be compared, added, subtracted, summed, averaged, retrieved, and so on.
  • for example, suppose the data set received by the second node has a score attribute. To compute the pass rate, the received data set can be built into a decision tree with the preset pass score as the judgment condition.
  • for example, data records with a score greater than or equal to the pass score are assigned to the left side of the tree, and data records with a score below the pass score are assigned to the right side of the tree.
  • finally, the number of child nodes on the left side of the decision tree and the number on the right side are counted separately to obtain the statistical result.
  • optionally, as shown in FIG. 2, steps 105 and 106 are performed before step 102;
  • step 105 is: the second node rearranges at least one column of the received encrypted data set according to a preset arrangement rule to obtain a first data set;
  • step 106 is: the second node decrypts the first data set with a private key to obtain a second data set, where the private key corresponds to the public key;
  • in step 102, the second node constructing a decision tree from the data set according to a preset data attribute includes:
  • the second node constructs a decision tree from the second data set according to the preset data attribute.
  • the second node rearranges at least one column of the received encrypted data set. Changing the order of the data has no effect on summing the data, averaging it, or comparing magnitudes, and it also conceals the real information. The preset operations include comparison, averaging, summation, and the like.
  • the step 102 specifically includes steps 1021 to 1024;
  • Step 1021 The second node determines a value of the preset data attribute.
  • for example, in a data set of student grades, the preset data attribute is the score. If the number of students scoring above 90 is needed, the value of the preset data attribute can be set to 90.
  • Step 1022 The second node acquires data one by one from the encrypted data set according to a preset manner, and determines a key attribute value of the data.
  • the preset manner may be random, front to back, back to front, and so on, or may be user-defined.
  • Step 1023 The second node compares the value of the preset data attribute with a key attribute value of the data, and obtains a comparison result.
  • for example, if the number of students whose math scores are higher than 90 needs to be counted, the key attribute value of the data is the math score.
  • Step 1024 The second node inserts the acquired data as a leaf node into the decision tree according to the result of the comparison.
  • the second node acquires a statistical result of the data according to the data attribute and the decision tree.
  • the data attribute is a judgment condition for constructing the decision tree, and according to the data attribute, a leaf node of the decision tree corresponding to the data attribute is counted, and a statistical result is obtained.
  • the step 103 specifically includes steps 1031 to 1032:
  • Step 1031 The second node determines, according to the preset data attribute and the value of the preset data attribute, a leaf node that needs to be traversed in the decision tree.
  • Step 1032 The second node performs statistics on the leaf nodes that need to be traversed, and obtains statistics results.
  • the second node receives the public-key-encrypted data set sent by the first node and sets the judgment attribute, for example to compute the ratio of males to females in the data set. It randomly selects data from the data set and determines whether the gender attribute of each record is male or female.
  • records whose attribute is male can be placed as child nodes on the left of the root node of the random tree, and records whose attribute is female as child nodes on the right, until all the data in the data set has been selected.
  • finally, the number of child nodes on the left of the root node and the number on the right are counted to obtain the male-to-female ratio.
  • because the data set is encrypted, the information in it remains entirely in ciphertext while the ratio is obtained, and it is not leaked. Because the second node holds the data of at least two first nodes, the data of each node is also in ciphertext with respect to the others, which keeps the first nodes' data secure.
  • the second node receives the public-key-encrypted data sets sent by at least two first nodes and merges them into one data set.
  • it then adjusts the data corresponding to one or several attributes of the merged data set. For example, if the merged set is set A, which includes three attributes a, b, and c, the data corresponding to attribute a in set A can be adjusted according to a preset order.
  • the preset order may be a random order, a high-to-low order, and so on.
  • as can be seen from the above, a decision tree is constructed from the encrypted data set according to the preset data attributes, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
  • the apparatus 20 includes a receiving unit 201, a construction unit 202, and an acquisition unit 203.
  • the receiving unit 201 is configured to receive an encrypted data set sent by at least two first nodes, where the first node and the second node are nodes in a distributed network;
  • the second node may be a type of trusted server or terminal that receives data and performs calculation in a distributed network, and may be one or more.
  • the first node may be a server or a terminal in a distributed network.
  • the constructing unit 202 is configured to construct a decision tree from the encrypted data set according to a preset data attribute;
  • the generating unit 202 includes a first determining subunit 2021, a second determining subunit 2022, a comparing subunit 2023, and an inserting subunit 2024.
  • the first determining subunit 2021 is specifically configured to determine a value of the preset data attribute.
  • the second determining sub-unit 2022 is specifically configured to acquire data one by one from the encrypted data set according to a preset manner, and determine a key attribute value of the data;
  • the comparing subunit 2023 is specifically configured to compare the value of the preset data attribute with a key attribute value of the data, and obtain a comparison result;
  • the insertion subunit 2024 is specifically configured to insert the acquired data as a leaf node into the decision tree according to the result of the comparison.
  • the obtaining unit 203 is configured to obtain a statistical result of the data according to the preset data attribute and the decision tree.
  • the acquisition unit 203 includes a third determination sub-unit 2031 and a statistical sub-unit 2032;
  • the third determining sub-unit 2031 is configured to determine, according to the preset data attribute and a value of the preset data attribute determined by the first determining sub-unit, a leaf node that needs to be traversed in the decision tree;
  • the statistic sub-unit 2032 is configured to perform statistics on the leaf nodes that need to be traversed, and obtain statistical results.
  • the device 20 further includes an arranging unit 204 and a decrypting unit 205;
  • the arranging unit 204 is configured to rearrange at least one column of the encrypted data set received by the receiving unit 201 according to a preset arrangement rule to obtain a first data set.
  • the decrypting unit 205 is specifically configured to decrypt the first data set according to the private key to obtain a second data set, where the private key corresponds to the public key;
  • the constructing unit 202 is specifically configured to construct a decision tree by using the second data set according to a preset data attribute.
  • the encrypted data set received by the constructing unit 202 includes a plurality of attributes, and the device 20 may select one or several attributes as preset data attributes according to requirements, and then utilize the received encrypted data set according to the preset data attributes. Construct a decision tree.
  • data in a data set encrypted with the public key becomes ciphertext, and data in the ciphertext state can still be compared, added, subtracted, summed, averaged, retrieved, and so on.
  • the device 20 further includes a sending unit 206;
  • the sending unit 206 is configured to send a public key to the first node, so that the first node encrypts the data set according to the public key to obtain an encrypted data set;
  • the receiving unit 201 is configured to receive the encrypted data set sent by the at least two first nodes.
  • the device 20 has a private key corresponding to the public key, and only the private key can decrypt the data encrypted by the public key.
  • as can be seen from the above, a decision tree is constructed from the encrypted data set according to the preset data attributes, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
  • FIG. 9 illustrates the structure of a message forwarding device according to another embodiment of the present invention, including at least one processor 301 (e.g., a CPU), a memory 302, at least one network interface 303, and at least one communication bus 304 for implementing connection and communication between these components.
  • the processor 301 is configured to execute executable modules, such as computer programs, stored in the memory 302.
  • the memory 302 may include a high speed random access memory (RAM: Random Access Memory), and may also include a non-volatile memory such as at least one disk memory.
  • the communication connection between the network device and at least one other network element is implemented by at least one network interface 303 (which may be wired or wireless), and may use an Internet, a wide area network, a local network, a metropolitan area network, or the like.
  • the memory 302 stores a program 3021 that can be executed by the processor 301.
  • the program includes:
  • the second node receives the encrypted data set sent by the at least two first nodes, where the first node and the second node are nodes in the distributed network;
  • the second node constructs a decision tree by using the encrypted data set according to a preset data attribute
  • the second node acquires a statistical result of the data according to the preset data attribute and the decision tree.
  • as can be seen from the above, a decision tree is constructed from the encrypted data set according to the preset data attributes, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
  • the information exchange and execution processes between the modules in the above apparatus and system are based on the same concept as the method embodiments of the present invention; for details, see the description in the method embodiments, which is not repeated here.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Storage Device Security (AREA)

Abstract

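The invention relates to the field of Internet technologies, and in particular to a method and an apparatus for distributed data statistics. The method includes: a second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network; the second node constructs a decision tree from the encrypted data set according to a preset data attribute; and the second node obtains a statistical result of the data according to the preset data attribute and the decision tree. The technical solution provided can solve the security problem of data statistics across multiple distributed nodes.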

Description

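Method for distributed data statistics
Technical Field
The present invention relates to the field of Internet technologies, and in particular to a method and an apparatus for distributed data statistics.
Background
With the arrival of the big-data era, extracting useful data from massive volumes of information has become especially important.
One prior-art method constructs a decision tree from the data in any data set according to an attribute, and obtains a statistical result by counting the leaf nodes of that decision tree.
That method achieves statistics on data in a preset data source, but it cannot solve the security problem of data statistics across multiple nodes in a distributed computing environment.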
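Summary of the Invention
Embodiments of the present invention provide a method and an apparatus for distributed data statistics, which can solve the security problem of data statistics across multiple distributed nodes.
A first aspect of the embodiments of the present invention discloses a method for distributed data statistics, the method including:
a second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network; the second node constructs a decision tree from the encrypted data set according to a preset data attribute; and the second node obtains a statistical result of the data according to the preset data attribute and the decision tree.
With reference to the first aspect, in a first implementation of the first aspect, before the second node receives the encrypted data set sent by the at least two first nodes, the method further includes:
the second node sends a public key to the first node, so that the first node encrypts its data set with the public key to obtain the encrypted data set.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, before the second node constructs the decision tree from the encrypted data set according to the preset data attribute, the method further includes:
the second node rearranges at least one column of data in the encrypted data set according to a preset arrangement rule to obtain a first data set;
the second node decrypts the first data set with a private key to obtain a second data set, where the private key corresponds to the public key;
and the second node constructing the decision tree from the data set according to the preset data attribute includes:
the second node constructs the decision tree from the second data set according to the preset data attribute.
With reference to the first aspect, or the first or second implementation of the first aspect, in a third implementation of the first aspect, the second node constructing the decision tree from the encrypted data set according to the preset data attribute includes:
the second node determines the value of the preset data attribute;
the second node acquires data one by one from the encrypted data set in a preset manner, and determines the key attribute value of the data;
the second node compares the value of the preset data attribute with the key attribute value of the data, and obtains a comparison result;
the second node inserts the acquired data as a leaf node into the decision tree according to the comparison result.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, the second node obtaining the statistical result of the data according to the preset data attribute and the decision tree includes:
the second node determines, according to the preset data attribute and the value of the preset data attribute, the leaf nodes that need to be traversed in the decision tree;
the second node performs statistics on the leaf nodes that need to be traversed, and obtains the statistical result.
A second aspect of the embodiments of the present invention discloses an apparatus for distributed data statistics, the apparatus including:
a receiving unit, configured to receive an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network;
a constructing unit, configured to construct a decision tree from the encrypted data set according to a preset data attribute;
an obtaining unit, configured to obtain a statistical result of the data according to the preset data attribute and the decision tree.
With reference to the second aspect, in a first implementation of the second aspect, the apparatus further includes a sending unit,
the sending unit being configured to send a public key to the first node, so that the first node encrypts its data set with the public key to obtain the encrypted data set;
the receiving unit being configured to receive the encrypted data set sent by the at least two first nodes.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation of the second aspect,
the apparatus further includes an arranging unit and a decrypting unit;
the arranging unit is specifically configured to rearrange, according to a preset arrangement rule, at least one column of data in the encrypted data set received by the receiving unit, to obtain a first data set;
the decrypting unit is specifically configured to decrypt the first data set with a private key to obtain a second data set, where the private key corresponds to the public key;
the constructing unit is specifically configured to construct the decision tree from the second data set according to the preset data attribute.
With reference to the second aspect, or the first or second implementation of the second aspect, in a third implementation of the second aspect, the generating unit specifically includes a first determining subunit, a second determining subunit, a comparing subunit, and an inserting subunit;
the first determining subunit is specifically configured to determine the value of the preset data attribute;
the second determining subunit is specifically configured to acquire data one by one from the encrypted data set in a preset manner, and determine the key attribute value of the data;
the comparing subunit is specifically configured to compare the value of the preset data attribute with the key attribute value of the data, and obtain a comparison result;
the inserting subunit is specifically configured to insert the acquired data as a leaf node into the decision tree according to the comparison result.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, the obtaining unit includes a third determining subunit and a statistics subunit;
the third determining subunit is specifically configured to determine, according to the preset data attribute and the value of the preset data attribute determined by the first determining subunit, the leaf nodes that need to be traversed in the decision tree;
the statistics subunit is specifically configured to perform statistics on the leaf nodes that need to be traversed, and obtain the statistical result.
As can be seen from the above technical solutions, with the method and apparatus for distributed data statistics provided by the embodiments of the present invention, a decision tree is constructed from the encrypted data set according to a preset data attribute, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.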
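Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for distributed data statistics according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention;
FIG. 3 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention;
FIG. 4 is a flowchart of a method for distributed data statistics according to another embodiment of the present invention;
FIG. 5 is a structural diagram of an apparatus for distributed data statistics according to an embodiment of the present invention;
FIG. 6 is a structural diagram of an apparatus for distributed data statistics according to another embodiment of the present invention;
FIG. 7 is a structural diagram of an apparatus for distributed data statistics according to another embodiment of the present invention;
FIG. 8 is a structural diagram of an apparatus for distributed data statistics according to another embodiment of the present invention;
FIG. 9 is a structural diagram of an apparatus for distributed data statistics according to another embodiment of the present invention.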
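Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
A method for distributed data statistics according to an embodiment of the present invention is described below with reference to FIG. 1. The method specifically includes:
As shown in FIG. 1, the method for distributed data statistics in this embodiment includes steps 101 to 103.
101. A second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network.
The second node may be a trusted server or terminal in the distributed network that receives data and performs computation; there may be one or more second nodes.
The first node may be a server or a terminal in the distributed network.
Optionally, as shown in FIG. 4, step 104 is performed before step 101.
Step 104 is specifically: the second node sends a public key to the first node, so that the first node encrypts its data set with the public key to obtain the encrypted data set.
The second node holds the private key corresponding to the public key, and only that private key can decrypt data encrypted with the public key.
102. The second node constructs a decision tree from the encrypted data set according to a preset data attribute.
The encrypted data set received by the second node contains many attributes. The second node may select one or several of them as preset data attributes as needed, and then construct the decision tree from the received encrypted data set according to the preset data attributes.
Data in a data set encrypted with the public key becomes ciphertext, and data in the ciphertext state can still be compared, added, subtracted, summed, averaged, retrieved, and so on.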
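The description does not name an encryption scheme. As one hedged illustration of how a sum could be computed on ciphertext, the sketch below uses a toy Paillier cryptosystem. This is an assumption for illustration, not the method stated in the patent, and the primes are far too small for real use. Paillier supports addition on ciphertext only; comparison on ciphertext would need a different kind of scheme. All function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Second node: build a toy Paillier key pair (insecure demo primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n, m):
    """First node: encrypt an integer 0 <= m < n with the public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g**m mod n**2 equals 1 + m*n.
    return (1 + m * n) * pow(r, n, n2) % n2


def add_ciphertexts(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % (n * n)


def decrypt(n, private_key, c):
    """Second node: recover the plaintext with the private key."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


# Step 104: the second node generates the keys and publishes n.
n, private_key = keygen()

# Two first nodes encrypt their score columns with the public key.
node_a = [encrypt(n, s) for s in (72, 88)]
node_b = [encrypt(n, s) for s in (95, 41)]

# The sum is accumulated without decrypting any individual score.
total = node_a[0]
for c in node_a[1:] + node_b:
    total = add_ciphertexts(n, total, c)

print(decrypt(n, private_key, total))  # 296
```

Only the final aggregate is decrypted here, which matches the goal of keeping each first node's individual records in ciphertext.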
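In an embodiment of the present invention, suppose for example that the data set received by the second node has a score attribute. To compute the pass rate, the received data set can be built into a decision tree with the preset pass score as the judgment condition. For example, data records with a score greater than or equal to the pass score are assigned to the left side of the tree, and data records with a score below the pass score are assigned to the right side. Finally, the number of child nodes on the left side of the decision tree and the number on the right side are counted separately to obtain the statistical result.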
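The pass-rate example can be sketched as a one-level tree with two branches. This is a minimal illustration that compares plaintext scores; in the described method the comparison would run on ciphertext or on the decrypted second data set. The class name `PassFailTree` and the record layout are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PassFailTree:
    """One-level decision tree: the root holds the pass score as the condition."""
    pass_score: int
    left: list = field(default_factory=list)   # records with score >= pass_score
    right: list = field(default_factory=list)  # records with score < pass_score

    def insert(self, record):
        branch = self.left if record["score"] >= self.pass_score else self.right
        branch.append(record)

    def pass_rate(self):
        total = len(self.left) + len(self.right)
        return len(self.left) / total if total else 0.0


records = [
    {"name": "s1", "score": 55},
    {"name": "s2", "score": 60},
    {"name": "s3", "score": 91},
    {"name": "s4", "score": 42},
]

tree = PassFailTree(pass_score=60)
for record in records:
    tree.insert(record)

print(len(tree.left), len(tree.right), tree.pass_rate())  # 2 2 0.5
```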
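Optionally, as shown in FIG. 2, steps 105 and 106 are performed before step 102.
Step 105 is: the second node rearranges at least one column of data in the received encrypted data set according to a preset arrangement rule to obtain a first data set.
Step 106 is: the second node decrypts the first data set with a private key to obtain a second data set, where the private key corresponds to the public key.
In step 102, the second node constructing the decision tree from the data set according to the preset data attribute includes:
the second node constructs the decision tree from the second data set according to the preset data attribute.
In an embodiment of the present invention, the second node rearranges at least one column of data in the received encrypted data set. Changing the order of the data has no effect on summing the data, averaging it, or comparing magnitudes, and it also conceals the real information. The preset operations include comparison, averaging, summation, and the like.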
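In an embodiment of the present invention, as shown in FIG. 3, step 102 specifically includes steps 1021 to 1024.
Step 1021: The second node determines the value of the preset data attribute.
In an embodiment of the present invention, for example, in a data set of student grades the preset data attribute is the score. If the number of students scoring above 90 is needed, the value of the preset data attribute can be set to 90.
Step 1022: The second node acquires data one by one from the encrypted data set in a preset manner, and determines the key attribute value of the data.
The preset manner may be random, front to back, back to front, and so on, or may be user-defined.
Step 1023: The second node compares the value of the preset data attribute with the key attribute value of the data, and obtains a comparison result.
In an embodiment of the present invention, for example, if the number of students whose math scores are higher than 90 needs to be counted, the key attribute value of the data is the math score.
Step 1024: The second node inserts the acquired data as a leaf node into the decision tree according to the comparison result.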
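Steps 1021 to 1024 can be sketched as a record-by-record construction loop. The sketch below is a plaintext illustration under stated assumptions: the tree is a plain dictionary, and the names `iterate`, `build_tree`, `greater`, and `not_greater` are hypothetical. It shows that the preset manner of fetching records changes only the insertion order, not which branch each record lands in.

```python
import random


def iterate(records, manner="forward", seed=None):
    """Step 1022 (first half): yield records one by one in the preset manner."""
    order = list(records)
    if manner == "backward":
        order.reverse()
    elif manner == "random":
        random.Random(seed).shuffle(order)
    elif manner != "forward":
        raise ValueError(f"unknown manner: {manner}")
    yield from order


def build_tree(records, key_attribute, value, manner="forward", seed=None):
    # Step 1021: fix the value of the preset data attribute (e.g. 90).
    tree = {"attribute": key_attribute, "value": value,
            "greater": [], "not_greater": []}
    for record in iterate(records, manner, seed):
        key_value = record[key_attribute]              # step 1022: key attribute value
        branch = "greater" if key_value > value else "not_greater"  # step 1023
        tree[branch].append(record)                    # step 1024: insert as a leaf
    return tree


students = [
    {"name": "s1", "math": 95},
    {"name": "s2", "math": 90},
    {"name": "s3", "math": 78},
    {"name": "s4", "math": 99},
]

tree = build_tree(students, "math", 90, manner="random", seed=3)
print(len(tree["greater"]))  # 2
```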
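103. The second node obtains a statistical result of the data according to the data attribute and the decision tree.
The data attribute is the judgment condition used to construct the decision tree. The statistical result is obtained by counting the leaf nodes of the decision tree that correspond to the data attribute.
In an embodiment of the present invention, as shown in FIG. 4, step 103 specifically includes steps 1031 to 1032:
Step 1031: The second node determines, according to the preset data attribute and the value of the preset data attribute, the leaf nodes that need to be traversed in the decision tree.
Step 1032: The second node performs statistics on the leaf nodes that need to be traversed, and obtains the statistical result.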
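Steps 1031 and 1032 can be sketched as selecting one branch of the tree and counting its leaves. The `Node` class, the branch labels `yes` and `no`, and the helper names are hypothetical; the patent does not prescribe a tree representation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    record: Optional[dict] = None  # set only on leaf nodes


def leaves_to_traverse(root, branch_label):
    """Step 1031: pick the leaves under the branch that matches the query."""
    for child in root.children:
        if child.label == branch_label:
            return [n for n in child.children if n.record is not None]
    return []


def count_leaves(leaves):
    """Step 1032: the statistic here is a simple count of the selected leaves."""
    return len(leaves)


# Tree built with the condition "math > 90".
root = Node("math > 90", children=[
    Node("yes", children=[Node("leaf", record={"math": 95}),
                          Node("leaf", record={"math": 99})]),
    Node("no", children=[Node("leaf", record={"math": 90}),
                         Node("leaf", record={"math": 78})]),
])

above_90 = count_leaves(leaves_to_traverse(root, "yes"))
print(above_90)  # 2
```

Only the branch relevant to the query is visited, so the count does not require touching the records on the other side of the tree.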
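In an embodiment of the present invention, the second node receives the public-key-encrypted data set sent by the first node and sets the judgment attribute, for example to compute the ratio of males to females in the data set. It randomly selects data from the data set and determines whether the gender attribute of each record is male or female. Records whose attribute is male can be placed as child nodes on the left of the root node of the random tree, and records whose attribute is female as child nodes on the right, until all the data in the data set has been selected. Finally, the number of child nodes on the left of the root node and the number on the right are counted to obtain the male-to-female ratio. Because the data set is encrypted, the information in it remains entirely in ciphertext while the ratio is obtained, and it is not leaked. Because the second node holds the data of at least two first nodes, the data of each node is also in ciphertext with respect to the others, which keeps the first nodes' data secure.
In an embodiment of the present invention, the second node receives the public-key-encrypted data sets sent by at least two first nodes and merges them into one data set. It then adjusts the data corresponding to one or several attributes of the merged data set. For example, if the merged set is set A, which contains three attributes a, b, and c, the data corresponding to attribute a in set A can be adjusted according to a preset order, which may be a random order, a high-to-low order, and so on. After the data corresponding to attribute a is adjusted, summing it, comparing magnitudes, and averaging are unaffected, and the original data is protected. Because the data in set A has been adjusted, A may also be decrypted with the private key so that its data can be processed in plaintext.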
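The merge-and-rearrange embodiment can be sketched as follows. The example works on plaintext integers for readability, whereas in the description the column is still ciphertext at this point. The helpers `merge` and `rearrange_column` are hypothetical names. The sketch checks that shuffling column a leaves its sum, mean, and threshold count unchanged while breaking the link between a value and its original row.

```python
import random
from statistics import mean


def merge(*data_sets):
    """Combine the data sets received from several first nodes into set A."""
    return [dict(row) for data_set in data_sets for row in data_set]


def rearrange_column(rows, column, rng):
    """Permute one column across rows; the other columns stay in place."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


node_1 = [{"a": 72, "b": "x", "c": 1}, {"a": 95, "b": "y", "c": 2}]
node_2 = [{"a": 41, "b": "z", "c": 3}, {"a": 88, "b": "w", "c": 4}]

set_a = merge(node_1, node_2)
shuffled = rearrange_column(set_a, "a", random.Random(0))

before = [row["a"] for row in set_a]
after = [row["a"] for row in shuffled]

# Order-independent statistics are identical before and after rearranging.
print(sum(before) == sum(after), mean(before) == mean(after))  # True True
```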
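As can be seen from the above, with the method for distributed data statistics in this embodiment, a decision tree is constructed from the encrypted data set according to a preset data attribute, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
An apparatus 20 for distributed data statistics according to an embodiment of the present invention is described below with reference to FIG. 5. As shown in FIG. 5, the apparatus 20 includes a receiving unit 201, a constructing unit 202, and an obtaining unit 203.
The receiving unit 201 is configured to receive an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network.
The second node may be a trusted server or terminal in the distributed network that receives data and performs computation; there may be one or more second nodes.
The first node may be a server or a terminal in the distributed network.
The constructing unit 202 is configured to construct a decision tree from the encrypted data set according to a preset data attribute.
In an embodiment of the present invention, as shown in FIG. 6, the generating unit 202 includes a first determining subunit 2021, a second determining subunit 2022, a comparing subunit 2023, and an inserting subunit 2024.
The first determining subunit 2021 is specifically configured to determine the value of the preset data attribute.
The second determining subunit 2022 is specifically configured to acquire data one by one from the encrypted data set in a preset manner, and determine the key attribute value of the data.
The comparing subunit 2023 is specifically configured to compare the value of the preset data attribute with the key attribute value of the data, and obtain a comparison result.
The inserting subunit 2024 is specifically configured to insert the acquired data as a leaf node into the decision tree according to the comparison result.
The obtaining unit 203 is configured to obtain a statistical result of the data according to the preset data attribute and the decision tree.
In an embodiment of the present invention, as shown in FIG. 7, the obtaining unit 203 includes a third determining subunit 2031 and a statistics subunit 2032.
The third determining subunit 2031 is specifically configured to determine, according to the preset data attribute and the value of the preset data attribute determined by the first determining subunit, the leaf nodes that need to be traversed in the decision tree.
The statistics subunit 2032 is specifically configured to perform statistics on the leaf nodes that need to be traversed, and obtain the statistical result.
Optionally, as shown in FIG. 8, the apparatus 20 further includes an arranging unit 204 and a decrypting unit 205.
The arranging unit 204 is specifically configured to rearrange, according to a preset arrangement rule, at least one column of data in the encrypted data set received by the receiving unit 201, to obtain a first data set.
The decrypting unit 205 is specifically configured to decrypt the first data set with a private key to obtain a second data set, where the private key corresponds to the public key.
The constructing unit 202 is specifically configured to construct the decision tree from the second data set according to the preset data attribute.
The encrypted data set received by the constructing unit 202 contains many attributes. The apparatus 20 may select one or several of them as preset data attributes as needed, and then construct the decision tree from the received encrypted data set according to the preset data attributes.
Data in a data set encrypted with the public key becomes ciphertext, and data in the ciphertext state can still be compared, added, subtracted, summed, averaged, retrieved, and so on.
Optionally, as shown in FIG. 4, the apparatus 20 further includes a sending unit 206.
The sending unit 206 is configured to send a public key to the first node, so that the first node encrypts its data set with the public key to obtain the encrypted data set.
The receiving unit 201 is configured to receive the encrypted data set sent by the at least two first nodes.
The apparatus 20 holds the private key corresponding to the public key, and only that private key can decrypt data encrypted with the public key.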
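The unit structure of apparatus 20 can be sketched as one class whose methods stand in for units 201 to 205. This is a hypothetical wiring, not the patent's implementation: the class name `StatisticsDevice`, the method names, and the branch labels are assumptions, and the additive offset used below is a placeholder for a cipher, not real encryption.

```python
import random


class StatisticsDevice:
    """Hypothetical wiring of units 201-205 of apparatus 20."""

    def __init__(self, decrypt, seed=None):
        self._decrypt = decrypt      # stands in for the private key held by the device
        self._rng = random.Random(seed)
        self._received = []

    def receive(self, encrypted_set):            # receiving unit 201
        self._received.extend(encrypted_set)

    def arrange(self, column):                   # arranging unit 204 -> first data set
        values = [row[column] for row in self._received]
        self._rng.shuffle(values)
        return [{**row, column: v} for row, v in zip(self._received, values)]

    def decrypt(self, first_set, column):        # decrypting unit 205 -> second data set
        return [{**row, column: self._decrypt(row[column])} for row in first_set]

    def construct(self, second_set, column, value):  # constructing unit 202
        tree = {"ge": [], "lt": []}
        for row in second_set:
            tree["ge" if row[column] >= value else "lt"].append(row)
        return tree

    def obtain(self, tree, branch):              # obtaining unit 203
        return len(tree[branch])


OFFSET = 7  # placeholder "cipher" for the demo only; NOT real encryption

device = StatisticsDevice(decrypt=lambda c: c - OFFSET, seed=1)
device.receive([{"score": 55 + OFFSET}, {"score": 80 + OFFSET}])  # first node 1
device.receive([{"score": 61 + OFFSET}])                          # first node 2

first_set = device.arrange("score")
second_set = device.decrypt(first_set, "score")
tree = device.construct(second_set, "score", 60)
passed = device.obtain(tree, "ge")
print(passed)  # 2
```

Keeping each unit as a separate method mirrors the optional composition in the description: the arranging and decrypting steps can be skipped or swapped without touching the receiving, constructing, or obtaining steps.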
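As can be seen from the above, with the method for distributed data statistics in this embodiment, a decision tree is constructed from the encrypted data set according to a preset data attribute, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
FIG. 9 shows the structure of a message forwarding device according to another embodiment of the present invention, including at least one processor 301 (for example, a CPU), a memory 302, at least one network interface 303, and at least one communication bus 304 for implementing connection and communication between these components. The processor 301 is configured to execute executable modules, such as computer programs, stored in the memory 302. The memory 302 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, for example at least one disk memory. The communication connection between the network device and at least one other network element is implemented through the at least one network interface 303 (which may be wired or wireless), and may use the Internet, a wide area network, a local network, a metropolitan area network, or the like.
In some implementations, the memory 302 stores a program 3021 that can be executed by the processor 301. The program includes:
a second node receives an encrypted data set sent by at least two first nodes, where the first node and the second node are both nodes in a distributed network;
the second node constructs a decision tree from the encrypted data set according to a preset data attribute;
the second node obtains a statistical result of the data according to the preset data attribute and the decision tree.
The specific implementation steps are the same as those in the embodiment shown in FIG. 1 and are not repeated here.
As can be seen from the above, with the method for distributed data statistics in this embodiment, a decision tree is constructed from the encrypted data set according to a preset data attribute, so that data statistics are completed while the data remains encrypted, which ensures the security of the data.
It should be noted that, for brevity, the foregoing method embodiments are described as a series of action combinations. However, a person skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. In addition, a person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
The information exchange and execution processes between the modules in the above apparatus and system are based on the same concept as the method embodiments of the present invention. For details, see the description in the method embodiments, which is not repeated here.
A person of ordinary skill in the art can understand that all or part of the procedures in the methods of the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Specific examples are used herein to explain the principles and implementations of the present invention. The description of the above embodiments is only intended to help understand the method and idea of the present invention. Meanwhile, a person of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In conclusion, the content of this specification shall not be construed as a limitation on the present invention.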

Claims (10)

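  1. A method for distributed data statistics, characterized in that the method comprises:
    receiving, by a second node, an encrypted data set sent by at least two first nodes, wherein the first node and the second node are both nodes in a distributed network;
    constructing, by the second node, a decision tree from the encrypted data set according to a preset data attribute;
    obtaining, by the second node, a statistical result of the data according to the preset data attribute and the decision tree.
  2. The method according to claim 1, characterized in that, before the second node receives the encrypted data set sent by the at least two first nodes, the method further comprises:
    sending, by the second node, a public key to the first node, so that the first node encrypts a data set with the public key to obtain the encrypted data set.
  3. The method according to claim 2, characterized in that, before the second node constructs the decision tree from the encrypted data set according to the preset data attribute, the method further comprises:
    rearranging, by the second node according to a preset arrangement rule, at least one column of data in the encrypted data set to obtain a first data set;
    decrypting, by the second node, the first data set with a private key to obtain a second data set, wherein the private key corresponds to the public key;
    and the constructing, by the second node, the decision tree from the data set according to the preset data attribute comprises:
    constructing, by the second node, the decision tree from the second data set according to the preset data attribute.
  4. The method according to any one of claims 1 to 3, characterized in that the constructing, by the second node, the decision tree from the encrypted data set according to the preset data attribute comprises:
    determining, by the second node, a value of the preset data attribute;
    acquiring, by the second node, data one by one from the encrypted data set in a preset manner, and determining a key attribute value of the data;
    comparing, by the second node, the value of the preset data attribute with the key attribute value of the data, and obtaining a comparison result;
    inserting, by the second node according to the comparison result, the acquired data as a leaf node into the decision tree.
  5. The method according to claim 4, characterized in that the obtaining, by the second node, the statistical result of the data according to the preset data attribute and the decision tree comprises:
    determining, by the second node according to the preset data attribute and the value of the preset data attribute, leaf nodes that need to be traversed in the decision tree;
    performing, by the second node, statistics on the leaf nodes that need to be traversed, and obtaining the statistical result.
  6. An apparatus for distributed data statistics, characterized in that the apparatus comprises:
    a receiving unit, configured to receive an encrypted data set sent by at least two first nodes, wherein the first node and the second node are both nodes in a distributed network;
    a constructing unit, configured to construct a decision tree from the encrypted data set according to a preset data attribute;
    an obtaining unit, configured to obtain a statistical result of the data according to the preset data attribute and the decision tree.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises a sending unit,
    the sending unit being configured to send a public key to the first node, so that the first node encrypts a data set with the public key to obtain the encrypted data set;
    the receiving unit being configured to receive the encrypted data set sent by the at least two first nodes.
  8. The apparatus according to claim 6 or 7, characterized in that the apparatus further comprises an arranging unit and a decrypting unit;
    the arranging unit is specifically configured to rearrange, according to a preset arrangement rule, at least one column of data in the encrypted data set received by the receiving unit, to obtain a first data set;
    the decrypting unit is specifically configured to decrypt the first data set with a private key to obtain a second data set, wherein the private key corresponds to the public key;
    the constructing unit is specifically configured to construct the decision tree from the second data set according to the preset data attribute.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the generating unit specifically comprises a first determining subunit, a second determining subunit, a comparing subunit, and an inserting subunit;
    the first determining subunit is specifically configured to determine a value of the preset data attribute;
    the second determining subunit is specifically configured to acquire data one by one from the encrypted data set in a preset manner, and determine a key attribute value of the data;
    the comparing subunit is specifically configured to compare the value of the preset data attribute with the key attribute value of the data, and obtain a comparison result;
    the inserting subunit is specifically configured to insert, according to the comparison result, the acquired data as a leaf node into the decision tree.
  10. The apparatus according to claim 9, characterized in that the obtaining unit comprises a third determining subunit and a statistics subunit;
    the third determining subunit is specifically configured to determine, according to the preset data attribute and the value of the preset data attribute determined by the first determining subunit, leaf nodes that need to be traversed in the decision tree;
    the statistics subunit is specifically configured to perform statistics on the leaf nodes that need to be traversed, and obtain the statistical result.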
PCT/CN2014/088170 2014-03-29 2014-10-09 Method for distributed data statistics WO2015149497A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410124509.7 2014-03-29
CN201410124509.7A CN104951472A (zh) 2014-03-29 2014-03-29 Method for distributed data statistics

Publications (1)

Publication Number Publication Date
WO2015149497A1 WO2015149497A1 (zh) 2015-10-08

Family

ID=54166135

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088170 WO2015149497A1 (zh) 2014-03-29 2014-10-09 Method based on distributed data statistics

Country Status (2)

Country Link
CN (1) CN104951472A (zh)
WO (1) WO2015149497A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569200A (zh) * 2021-08-03 2021-10-29 北京金山云网络技术有限公司 Data statistics method, apparatus, and server

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN110569666B (zh) * 2019-09-03 2023-09-08 深圳前海微众银行股份有限公司 Blockchain-based data statistics method and apparatus
CN115409613A (zh) * 2022-09-13 2022-11-29 中债金科信息技术有限公司 Bond risk detection model training method and bond risk detection method

Citations (4)

Publication number Priority date Publication date Assignee Title
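CN101105854A (zh) * 2007-08-23 2008-01-16 上海交通大学 Decision-tree-based method for online detection of student status in a distance education environment
CN102054002A (zh) * 2009-10-28 2011-05-11 中国移动通信集团公司 Method and apparatus for generating a decision tree in a data mining system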
US20110307423A1 (en) * 2010-06-09 2011-12-15 Microsoft Corporation Distributed decision tree training
CN103248492A (zh) * 2013-05-23 2013-08-14 清华大学 Verifiable method for comparing and sorting distributed private data

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
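CN100454243C (zh) * 2006-11-28 2009-01-21 北京龙阁创意数码科技有限公司 System and method for developing diversified animation and comic products based on massive data
CN102053962B (zh) * 2009-11-02 2012-09-26 清华大学深圳研究生院 Networked RFID system based on lightweight middleware and data interaction method
CN103391185B (zh) * 2013-08-12 2017-06-16 北京泰乐德信息技术有限公司 Method and system for secure cloud storage and processing of rail transit monitoring data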

Also Published As

Publication number Publication date
CN104951472A (zh) 2015-09-30

Similar Documents

Publication Publication Date Title
US11206132B2 (en) Multiparty secure computing method, device, and electronic device
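TWI706279B (zh) Multi-party secure computing method and apparatus, and electronic device
TWI714219B (zh) Blockchain-based service data encryption method and apparatus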
US10250573B2 (en) Leveraging transport-layer cryptographic material
US9485096B2 (en) Encryption / decryption of data with non-persistent, non-shared passkey
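CN110138802B (zh) Method and apparatus for obtaining user feature information, blockchain node, network, and storage medium
CN107145792B (zh) Multi-user privacy-preserving data clustering method and system based on ciphertext data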
US9264407B2 (en) Computer-implemented system and method for establishing distributed secret shares in a private data aggregation scheme
US20170163413A1 (en) System and Method for Content Encryption in a Key/Value Store
US9020149B1 (en) Protected storage for cryptographic materials
US11381381B2 (en) Privacy preserving oracle
EP3703304B1 (en) Cloud-based secure computation of the median
JP2014002365A5 (zh)
JP2017225116A5 (zh)
KR101615137B1 (ko) Attribute-based data access method
US20190123896A1 (en) Quantum direct communication method with user authentication and apparatus using the same
US10063655B2 (en) Information processing method, trusted server, and cloud server
WO2018165835A1 (zh) Cloud ciphertext access control method and system
US9641328B1 (en) Generation of public-private key pairs
CN114223175B (zh) Generating sequences of network data while preventing acquisition or manipulation of time data
WO2015149497A1 (zh) Method based on distributed data statistics
WO2022072146A1 (en) Privacy preserving centroid models using secure multi-party computation
JP2022177209A (ja) Preventing data manipulation using multiple aggregation servers
CN108540486A (zh) Method for generating and using a cloud key
US20210167945A1 (en) Identity-based hash proof system configuration apparatus, identity-based encryption apparatus, identity-based hash proof system configuration method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14888372

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase
122 Ep: pct application non-entry in european phase

Ref document number: 14888372

Country of ref document: EP

Kind code of ref document: A1