CN117474131A - Isolated forest model generation method, data transmission method, device and electronic equipment - Google Patents


Info

Publication number
CN117474131A
Authority
CN
China
Prior art keywords
node
split
attribute information
data set
confusion
Prior art date
Legal status
Pending
Application number
CN202311321674.7A
Other languages
Chinese (zh)
Inventor
冯泽瑾
林元晟
刘承昆
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202311321674.7A
Publication of CN117474131A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose an isolated forest model generation method, a data transmission method, an apparatus, and an electronic device. One embodiment of the method comprises the following steps: for each piece of isolated tree initial information in an isolated tree initial information set, performing the following isolated tree generation step: sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to at least one node to be split to a first terminal; for each node to be split, performing the following generation steps: receiving a first coded data set corresponding to the node to be split, and determining, based on the first coded data set, a node data set corresponding to each next node of the node to be split; generating an isolated tree according to the next node set; and generating an isolated forest model according to the obtained isolated trees. This implementation relates to information security: it can protect the data security of each participant of the isolated forest model, ensure the model owner's ownership of the model, and prevent other participants from perceiving the model.

Description

Isolated forest model generation method, data transmission method, device and electronic equipment
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to an isolated forest model generation method, a data transmission method, an apparatus, and an electronic device.
Background
Anomaly detection is a technique for detecting anomalous data. At present, anomaly detection is generally performed as follows: anomalous data in a data set is detected using a pre-trained anomaly detection model, which is trained on data provided by at least one data provider.
However, the inventors found that detecting anomalous data in a data set in the above manner often has the following technical problem:
Data must be exchanged continuously with each participant during model iteration, which easily leads to leakage of each participant's data.
The above information disclosed in this background section is only for enhancement of understanding of the background of the inventive concept and, therefore, may contain information that does not form the prior art that is already known to those of ordinary skill in the art in this country.
Disclosure of Invention
The disclosure is in part intended to introduce concepts in a simplified form that are further described below in the detailed description. The disclosure is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose an isolated forest model generating method, a data transmitting method, a device and an electronic apparatus to solve the technical problems mentioned in the background section above.
In a first aspect, some embodiments of the present disclosure provide an isolated forest model generation method, comprising: for each piece of isolated tree initial information in an isolated tree initial information set, performing the following isolated tree generation step: in response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies a confusion characteristic condition, sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal; for each of the at least one node to be split, performing the following generation steps: receiving a first coded data set that is sent by the first terminal and corresponds to the node to be split; determining, based on the first coded data set, a node data set corresponding to each next node in the next node group of the node to be split; and in response to determining that the obtained node data sets corresponding to the next nodes in the next node set satisfy a preset data condition, generating an isolated tree according to the next node set; and generating an isolated forest model according to the obtained isolated trees.
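The generation step of the first aspect can be pictured as a level-by-level loop over the set of nodes to be split. The Python sketch below is illustrative only and rests on several assumptions not stated in the disclosure: node data sets are sets of sample IDs, each node's split attribute is drawn at random, `remote_split` stands in for the exchange with the first terminal, and the preset data condition is taken to be "a child holds at most `min_size` samples or a depth limit is reached". All names (`grow_isolated_tree`, `grow_isolated_forest`, `TreeNode`, and so on) are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional, Tuple

IdSet = FrozenSet[str]
SplitFn = Callable[[IdSet, str], Tuple[IdSet, IdSet]]


@dataclass
class TreeNode:
    data: IdSet                                # node data set (sample IDs)
    attribute: Optional[str] = None            # split attribute; None for a leaf
    children: List["TreeNode"] = field(default_factory=list)


def grow_isolated_tree(root_ids: IdSet, attributes: List[str], local_split: SplitFn,
                       remote_split: SplitFn, is_remote: Callable[[str], bool],
                       rng: random.Random, min_size: int = 1,
                       max_depth: int = 8) -> TreeNode:
    """Grow one isolated tree, splitting one level of nodes to be split per pass."""
    root = TreeNode(root_ids)
    to_split = [(root, 0)] if len(root_ids) > min_size else []
    while to_split:
        next_level = []
        for node, depth in to_split:
            node.attribute = rng.choice(attributes)
            # Attributes held by the first terminal meet the (assumed) confusion
            # characteristic condition and are split through the remote exchange.
            split = remote_split if is_remote(node.attribute) else local_split
            for child_ids in split(node.data, node.attribute):
                child = TreeNode(frozenset(child_ids))
                node.children.append(child)
                # Assumed preset data condition: stop growing this branch once the
                # child is small enough or the depth limit is reached.
                if len(child.data) > min_size and depth + 1 < max_depth:
                    next_level.append((child, depth + 1))
        to_split = next_level
    return root


def grow_isolated_forest(n_trees: int, seed: int = 0, **kwargs) -> List[TreeNode]:
    """One tree per piece of isolated tree initial information (here: n_trees copies)."""
    rng = random.Random(seed)
    return [grow_isolated_tree(rng=rng, **kwargs) for _ in range(n_trees)]
```

In a real deployment `remote_split` would wrap the send/receive round trip with the first terminal; here it is just a function with the same signature as `local_split`, which keeps the control flow visible.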
Optionally, the method further comprises: in response to determining that the node data set corresponding to at least one next node in the next node set does not satisfy the preset data condition, determining each next node that does not satisfy the preset data condition as a node to be split, thereby obtaining at least one node to be split, and performing the isolated tree generation step again.
Optionally, the method further comprises: in response to determining that the node attribute information of a node to be split does not satisfy the confusion characteristic condition, determining that node to be split in the set of nodes to be split as a target split node, thereby obtaining at least one target split node; and for each of the at least one target split node, performing the following steps: determining the node splitting attribute information corresponding to the target split node; and determining, based on the node splitting attribute information, a node data set corresponding to each next node in the next node group of the target split node.
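For a target split node, the model owner holds the split attribute itself and can partition the node data set locally. The disclosure does not say how the split value is chosen, so the sketch below swaps in the classic isolation forest rule (a threshold drawn uniformly between the smallest and largest attribute value in the node) as an assumption; `local_random_split` and its parameters are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Tuple


def local_random_split(node_data: FrozenSet[str], values: Dict[str, float],
                       rng: random.Random) -> Tuple[float, FrozenSet[str], FrozenSet[str]]:
    """Split a target split node on a locally held attribute.

    `values` maps sample ID -> attribute value. Returns the split value and the
    node data sets of the first and second next nodes.
    """
    lo = min(values[s] for s in node_data)
    hi = max(values[s] for s in node_data)
    split_value = rng.uniform(lo, hi)                     # classic isolation forest rule
    first = frozenset(s for s in node_data if values[s] < split_value)
    second = node_data - first                            # everything else goes right
    return split_value, first, second
```

Because the two child sets are built as a set and its complement within the node, every sample in the node lands in exactly one next node.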
Optionally, the confusion attribute information in the at least one confusion attribute information set and the node splitting attribute information in the at least one piece of node splitting attribute information are attribute information in a full attribute information set, and the node data in the node data set is full sample data in a full sample data set. Before the isolated tree generation step is performed for each piece of isolated tree initial information in the isolated tree initial information set, the method further comprises: receiving a second coded data set sent by the first terminal, and acquiring an uncoded data set, wherein the second coded data set comprises a coded attribute data set and a coded sample data set, and the uncoded data set comprises an uncoded attribute data set and an uncoded sample data set; determining the coded attribute data set and the uncoded attribute data set as the full attribute information set; and fusing the coded sample data set and the uncoded sample data set to obtain the full sample data set.
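The fusion step can be illustrated as a join of the two sample data sets on a shared sample ID. This is a sketch under assumptions: both sides key their samples by the same (coded) ID, the fusion is an inner join, and each record is a dictionary of attribute values. The disclosure does not specify the join type, so an outer join would be an equally plausible reading; `fuse_datasets` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Record = Dict[str, object]


def fuse_datasets(coded: Dict[str, Record],
                  uncoded: Dict[str, Record]) -> Tuple[Set[str], Dict[str, Record]]:
    """Return (full attribute information set, full sample data set).

    `coded` comes from the first terminal's second coded data set; `uncoded` is
    the model owner's own data. Both map sample ID -> {attribute: value}.
    """
    coded_attrs = {a for rec in coded.values() for a in rec}
    uncoded_attrs = {a for rec in uncoded.values() for a in rec}
    full_attrs = coded_attrs | uncoded_attrs
    shared_ids = coded.keys() & uncoded.keys()            # inner join (assumption)
    full_samples = {sid: {**uncoded[sid], **coded[sid]} for sid in sorted(shared_ids)}
    return full_attrs, full_samples
```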
Optionally, sending the at least one confusion attribute information set and the at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal comprises: for each of the at least one node to be split, performing the following sending steps: sorting the confusion attribute information set and the node splitting attribute information corresponding to the node to be split to obtain an attribute information sequence; and sending the attribute information sequence to the first terminal.
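One way to read the "sorting" step is as hiding the real split attribute among the confusion attributes: the model owner places them in a random order and privately remembers where the real one ended up. The sketch below assumes that reading (a fixed position would reveal the real attribute to the first terminal); `build_attribute_sequence` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def build_attribute_sequence(confusion_infos: Sequence[T], split_info: T,
                             rng: random.Random) -> Tuple[List[T], int]:
    """Return (attribute information sequence, position of the real split info).

    Only the sequence is sent to the first terminal; the position stays with the
    model owner so that it can later pick out the target coded data.
    """
    items = list(confusion_infos) + [split_info]   # real split info is last before shuffling
    order = list(range(len(items)))
    rng.shuffle(order)                              # random permutation of positions
    sequence = [items[i] for i in order]
    real_position = order.index(len(items) - 1)     # where the real split info ended up
    return sequence, real_position
```

Shuffling positions rather than values means the real entry is tracked correctly even if a confusion attribute happens to equal it.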
Optionally, determining, based on the first coded data set, the node data set corresponding to each next node in the next node group of the node to be split comprises: determining, according to the attribute information sequence corresponding to the node to be split, target coded data corresponding to the first coded data set; and for each next node in the next node group of the node to be split, in response to determining that the next node is a first node, determining the intersection of the target coded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
Optionally, the method further comprises: in response to determining that the next node is a second node, determining the difference set of the target coded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
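The intersection and difference steps can be sketched with plain set operations. Assumptions: the first coded data set arrives as a list aligned with the attribute information sequence, each entry is a set of coded sample IDs, and the "difference set" is read as the node data set minus the target coded data, so that the two next nodes partition the parent. `split_node_data` is a hypothetical name.

```python
from typing import FrozenSet, List, Tuple


def split_node_data(node_data: FrozenSet[str], first_coded: List[FrozenSet[str]],
                    real_position: int) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return the node data sets of the first and second next nodes."""
    target = first_coded[real_position]   # target coded data for the real split attribute
    first_child = node_data & target      # intersection -> first node
    second_child = node_data - target     # difference (assumed direction) -> second node
    return first_child, second_child
```

IDs that appear only in the target coded data (for example, samples outside this node) are discarded by both operations, so each child stays a subset of its parent.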
Optionally, the method further comprises: determining an abnormal sample data set corresponding to the full sample data set based on the isolated forest model; and controlling an associated display device to display the abnormal sample data set.
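The disclosure does not state how anomalous samples are determined from the isolated forest model. A common choice, shown here as an assumption, is the standard isolation forest anomaly score s(x, n) = 2^(-E[h(x)] / c(n)), where h(x) is the path length of sample x in one tree and c(n) is the average path length of an unsuccessful binary-search-tree lookup over n samples. The threshold of 0.6 and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

EULER_GAMMA = 0.5772156649


def average_path_length(n: int) -> float:
    """c(n): average unsuccessful-search path length in a BST built from n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA       # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths: Sequence[float], n_samples: int) -> float:
    """Score in (0, 1]; values well above 0.5 suggest an anomaly. Needs n_samples >= 2."""
    mean_h = sum(path_lengths) / len(path_lengths)  # E[h(x)] over all isolated trees
    return 2.0 ** (-mean_h / average_path_length(n_samples))


def flag_anomalies(scores: Dict[str, float], threshold: float = 0.6) -> List[str]:
    """IDs of the abnormal sample data set under a hypothetical threshold."""
    return sorted(sid for sid, s in scores.items() if s > threshold)
```

A sample whose average path length equals c(n) scores exactly 0.5; shorter paths (samples isolated in few splits) push the score toward 1.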
In a second aspect, some embodiments of the present disclosure provide an isolated forest model generation apparatus, comprising: an execution unit configured to perform, for each piece of isolated tree initial information in an isolated tree initial information set, the following isolated tree generation step: in response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies a confusion characteristic condition, sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal; for each of the at least one node to be split, performing the following generation steps: receiving a first coded data set that is sent by the first terminal and corresponds to the node to be split; determining, based on the first coded data set, a node data set corresponding to each next node in the next node group of the node to be split; and in response to determining that the obtained node data sets corresponding to the next nodes in the next node set satisfy a preset data condition, generating an isolated tree according to the next node set; and a generation unit configured to generate an isolated forest model according to the obtained isolated trees.
Optionally, the isolated forest model generation apparatus is further configured to: in response to determining that the node data set corresponding to at least one next node in the next node set does not satisfy the preset data condition, determine each next node that does not satisfy the preset data condition as a node to be split, thereby obtaining at least one node to be split, and perform the isolated tree generation step again.
Optionally, the isolated forest model generating apparatus described above is still further configured to: in response to determining that the node attribute information of the node to be split does not meet the confusion characteristic condition, determining the node to be split in the node to be split set as a target split node, and obtaining at least one target split node; for each of the at least one target split node, performing the steps of: determining node splitting attribute information corresponding to the target splitting node; and determining a node data set corresponding to each next node in the next node group corresponding to the target split node based on the node split attribute information.
Optionally, the confusion attribute information in the at least one confusion attribute information set and the node splitting attribute information in the at least one piece of node splitting attribute information are attribute information in a full attribute information set, and the node data in the node data set is full sample data in a full sample data set. Before performing the isolated tree generation step for each piece of isolated tree initial information in the isolated tree initial information set, the isolated forest model generation apparatus is further configured to: receive a second coded data set sent by the first terminal, and acquire an uncoded data set, wherein the second coded data set comprises a coded attribute data set and a coded sample data set, and the uncoded data set comprises an uncoded attribute data set and an uncoded sample data set; determine the coded attribute data set and the uncoded attribute data set as the full attribute information set; and fuse the coded sample data set and the uncoded sample data set to obtain the full sample data set.
Optionally, the execution unit is further configured to: for each of the at least one node to be split, perform the following sending steps: sort the confusion attribute information set and the node splitting attribute information corresponding to the node to be split to obtain an attribute information sequence; and send the attribute information sequence to the first terminal.
Optionally, the execution unit is further configured to: determine, according to the attribute information sequence corresponding to the node to be split, target coded data corresponding to the first coded data set; and for each next node in the next node group of the node to be split, in response to determining that the next node is a first node, determine the intersection of the target coded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
Optionally, the execution unit is still further configured to: in response to determining that the next node is a second node, determine the difference set of the target coded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
Optionally, the isolated forest model generating apparatus described above is still further configured to: determining an abnormal sample data set corresponding to the full sample data set based on the isolated forest model; and controlling an associated display device to display the abnormal sample data set.
In a third aspect, some embodiments of the present disclosure provide a data transmission method, comprising: receiving at least one confusion attribute information set and at least one piece of node splitting attribute information sent by a second terminal; generating at least one confusion coded data set corresponding to the at least one confusion attribute information set and at least one piece of split coded data corresponding to the at least one piece of node splitting attribute information; for each of the at least one confusion coded data set, determining the confusion coded data set and the split coded data, among the at least one piece of split coded data, that satisfies a preset matching condition as a first coded data set; and sending the determined at least one first coded data set to the second terminal.
Optionally, generating the at least one confusion coded data set corresponding to the at least one confusion attribute information set and the at least one piece of split coded data corresponding to the at least one piece of node splitting attribute information comprises: for each piece of confusion attribute information in the at least one confusion attribute information set, generating confusion coded data based on the confusion attribute information; and for each piece of node splitting attribute information in the at least one piece of node splitting attribute information, generating split coded data based on the node splitting attribute information.
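On the first terminal, coded data for each received attribute can be pictured as the set of coded sample IDs whose local attribute value falls between the attribute's lower and upper limits. This sketch assumes that reading, uses a salted SHA-256 hash as a stand-in for the unspecified encoding, and treats the preset matching condition as "belongs to the same node's attribute information sequence". All names are hypothetical, and a plain salted hash is only an illustration, not a privacy guarantee.

```python
import hashlib
from typing import Dict, FrozenSet, List, NamedTuple, Sequence


class AttributeInfo(NamedTuple):
    attribute_id: str
    lower: float    # second attribute value (lower limit)
    upper: float    # first attribute value (upper limit)


def encode_id(sample_id: str, salt: bytes) -> str:
    """Stand-in encoding for a sample ID (assumption: salted SHA-256)."""
    return hashlib.sha256(salt + sample_id.encode("utf-8")).hexdigest()


def coded_data_for(info: AttributeInfo, local_data: Dict[str, Dict[str, float]],
                   salt: bytes) -> FrozenSet[str]:
    """Coded IDs of local samples whose value lies within [lower, upper]."""
    return frozenset(encode_id(sid, salt) for sid, rec in local_data.items()
                     if info.lower <= rec[info.attribute_id] <= info.upper)


def answer_sequence(sequence: Sequence[AttributeInfo],
                    local_data: Dict[str, Dict[str, float]],
                    salt: bytes) -> List[FrozenSet[str]]:
    """First coded data set for one node: coded data for every entry, in order.

    Confusion and real split attributes are treated identically, so the first
    terminal cannot tell which entry the second terminal will actually use.
    """
    return [coded_data_for(info, local_data, salt) for info in sequence]
```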
In a fourth aspect, some embodiments of the present disclosure provide a data transmission apparatus, comprising: a receiving unit configured to receive at least one confusion attribute information set and at least one piece of node splitting attribute information sent by a second terminal; a generating unit configured to generate at least one confusion coded data set corresponding to the at least one confusion attribute information set and at least one piece of split coded data corresponding to the at least one piece of node splitting attribute information; a determining unit configured to determine, for each of the at least one confusion coded data set, the confusion coded data set and the split coded data, among the at least one piece of split coded data, that satisfies a preset matching condition as a first coded data set; and a transmitting unit configured to transmit the determined at least one first coded data set to the second terminal.
Optionally, the generating unit is further configured to: generating confusion coded data based on the confusion attribute information for each confusion attribute information in the at least one confusion attribute information set; for each node split attribute information of the at least one node split attribute information, split encoded data is generated based on the node split attribute information.
In a fifth aspect, some embodiments of the present disclosure provide an electronic device comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors causes the one or more processors to implement the method described in any of the implementations of the first aspect above.
In a sixth aspect, some embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
In a seventh aspect, some embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method described in any of the implementations of the first aspect above.
The above embodiments of the present disclosure have the following beneficial effects: the isolated forest model generation method of some embodiments of the present disclosure can protect the data security of each participant of the model. Specifically, the participants' data tends to leak because data must be exchanged continuously with each data provider during model iteration. Accordingly, the isolated forest model generation method of some embodiments of the present disclosure performs the following isolated tree generation step for each piece of isolated tree initial information in the isolated tree initial information set. First, in response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition, at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split are sent to the first terminal. In this way, both the real and the obfuscated attributes of the nodes to be split are sent to the first terminal, so that the first terminal can partition the node data that it provides. Then, for each of the at least one node to be split, the following generation steps are performed: receiving a first coded data set that is sent by the first terminal and corresponds to the node to be split; and determining, based on the first coded data set, a node data set corresponding to each next node in the next node group of the node to be split.
In this way, coded data provided by other data providers can be received and the node to be split can be split according to the first coded data set, yielding the next-level nodes of the node to be split and their node data sets, from which it can be determined whether each node needs to be split further. Next, in response to determining that the obtained node data sets corresponding to the next nodes in the next node set satisfy the preset data condition, an isolated tree is generated according to the next node set. Thus, once the node data set of every bottom-level node of the isolated tree reaches the iteration termination condition, no bottom-level node needs to be split further and generation of the isolated tree is complete. Finally, an isolated forest model is generated according to the obtained isolated trees. In the isolated forest model generation method of some embodiments of the present disclosure, confusion features are therefore added to the data exchanged with each participant during model iteration, which makes it difficult for any participant other than the model owner to obtain information about the model. This protects the model data from leakage, ensures the model owner's ownership of the model, and prevents other participants from perceiving the model. In addition, because the model uses the coded data provided by each participant in place of the real data, the real data behind that coded data is also protected from leakage. The data security of each participant of the isolated forest model is therefore protected.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of one application scenario of an isolated forest model generation method according to some embodiments of the present disclosure;
FIG. 2 is a flowchart of some embodiments of an isolated forest model generation method according to the present disclosure;
FIG. 3 is a flowchart of further embodiments of an isolated forest model generation method according to the present disclosure;
FIG. 4 is a schematic diagram of one application scenario of a data transmission method according to some embodiments of the present disclosure;
FIG. 5 is a flowchart of some embodiments of a data transmission method according to the present disclosure;
FIG. 6 is a schematic structural diagram of some embodiments of an isolated forest model generation apparatus according to the present disclosure;
FIG. 7 is a schematic structural diagram of some embodiments of a data transmission apparatus according to the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings. Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
Before any operation referred to in the present disclosure, such as collecting, storing, or using a user's personal information (e.g., user history), is performed, the relevant organization or individual must fulfill its obligations, including conducting a personal information security impact assessment, informing the personal information subject, and obtaining the subject's prior authorization and consent.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 is a schematic diagram of one application scenario of an isolated forest model generation method according to some embodiments of the present disclosure.
In the application scenario of FIG. 1, the electronic device 101 first performs the following isolated tree generation step for each piece of isolated tree initial information in the isolated tree initial information set 102. For example, the isolated tree initial information set 102 may include isolated tree initial information 1, isolated tree initial information 2, and isolated tree initial information 3. In response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition, the electronic device 101 sends at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal. For each of the at least one node to be split, the following generation steps are performed: receiving a first coded data set that is sent by the first terminal and corresponds to the node to be split; and determining, based on the first coded data set, a node data set corresponding to each next node in the next node group of the node to be split. In response to determining that the obtained node data sets corresponding to the next nodes in the next node set satisfy a preset data condition, an isolated tree is generated according to the next node set. Then, an isolated forest model 103 is generated from the obtained isolated trees. For example, the isolated forest model 103 may include isolated tree 1, isolated tree 2, and isolated tree 3.
The electronic device 101 may be hardware or software. When the electronic device is hardware, the electronic device may be implemented as a distributed cluster formed by a plurality of servers or terminal devices, or may be implemented as a single server or a single terminal device. When the electronic device is embodied as software, it may be installed in the above-listed hardware device. It may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of electronic devices in FIG. 1 is merely illustrative. There may be any number of electronic devices as required by the implementation.
With continued reference to FIG. 2, a flow 200 of some embodiments of an isolated forest model generation method according to the present disclosure is shown. The isolated forest model generation method comprises the following steps:
Step 201: for each piece of isolated tree initial information in the isolated tree initial information set, perform the following isolated tree generation step:
Step 2011: in response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition, send at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal.
In some embodiments, an execution body of the orphan forest model generating method (for example, the electronic device 101 shown in fig. 1) sends, to the first terminal, at least one confusion attribute information set and at least one node splitting attribute information corresponding to at least one node to be split, in response to determining that there is at least one node to be split in the node to be split set corresponding to the orphan tree initial information, where the node attribute information satisfies a confusion characteristic condition. Wherein, the above-mentioned orphan tree initial information in the orphan tree initial information set may be information required for creating one orphan tree. The above-mentioned orphan initial information in the orphan initial information set may include an initial node to be split and an initial node data set. For example, the initial node to be split may be a root node. The initial node data in the initial node data set may be user data corresponding to one user. For example, the user data may include, but is not limited to, at least one of: user identification, fixed value (e.g., fixed asset), mobile value (e.g., bank running water), number of orders. The node to be split in the node set to be split may be a node corresponding to a node data set satisfying the data quantity condition. The node data set may be a subset of the initial node data set. The data quantity condition may be that the quantity of node data in the node data set is greater than a preset value. The preset value may be a preset value. For example, the preset value may be 1. The node attribute information may be information of an attribute bound to a tree node, which is obtained in advance for data division, corresponding to the node to be split. The node attribute information may include an attribute identifier, a data terminal, a first attribute value, and a second attribute value. 
The attribute identifier may uniquely identify an attribute of the user data. The data terminal may be the terminal from which the values of the attribute are obtained. The first attribute value may be the upper limit of the attribute, and the second attribute value may be its lower limit. The confusion characteristic condition may be that the data terminal included in the node attribute information of the node to be split is the first terminal. The first terminal may be a terminal that provides user data. Each confusion attribute information set in the at least one confusion attribute information set may be associated with a node to be split. Each item of confusion attribute information in these sets may describe an attribute whose data terminal is the first terminal. Each piece of node splitting attribute information may likewise be associated with a node to be split. It may be the node attribute information of that node.
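The node attribute record and the confusion characteristic condition can be modeled as a small data structure. The following is a minimal Python sketch under that assumption. The names NodeAttributeInfo, meets_confusion_condition, and partition_nodes, and the terminal labels, are hypothetical and do not come from the disclosure. Nodes whose attribute is held by the first terminal are split through it; the remaining nodes are the target split nodes handled locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeAttributeInfo:
    attribute_id: str   # uniquely identifies the attribute of the user data
    data_terminal: str  # terminal that holds the attribute's values
    upper: float        # first attribute value (upper limit)
    lower: float        # second attribute value (lower limit)


def meets_confusion_condition(info, first_terminal):
    """Confusion characteristic condition: the attribute is held by the first terminal."""
    return info.data_terminal == first_terminal


def partition_nodes(nodes, first_terminal):
    """Split node IDs into (nodes split via the first terminal, target split nodes split locally)."""
    remote, local = [], []
    for node_id, info in nodes.items():
        if meets_confusion_condition(info, first_terminal):
            remote.append(node_id)
        else:
            local.append(node_id)
    return remote, local


nodes = {
    "root": NodeAttributeInfo("orders", "terminal_A", 50.0, 0.0),
    "n1": NodeAttributeInfo("fixed_assets", "terminal_B", 1e6, 0.0),
}
print(partition_nodes(nodes, "terminal_A"))  # (['root'], ['n1'])
```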
In practice, the execution body may first determine that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition. In response, for each such node, it may send the corresponding confusion attribute information set and node splitting attribute information to the first terminal in turn, over a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, ZigBee, and UWB (ultra-wideband) connections. Other wireless connection means now known or developed in the future may also be used.
In some optional implementations of some embodiments, the execution body may send the at least one confusion attribute information set and the at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal through the following steps:
First, for each node to be split in the at least one node to be split, the following transmission steps are performed:
In a first sub-step, the confusion attribute information set and the node splitting attribute information corresponding to the node to be split are sorted to obtain an attribute information sequence. The attribute information sequence may be an ordered set of attribute information arranged according to a predetermined rule. For example, the predetermined rule may be that the node splitting attribute information is placed in the first or second position.
As an example, the execution body may first treat each item of confusion attribute information in the confusion attribute information set, together with the node splitting attribute information, as one item of attribute information. The execution body may then arrange these items in random order to obtain the attribute information sequence.
In a second sub-step, the attribute information sequence is sent to the first terminal. The execution body may send the attribute information sequence directly to the first terminal over a wired or wireless connection.
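The random-arrangement example above can be sketched in Python. In this sketch, attributes are plain strings and the function name build_attribute_sequence is hypothetical. The sender keeps the position of the real node splitting attribute to itself and sends only the shuffled sequence, so the first terminal cannot tell which entry is real.

```python
import random


def build_attribute_sequence(confusion_attrs, split_attr, rng=None):
    """Hide the real node splitting attribute among the confusion attributes.

    Returns the shuffled attribute information sequence and the position of the
    real attribute. Only the sequence is sent to the first terminal.
    """
    rng = rng or random.Random()
    # Tag each item so the real one can be located even if names repeat.
    items = [(False, attr) for attr in confusion_attrs] + [(True, split_attr)]
    rng.shuffle(items)  # random arrangement of all attribute information
    sequence = [attr for _, attr in items]
    target_index = next(i for i, (is_real, _) in enumerate(items) if is_real)
    return sequence, target_index


seq, idx = build_attribute_sequence(["age", "orders"], "bank_flow", random.Random(7))
print(seq[idx])  # bank_flow
```

Keeping an explicit flag, rather than calling `sequence.index(split_attr)`, avoids returning the wrong position if a decoy happens to share the real attribute's name.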
Step 2012: for each node to be split in the at least one node to be split, perform the following generating steps:
Step 20121: receive the first encoded data set, sent by the first terminal, that corresponds to the node to be split.
In some embodiments, the execution body may receive, over a wired or wireless connection, the first encoded data set sent by the first terminal for the node to be split. Each item of first encoded data in the first encoded data set may be a subset of the user encoded data set. It is obtained by dividing the user encoded data set according to the corresponding attribute, and may include at least one item of user encoded data. Each item of user encoded data may identify a user. For example, it may be a number, and it may include the user identifier of the corresponding user.
Optionally, the first encoded data set may be an ordered set.
Step 20122: based on the first encoded data set, determine the node data set corresponding to each next node in the next node group of the node to be split.
In some embodiments, the execution body may determine, based on the first encoded data set, the node data set corresponding to each next node in the next node group of the node to be split. The next node group may be the set of child nodes of the node to be split. The node data set of each next node may be a subset of the node data set of the node to be split.
As an example, the execution body may first randomly select one item of first encoded data from the first encoded data set. It may then use the selected first encoded data as the node data set of one of the next nodes in the next node group. Finally, the execution body may remove the selected first encoded data from the node data set of the node to be split. The resulting difference set becomes the node data set of the next node that satisfies the preset unassigned condition. The preset unassigned condition may be that the next node has not yet been assigned a node data set.
In some optional implementations of some embodiments, the execution body may determine, based on the first encoded data set, the node data set corresponding to each next node in the next node group of the node to be split through the following steps:
First, the target encoded data in the first encoded data set is determined according to the attribute information sequence corresponding to the node to be split. The target encoded data may be the item of first encoded data that corresponds to the node splitting attribute information of the node to be split.
As an example, the execution body may first take, as the target sequence value, the sequence number of the attribute information that satisfies a preset attribute condition in the attribute information sequence of the node to be split. The execution body may then take, as the target encoded data, the item of first encoded data whose sequence number in the first encoded data set equals the target sequence value.
Second, for each next node in the next node group of the node to be split, in response to determining that the next node is the first node, the intersection of the target encoded data and the node data set of the node to be split is determined as the node data set of the next node. For example, the first node may be the left child node of the node to be split.
Optionally, in response to determining that the next node is the second node, the execution body may remove the target encoded data from the node data set of the node to be split. The resulting difference set is determined as the node data set of the next node. For example, the second node may be the right child node of the node to be split.
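The steps above can be sketched in Python. The sketch assumes that the first terminal returns one subset of encoded user IDs per entry of the attribute information sequence, in the same order. The function name split_by_target is hypothetical. The intersection is needed because the first terminal's subset covers all users, not only those at the current node.

```python
def split_by_target(first_encoded_data_set, target_index, node_data):
    """Derive the two child node data sets from the first terminal's reply.

    first_encoded_data_set: list of sets of encoded user IDs, assumed to be in
        the same order as the attribute information sequence that was sent.
    target_index: position of the node splitting attribute in that sequence
        (the target sequence value).
    node_data: set of encoded user IDs at the node to be split.
    """
    target = first_encoded_data_set[target_index]  # target encoded data
    left = node_data & target   # first node: intersection with the node data
    right = node_data - target  # second node: node data minus target encoded data
    return left, right


reply = [{1, 9}, {2, 3}, {4}]  # one subset per attribute in the sequence
left, right = split_by_target(reply, 1, {1, 2, 3, 4, 5})
print(left, right)  # {2, 3} {1, 4, 5}
```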
Optionally, the execution body may further perform the following steps:
First, in response to determining that the set of nodes to be split contains nodes whose node attribute information does not satisfy the confusion characteristic condition, each such node is determined as a target split node, yielding at least one target split node.
Second, for each target split node in the at least one target split node, the following steps are performed:
In a first sub-step, the node splitting attribute information of the target split node is determined. The execution body may use the node attribute information of the target split node as its node splitting attribute information.
In a second sub-step, the node data set corresponding to each next node in the next node group of the target split node is determined based on the node splitting attribute information.
As an example, the execution body may first randomly select an attribute value as the target attribute value. It is drawn from the interval bounded by the first attribute value and the second attribute value in the node splitting attribute information. The execution body may then divide the node data set of the target split node according to the target attribute value to obtain at least one node data subset. Finally, for each next node in the next node group of the target split node, the execution body may select a node data subset satisfying the preset unselected condition from the at least one node data subset. That subset becomes the node data set of the next node. The preset unselected condition may be that the node data subset has not yet been selected.
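For a target split node, the split happens locally. The following Python sketch assumes that samples whose attribute value is below the target attribute value go to the first node and the rest go to the second node. The disclosure only says the node data set is divided "according to the target attribute value". The function name split_locally is hypothetical.

```python
import random


def split_locally(node_data, values, lower, upper, rng=None):
    """Split a target split node whose attribute values are held locally.

    node_data: set of sample IDs at the node.
    values: dict mapping sample ID -> value of the node splitting attribute.
    lower, upper: second and first attribute values bounding the interval.
    Returns (first_node_data, second_node_data, threshold).
    """
    rng = rng or random.Random()
    threshold = rng.uniform(lower, upper)  # target attribute value
    left = {s for s in node_data if values[s] < threshold}
    right = node_data - left  # samples with values >= threshold
    return left, right, threshold


values = {1: 10.0, 2: 20.0, 3: 30.0}
left, right, t = split_locally({1, 2, 3}, values, 10.0, 30.0, random.Random(1))
```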
Step 2013: in response to determining that the node data sets corresponding to the next nodes in the obtained next node set satisfy the preset data condition, generate an isolated tree according to the next node set.
In some embodiments, in response to that determination, the execution body may generate the isolated tree according to the next node set. The preset data condition may be that the node data set of every next node in the next node set contains exactly one item of node data.
As an example, in response to that determination, the execution body may take all nodes and splitting paths between the first level and the level of the next node set as the isolated tree. The first level may be the level of the node to be split that corresponds to the isolated tree initial information. A splitting path may be the path between a node to be split and one of its next nodes.
Optionally, the node data set of at least one next node in the obtained next node set may not satisfy the preset data condition. In response to determining this, the execution body may take each such next node as a node to be split, obtaining at least one node to be split. It then performs the isolated tree generation step again.
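The repeated generation step amounts to a level-by-level loop that keeps splitting nodes until every bottom-level node holds at most one item. The following is a minimal Python sketch. It assumes a tree node is a dict holding its data set and its children. The callback split_fn stands in for either the confusion-based split or the local split. The max_depth limit is a hypothetical safeguard for duplicate samples that can never be separated; it is not part of the described method.

```python
from collections import deque


def grow_isolated_tree(root_data, split_fn, max_depth=32):
    """Grow one isolated tree level by level from its initial node data set.

    split_fn(node_data) -> (first_child_data, second_child_data).
    """
    root = {"data": set(root_data), "children": [], "depth": 0}
    to_split = deque([root])  # nodes still to be split
    while to_split:
        node = to_split.popleft()
        # Preset data condition: a node with at most one item needs no split.
        if len(node["data"]) <= 1 or node["depth"] >= max_depth:
            continue
        for child_data in split_fn(node["data"]):
            child = {"data": set(child_data), "children": [], "depth": node["depth"] + 1}
            node["children"].append(child)
            to_split.append(child)
    return root


def halve(data):
    """Toy split function used only for this demo: lower half vs upper half."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    return set(ordered[:mid]), set(ordered[mid:])


def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


tree = grow_isolated_tree({1, 2, 3, 4, 5}, halve)
print(sorted(len(leaf["data"]) for leaf in leaves(tree)))  # [1, 1, 1, 1, 1]
```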
Step 202: generate an isolated forest model according to the obtained isolated trees.
In some embodiments, the execution body may generate an isolated forest model according to the obtained isolated trees. The isolated forest model may be a model composed of the individual isolated trees. The execution body may take the obtained isolated trees together as the isolated forest model.
The above embodiments of the present disclosure have the following beneficial effect: the isolated forest model generation method of some embodiments of the present disclosure protects the data security of every party to the model. Specifically, the data of the model's parties leaks because data must be exchanged with each data provider repeatedly during model iteration, which easily exposes each participant's data. Based on this, the method performs the following isolated tree generation steps for each item of isolated tree initial information in the isolated tree initial information set. First, the method determines whether the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition. In response, it sends at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal. Thus, the real splitting attribute of each node to be split, mixed with confusion attributes, can be sent to the first terminal, so that the first terminal can divide the node data it provides itself. Then, for each node to be split, the following generating steps are performed. The first encoded data set sent by the first terminal for the node to be split is received. Based on that set, the node data set corresponding to each next node in the next node group of the node to be split is determined.
In this way, encoded data provided by the other data provider can be received, and the node to be split can be split according to the first encoded data set. This yields the nodes at the next level and their node data sets, from which it can be decided whether each node needs further splitting. Next, in response to determining that the node data sets of the next nodes in the obtained next node set satisfy the preset data condition, an isolated tree is generated according to the next node set. Thus, once the node data set of every bottom-level node of the isolated tree meets the iteration termination condition, those nodes need no further splitting and generation of the isolated tree is complete. Finally, an isolated forest model is generated from the obtained isolated trees. During model iteration, the method adds confusion features to the data exchanged with each participant. This makes it difficult for participants other than the model owner to obtain information about the model. It also protects the model from leakage, secures the model owner's ownership of the model, and prevents other participants from inferring the model. In addition, the model uses encoded data provided by each participant in place of real data, so the real data behind that encoded data is also protected from leakage. The data security of every participant in the isolated forest model is therefore protected.
With further reference to fig. 3, a flow 300 of further embodiments of the isolated forest model generation method is shown. The flow 300 of the isolated forest model generation method includes the following steps:
Step 301: receive the second encoded data set sent by the first terminal, and obtain the unencoded data set.
In some embodiments, the execution body may receive, over a wired or wireless connection, the second encoded data set sent by the first terminal, and obtain the unencoded data set from a user database. Each item of second encoded data in the second encoded data set may be user data that has been encoded; for example, the encoding may consist of numbering the user data. The second encoded data set may include an encoded attribute data set and an encoded sample data set. Each item of encoded attribute data may be the encoded attribute identifier of an attribute of the user data. This is a unique identifier, such as a number, obtained by encoding the attribute. Each item of encoded sample data may be the encoded user identifier of a user. Each item of unencoded data in the unencoded data set may be user data that has not been encoded. The unencoded data set may include an unencoded attribute data set and an unencoded sample data set. Each item of unencoded attribute data may be attribute data of the user data. For example, the attribute data may include, but is not limited to, at least one of: the attribute name, the value interval, and the like. Each item of unencoded sample data may be the user data of one user. The user database may be a database storing at least one item of user data.
Optionally, the execution body and the first terminal may align the user data on a primary attribute before the second encoded data set is received and the unencoded data set is obtained. The primary attribute may uniquely identify a user; for example, it may be the user's identification code.
Step 302: determine the encoded attribute data set and the unencoded attribute data set as the full attribute information set.
In some embodiments, the execution body may determine the encoded attribute data set and the unencoded attribute data set as the full attribute information set, that is, the set containing all attribute information.
As an example, the execution body may take each item of encoded attribute data in the encoded attribute data set and each item of unencoded attribute data in the unencoded attribute data set as full attribute information, obtaining the full attribute information set.
Step 303: fuse the encoded sample data set and the unencoded sample data set to obtain the full sample data set.
In some embodiments, the execution body may fuse the encoded sample data set and the unencoded sample data set to obtain the full sample data set, that is, the set containing all sample data.
As an example, the execution body may process each item of unencoded sample data in the unencoded sample data set. It pairs that item with the encoded sample data in the encoded sample data set that satisfies the preset user matching condition, and the pair becomes one item of full sample data. The preset user matching condition may be that the value of the primary attribute of the encoded sample data equals the value of the primary attribute of the unencoded sample data.
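Steps 302 and 303 can be sketched together in Python. In this sketch, sample records are dicts that carry the primary attribute under a hypothetical key named primary_id. Unencoded samples with no matching encoded sample are dropped; the disclosure does not say how such samples are handled. The function name build_full_sets is also hypothetical.

```python
def build_full_sets(encoded_attrs, unencoded_attrs, encoded_samples,
                    unencoded_samples, key="primary_id"):
    """Build the full attribute information set and the full sample data set."""
    # Step 302: the full attribute information set is the union of both sides.
    full_attrs = list(encoded_attrs) + list(unencoded_attrs)
    encoded_by_key = {sample[key]: sample for sample in encoded_samples}
    full_samples = []
    for sample in unencoded_samples:
        # Preset user matching condition: equal primary attribute values.
        match = encoded_by_key.get(sample[key])
        if match is not None:
            full_samples.append({**match, **sample})  # one item of full sample data
    return full_attrs, full_samples


attrs, samples = build_full_sets(
    ["e_attr_1"], ["assets"],
    [{"primary_id": "u1", "enc_user": 7}, {"primary_id": "u3", "enc_user": 9}],
    [{"primary_id": "u1", "assets": 100}, {"primary_id": "u2", "assets": 5}],
)
print(samples)  # [{'primary_id': 'u1', 'enc_user': 7, 'assets': 100}]
```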
Optionally, the confusion attribute information in the at least one confusion attribute information set and the node splitting attribute information in the at least one piece of node splitting attribute information are attribute information from the full attribute information set. The node data in each node data set is full sample data from the full sample data set.
Step 304: for each item of isolated tree initial information in the isolated tree initial information set, perform the following isolated tree generation step:
Step 3041: in response to determining that the set of nodes to be split corresponding to the isolated tree initial information contains at least one node to be split whose node attribute information satisfies the confusion characteristic condition, send at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal.
Step 3042: for each node to be split in the at least one node to be split, perform the following generating steps:
Step 30421: receive the first encoded data set, sent by the first terminal, that corresponds to the node to be split.
Step 30422: based on the first encoded data set, determine the node data set corresponding to each next node in the next node group of the node to be split.
Step 3043: in response to determining that the node data sets corresponding to the next nodes in the obtained next node set satisfy the preset data condition, generate an isolated tree according to the next node set.
Step 305: generate an isolated forest model according to the obtained isolated trees.
In some embodiments, for the specific implementation of steps 304-305 and their technical effects, refer to steps 201-202 in the embodiment corresponding to fig. 2; they are not repeated here.
Step 306: determine the abnormal sample data set corresponding to the full sample data set based on the isolated forest model.
In some embodiments, the execution body may determine, based on the isolated forest model, the abnormal sample data set corresponding to the full sample data set. Each item of abnormal sample data may be user data with a high degree of outlierness.
As an example, the execution body may determine the abnormal sample data set corresponding to the full sample data set based on the isolated forest model through the following steps:
First, a node depth value group is determined for each item of full sample data in the full sample data set based on the isolated forest model, yielding a set of node depth value groups. Each node depth value group may be the set of depths of the leaf nodes that the same item of full sample data reaches in each isolated tree. For each item of full sample data in the full sample data set, the execution body may perform the following steps:
In a first sub-step, for each isolated tree in the isolated forest model, the following steps are performed:
Step one: determine the path length from the leaf node corresponding to the full sample data to the root node. The execution body may determine this path length by a preset binary tree traversal method.
As an example, the binary tree traversal method may include, but is not limited to, at least one of: pre-order traversal, in-order traversal, and the like.
Step two: determine the path length as a node depth value.
Second, for each node depth value group in the set of node depth value groups, the expected value (mean) of the node depth values in the group is determined as a sample expected value.
Third, the average of the determined sample expected values is determined as the expected average.
Fourth, for each node depth value group in the set of node depth value groups, the following steps are performed:
In a first sub-step, the ratio of the sample expected value of the node depth value group to the expected average is determined as the expected ratio.
In a second sub-step, the opposite (negative) of the expected ratio is determined as the expected exponent value.
In a third sub-step, the power with a preset base value as the base and the expected exponent value as the exponent is determined as the outlier value.
In a fourth sub-step, the full sample data corresponding to the node depth value group, together with its outlier value, is determined as sample outlier information.
Fifth, the full sample data in the sample outlier information that satisfies the preset outlier condition is determined as the abnormal sample data set. The preset outlier condition may be that the outlier value in the sample outlier information is greater than at least a preset ratio of all the outlier values. For example, the preset ratio may be 95%.
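The scoring procedure above can be sketched in Python. The sketch assumes a tree node is a dict holding its sample-ID set and its children. It also reads "inverse" in the second sub-step as negation, so the outlier value is base ** (-(sample expected value / expected average)). This resembles the standard isolation forest score 2 ** (-E(h(x)) / c(n)), but normalizes by the expected average instead of c(n). All function names are hypothetical. Shorter average paths give larger outlier values.

```python
def leaf_depth(tree, sample_id):
    """Path length from the root to the leaf whose node data contains sample_id.

    A tree node is a dict {"data": set_of_sample_ids, "children": [...]};
    nodes are visited in pre-order (parent first, then the first child).
    """
    stack = [(tree, 0)]
    while stack:
        node, depth = stack.pop()
        if sample_id not in node["data"]:
            continue  # the sample cannot be in this subtree
        if not node["children"]:
            return depth
        for child in reversed(node["children"]):
            stack.append((child, depth + 1))
    raise KeyError(sample_id)


def outlier_values(forest, sample_ids, base=2.0):
    # Node depth value group: one depth per isolated tree.
    groups = {s: [leaf_depth(tree, s) for tree in forest] for s in sample_ids}
    # Sample expected value: mean depth within each group.
    expected = {s: sum(g) / len(g) for s, g in groups.items()}
    # Expected average: mean of all sample expected values.
    expected_avg = sum(expected.values()) / len(expected)
    # Outlier value: base raised to the negated expected ratio.
    return {s: base ** (-(e / expected_avg)) for s, e in expected.items()}


def abnormal_samples(values, ratio=0.95):
    """Samples whose outlier value exceeds at least `ratio` of all outlier values."""
    all_values = list(values.values())
    return {
        s for s, v in values.items()
        if sum(1 for w in all_values if v > w) >= ratio * len(all_values)
    }


def leaf(ids):
    return {"data": set(ids), "children": []}


tree = {"data": {1, 2, 3, 4, 5}, "children": [
    leaf([5]),
    {"data": {1, 2, 3, 4}, "children": [
        {"data": {1, 2}, "children": [leaf([1]), leaf([2])]},
        {"data": {3, 4}, "children": [leaf([3]), leaf([4])]},
    ]},
]}
scores = outlier_values([tree, tree], [1, 2, 3, 4, 5])
print(abnormal_samples(scores, ratio=0.75))  # {5}
```

With only five samples, a 95% ratio would flag nothing, so the demo uses 0.75. In practice the ratio is applied over the whole full sample data set.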
Step 307: control the associated display device to display the abnormal sample data set.
In some embodiments, the execution body may control an associated display device to display the abnormal sample data set. The display device may be an electronic device with a display screen.
As can be seen from fig. 3, flow 300 differs from the embodiments corresponding to fig. 2 in two added steps. It generates the full attribute information set and the full sample data set from the data of each data provider, and it uses the model to obtain the abnormal sample data set. The solution described in these embodiments therefore combines federated learning with isolated forests. Introducing data from different sources expands the attribute dimensions available to the isolated forest, so an isolated forest model with more accurate anomaly detection can be obtained.
Fig. 4 is a schematic diagram of an application scenario of the data transmission method according to some embodiments of the present disclosure.
In the application scenario of fig. 4, the electronic device 401 may first receive at least one confusion attribute information set 402 and at least one piece of node splitting attribute information 403 sent by the second terminal. For example, the at least one confusion attribute information set 402 may include confusion attribute information set 1 and confusion attribute information set 2. The at least one piece of node splitting attribute information 403 may include node splitting attribute information 1 and node splitting attribute information 2. Next, the electronic device 401 generates at least one confusion coded data set 404 corresponding to the at least one confusion attribute information set 402. It also generates at least one item of split coded data 405 corresponding to the at least one piece of node splitting attribute information 403. For example, the at least one confusion coded data set 404 may include confusion coded data set 1, corresponding to confusion attribute information set 1, and confusion coded data set 2, corresponding to confusion attribute information set 2. The at least one item of split coded data 405 may include split coded data 1, corresponding to node splitting attribute information 1, and split coded data 2, corresponding to node splitting attribute information 2. Then, each confusion coded data set in the at least one confusion coded data set 404 is combined with the split coded data in the at least one item of split coded data 405 that satisfies a preset matching condition. Each such combination is determined as a first encoded data set. Finally, the determined at least one first encoded data set 406 is sent to the second terminal. For example, the at least one first encoded data set 406 may include first encoded data set 1 and first encoded data set 2. First encoded data set 1 may be composed of confusion coded data set 1 and split coded data 1. First encoded data set 2 may be composed of confusion coded data set 2 and split coded data 2.
The electronic device 401 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or terminal device. When it is software, it may be installed in the hardware devices listed above. It may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the number of electronic devices in fig. 4 is merely illustrative. There may be any number of electronic devices, depending on implementation needs.
With continued reference to fig. 5, a flow 500 of some embodiments of the data transmission method according to the present disclosure is shown. The data transmission method includes the following steps:
Step 501: receive at least one confusion attribute information set and at least one piece of node splitting attribute information sent by the second terminal.
In some embodiments, the execution body of the data transmission method may receive, over a wired or wireless connection, the at least one confusion attribute information set and the at least one piece of node splitting attribute information sent by the second terminal. The second terminal may be the terminal that generates the isolated forest model.
Step 502: generate at least one confusion coded data set corresponding to the at least one confusion attribute information set and at least one item of split coded data corresponding to the at least one piece of node splitting attribute information.
In some embodiments, the execution body may generate the at least one confusion coded data set and the at least one item of split coded data in various ways. Each item of confusion coded data in a confusion coded data set may be a set of at least one item of encoded sample data, obtained by dividing the encoded sample data set according to a piece of confusion attribute information. Each item of split coded data may be a set of at least one item of encoded sample data, obtained by dividing the encoded sample data set according to a piece of node splitting attribute information.
In some optional implementations of some embodiments, the execution body may generate the at least one confusion coded data set and the at least one item of split coded data through the following steps:
First, for each item of confusion attribute information in the at least one confusion attribute information set, confusion coded data is generated based on the confusion attribute information.
As an example, the execution body may generate the confusion coded data based on the confusion attribute information as follows:
In a first sub-step, an attribute value is randomly selected from the value interval of the confusion attribute information as the first splitting threshold. The value interval may be bounded by the upper and lower limit values of the confusion attribute information.
In a second sub-step, the encoded sample data in the encoded sample data set that satisfies the first splitting threshold condition is taken as an encoded sample data subset, which serves as the confusion coded data. The first splitting threshold condition may be that the encoded sample data's value of the attribute described by the confusion attribute information is less than or equal to the first splitting threshold.
Optionally, the execution body may instead generate the confusion coded data based on the confusion attribute information as follows:
In a first sub-step, the average of the upper and lower limit values of the confusion attribute information is determined as the first splitting threshold.
In a second sub-step, the encoded sample data in the encoded sample data set that satisfies the first splitting threshold condition is taken as an encoded sample data subset, which serves as the confusion coded data.
In a second step, for each piece of the at least one piece of node splitting attribute information, split encoded data is generated based on that node splitting attribute information.
As an example, the execution body may generate the split encoded data based on the node splitting attribute information through the following sub-steps:
In a first sub-step, an attribute value is randomly selected from the value interval corresponding to the node splitting attribute information and determined as a second splitting threshold. This value interval may consist of the upper limit value and the lower limit value corresponding to the node splitting attribute information.
In a second sub-step, the encoded sample data in the encoded sample data set that satisfy a second splitting threshold condition are determined as an encoded sample data subset, which serves as the split encoded data. The second splitting threshold condition may be that the attribute value of the node splitting attribute information corresponding to the encoded sample data is less than or equal to the second splitting threshold.
Optionally, the execution body may instead generate the split encoded data based on the node splitting attribute information through the following sub-steps:
In a first sub-step, the average of the upper limit value and the lower limit value corresponding to the node splitting attribute information is determined as the second splitting threshold.
In a second sub-step, the encoded sample data in the encoded sample data set that satisfy the second splitting threshold condition are determined as an encoded sample data subset, which serves as the split encoded data.
If the first splitting threshold condition is instead that the attribute value of the confusion attribute information corresponding to the encoded sample data is greater than or equal to the first splitting threshold, the second splitting threshold condition may correspondingly be that the attribute value of the node splitting attribute information corresponding to the encoded sample data is greater than or equal to the second splitting threshold.
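The requirement that the confusion attributes and the node splitting attribute use the same comparison direction can be made explicit in code. The Python sketch below is illustrative only: the direction labels `"le"` and `"ge"`, the per-attribute threshold dictionary, and the function names are assumptions, and the thresholds are taken as already chosen by one of the strategies described above.

```python
import operator
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical representation: sample ID -> {attribute name: attribute value}.
Sample = Dict[str, float]

# Hypothetical labels for the two comparison directions allowed by the text.
DIRECTIONS: Dict[str, Callable[[float, float], bool]] = {
    "le": operator.le,  # attribute value <= threshold
    "ge": operator.ge,  # attribute value >= threshold
}


def subset_ids(samples: Dict[str, Sample], attribute: str, threshold: float,
               direction: str) -> Set[str]:
    """IDs of samples whose attribute value satisfies the chosen comparison."""
    compare = DIRECTIONS[direction]
    return {sid for sid, s in samples.items() if compare(s[attribute], threshold)}


def encode_for_node(samples: Dict[str, Sample], confusion_attrs: List[str],
                    split_attr: str, thresholds: Dict[str, float],
                    direction: str = "le") -> Tuple[List[Set[str]], Set[str]]:
    """Encode confusion attributes and the split attribute with ONE shared direction."""
    confusion_data = [subset_ids(samples, a, thresholds[a], direction)
                      for a in confusion_attrs]
    split_data = subset_ids(samples, split_attr, thresholds[split_attr], direction)
    return confusion_data, split_data
```

Keeping the direction as a single parameter means the first and second splitting threshold conditions cannot drift apart by accident.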
Step 503, for each of the at least one confusion encoded data set, determining the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set.
In some embodiments, for each of the at least one confusion encoded data set, the execution body may determine the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set. The preset matching condition may be that the split encoded data and the confusion encoded data set were received at the same time, or that the position of the split encoded data in the receiving order of the at least one piece of split encoded data is the same as the position of the confusion encoded data set in the receiving order of the at least one confusion encoded data set.
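A minimal sketch of the receiving-order variant of the preset matching condition, assuming the first terminal keeps the confusion encoded data sets and the split encoded data in two lists, each in the order in which the corresponding attribute information was received. Each returned pair stands for one first encoded data set; this pairing structure and the function name are assumptions for illustration.

```python
from typing import List, Set, Tuple

EncodedSubset = Set[str]  # IDs of the encoded samples in one subset


def match_by_receiving_order(
        confusion_data_sets: List[List[EncodedSubset]],
        split_data: List[EncodedSubset]) -> List[Tuple[List[EncodedSubset], EncodedSubset]]:
    """Pair the i-th confusion encoded data set with the i-th split encoded data.

    Each pair represents one first encoded data set for the second terminal.
    """
    if len(confusion_data_sets) != len(split_data):
        raise ValueError("expected exactly one split encoded data per confusion set")
    return list(zip(confusion_data_sets, split_data))
```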
Step 504, sending the determined at least one first encoded data set to the second terminal.
In some embodiments, the execution body may send the determined at least one first encoded data set to the second terminal, either one after another or all at the same time; the specific sending manner is not limited herein.
The above embodiments of the present disclosure have the following beneficial effects: with the data transmission method of some embodiments of the present disclosure, there is no need to determine whether a received attribute is the attribute the model actually uses to split the node. The encoded sample data set only needs to be partitioned for each received attribute, and each resulting subset of the encoded sample data set is sent to the second terminal. In this way, the model data is hidden from participants other than the model owner, which protects the data security of the model.
With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of an isolated forest model generating apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 2, and the apparatus is applicable to various electronic devices.
As shown in fig. 6, the isolated forest model generating apparatus 600 includes an execution unit 601 and a generation unit 602. The execution unit 601 is configured to perform, for each piece of isolated tree initial information in the isolated tree initial information set, the following isolated tree generation step: in response to the node attribute information of at least one node to be split corresponding to the isolated tree initial information satisfying a confusion characteristic condition, sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal; and, for each of the at least one node to be split, performing the following generating steps: receiving a first encoded data set that is sent by the first terminal and corresponds to the node to be split; determining, based on the first encoded data set, a node data set for each next node in the next node group corresponding to the node to be split; and, in response to determining that the node data sets corresponding to the next nodes in the next node group satisfy a preset data condition, generating an isolated tree from the next node group. The generation unit 602 is configured to generate an isolated forest model from the obtained isolated trees.
In some optional implementations of some embodiments, the isolated forest model generating apparatus 600 may be further configured to: in response to determining that the node data set corresponding to at least one next node in the next node group does not satisfy the preset data condition, determine each next node that does not satisfy the preset data condition as a node to be split, thereby obtaining at least one node to be split, and perform the isolated tree generation step again.
In some optional implementations of some embodiments, the isolated forest model generating apparatus 600 may be further configured to: in response to determining that the node attribute information of a node to be split does not satisfy the confusion characteristic condition, determine each such node to be split as a target split node, thereby obtaining at least one target split node; and, for each of the at least one target split node, determine the node splitting attribute information corresponding to the target split node, and determine, based on that node splitting attribute information, a node data set for each next node in the next node group corresponding to the target split node.
In some optional implementations of some embodiments, the confusion attribute information in the at least one confusion attribute information set and the node splitting attribute information in the at least one piece of node splitting attribute information are attribute information in a full attribute information set, and the node data in a node data set is full sample data in a full sample data set. Before performing the isolated tree generation step for each piece of isolated tree initial information in the isolated tree initial information set, the isolated forest model generating apparatus 600 may be further configured to: receive a second encoded data set sent by the first terminal and acquire an unencoded data set, where the second encoded data set includes an encoded attribute data set and an encoded sample data set, and the unencoded data set includes an unencoded attribute data set and an unencoded sample data set; determine the encoded attribute data set and the unencoded attribute data set as the full attribute information set; and fuse the encoded sample data set with the unencoded sample data set to obtain the full sample data set.
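The fusion processing is not specified further in the text. One plausible reading, sketched below as an assumption rather than the disclosed method, is an inner join of the two sample data sets on sample IDs that both terminals already share, with the full attribute information set taken as the union of the two attribute lists. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: sample ID -> {attribute name: value or encoded value}.
Sample = Dict[str, float]


def fuse(encoded_attrs: List[str], encoded_samples: Dict[str, Sample],
         unencoded_attrs: List[str], unencoded_samples: Dict[str, Sample]
         ) -> Tuple[List[str], Dict[str, Sample]]:
    """Build the full attribute information set and the full sample data set.

    Assumes both sides use the same (aligned) sample IDs; samples present on
    only one side are dropped by the inner join.
    """
    full_attrs = list(encoded_attrs) + [a for a in unencoded_attrs
                                        if a not in encoded_attrs]
    common_ids = encoded_samples.keys() & unencoded_samples.keys()
    full_samples = {sid: {**encoded_samples[sid], **unencoded_samples[sid]}
                    for sid in sorted(common_ids)}
    return full_attrs, full_samples
```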
In some optional implementations of some embodiments, the execution unit 601 may be further configured to perform, for each of the at least one node to be split, the following sending steps: sort the confusion attribute information set and the node splitting attribute information corresponding to the node to be split to obtain an attribute information sequence; and send the attribute information sequence to the first terminal.
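One reading of the sorting step is that sorting the confusion attributes together with the real node splitting attribute fixes each attribute's position by its name rather than by its role, so the position alone does not reveal which entry is real. The sketch below assumes that attribute information is identified by a string name and that the sort key is that name; the text does not state the sort key, and the helper names are hypothetical. The second terminal keeps the returned index to pick out the target encoded data later.

```python
import random
from typing import List, Sequence, Tuple


def pick_confusion_attributes(candidates: Sequence[str], split_attribute: str,
                              k: int, rng: random.Random) -> List[str]:
    """Hypothetical helper: draw up to k decoy attributes other than the real one."""
    pool = [a for a in candidates if a != split_attribute]
    return rng.sample(pool, min(k, len(pool)))


def build_attribute_sequence(split_attribute: str,
                             confusion_attributes: Sequence[str]) -> Tuple[List[str], int]:
    """Sort decoys and the real attribute together; return the sequence and real index.

    The sequence is sent to the first terminal; the index stays with the second terminal.
    """
    sequence = sorted(set(confusion_attributes) | {split_attribute})
    return sequence, sequence.index(split_attribute)
```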
In some optional implementations of some embodiments, the execution unit 601 may be further configured to: determine, according to the attribute information sequence corresponding to the node to be split, the target encoded data corresponding to the first encoded data set; and, for each next node in the next node group corresponding to the node to be split, in response to determining that the next node is the first node, determine the intersection of the target encoded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
In some optional implementations of some embodiments, the execution unit 601 may be further configured to: in response to determining that the next node is the second node, determine the difference set between the node data set corresponding to the node to be split and the target encoded data (that is, the node data not contained in the target encoded data) as the node data set corresponding to the next node.
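Putting the generation step together, the following single-process Python simulation shows how a second terminal could grow one isolated tree: nodes that meet the confusion characteristic condition are split through a simulated first terminal, other nodes are split locally, and the child node data sets are obtained by intersection and difference as described above. It rests on assumptions the text does not fix: the confusion characteristic condition is taken to mean that the chosen splitting attribute is held by the first terminal, the preset data condition is taken to be "at most one sample, or the maximum depth is reached", the attribute sequence is sorted by name, and the simulated first terminal can read all attribute values (a real deployment would keep them on the first terminal only). All names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

Sample = Dict[str, float]  # hypothetical: attribute name -> value


@dataclass
class Node:
    ids: Set[str]                    # node data set (sample IDs)
    depth: int
    attribute: Optional[str] = None  # node splitting attribute
    children: List["Node"] = field(default_factory=list)


def threshold_subset(samples: Dict[str, Sample], ids: Set[str], attribute: str,
                     rng: random.Random) -> Set[str]:
    """IDs in `ids` whose value is <= a threshold drawn uniformly from [min, max]."""
    values = [samples[i][attribute] for i in ids]
    threshold = rng.uniform(min(values), max(values))
    return {i for i in ids if samples[i][attribute] <= threshold}


def build_isolated_tree(samples: Dict[str, Sample], remote_attrs: List[str],
                        local_attrs: List[str], n_confusion: int = 2,
                        max_depth: int = 8, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node(ids=set(samples), depth=0)
    to_split = [root]  # nodes to be split
    while to_split:
        node = to_split.pop()
        # Assumed preset data condition: at most one sample, or depth limit reached.
        if len(node.ids) <= 1 or node.depth >= max_depth:
            continue
        node.attribute = rng.choice(remote_attrs + local_attrs)
        if node.attribute in remote_attrs:
            # Assumed confusion characteristic condition: attribute held remotely.
            pool = [a for a in remote_attrs if a != node.attribute]
            decoys = rng.sample(pool, min(n_confusion, len(pool)))
            sequence = sorted(decoys + [node.attribute])
            # Simulated first terminal: partitions ALL samples for every attribute
            # in the sequence without knowing which one is the real one.
            first_encoded = [threshold_subset(samples, set(samples), a, rng)
                             for a in sequence]
            target = first_encoded[sequence.index(node.attribute)]
        else:
            # Target split node: split locally on the node's own data set.
            target = threshold_subset(samples, node.ids, node.attribute, rng)
        node.children = [Node(node.ids & target, node.depth + 1),  # first node
                         Node(node.ids - target, node.depth + 1)]  # second node
        to_split.extend(node.children)
    return root
```

Calling `build_isolated_tree` with different seeds (and, as in the classic isolation forest, on different subsamples) would yield the individual isolated trees from which a forest could be assembled.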
In some optional implementations of some embodiments, the isolated forest model generating apparatus 600 may be further configured to: determine, based on the isolated forest model, an abnormal sample data set corresponding to the full sample data set; and control an associated display device to display the abnormal sample data set.
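The text does not say how abnormal samples are selected from the isolated forest model. A common choice, and the one assumed here, is the standard isolation forest anomaly score s(x, n) = 2^(-E[h(x)] / c(n)), where h(x) is the path length of sample x in one tree, E[h(x)] is its mean over all trees, and c(n) is the average path length for n samples; samples whose score exceeds a threshold are reported. The 0.6 threshold, the input format and the function names are illustrative assumptions.

```python
import math
from typing import Dict, List

EULER_GAMMA = 0.5772156649015329


def average_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length: float, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); scores close to 1 suggest anomalies."""
    if n < 2:
        raise ValueError("c(n) is zero for n < 2, so the score is undefined")
    return 2.0 ** (-mean_path_length / average_path_length(n))


def abnormal_samples(path_lengths: Dict[str, List[float]], n: int,
                     threshold: float = 0.6) -> List[str]:
    """IDs whose score exceeds the (assumed) threshold, most anomalous first."""
    scores = {sid: anomaly_score(sum(h) / len(h), n) for sid, h in path_lengths.items()}
    return sorted((sid for sid, s in scores.items() if s > threshold),
                  key=lambda sid: scores[sid], reverse=True)
```

A sample whose mean path length equals c(n) scores exactly 0.5, while samples isolated after only a few splits score close to 1.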
It will be appreciated that the units described in the isolated forest model generating apparatus 600 correspond to the respective steps of the method described with reference to fig. 2. Therefore, the operations, features and resulting beneficial effects described above for the method are equally applicable to the apparatus 600 and the units it contains, and are not repeated here.
With further reference to fig. 7, as an implementation of the methods shown in the foregoing figures, the present disclosure provides some embodiments of a data transmission apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 5, and the apparatus is applicable to various electronic devices.
As shown in fig. 7, the data transmission apparatus 700 includes a receiving unit 701, a generating unit 702, a determining unit 703 and a transmitting unit 704. The receiving unit 701 is configured to receive at least one confusion attribute information set and at least one piece of node splitting attribute information sent by the second terminal. The generating unit 702 is configured to generate at least one confusion encoded data set corresponding to the at least one confusion attribute information set and at least one piece of split encoded data corresponding to the at least one piece of node splitting attribute information. The determining unit 703 is configured to determine, for each of the at least one confusion encoded data set, the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set. The transmitting unit 704 is configured to send the determined at least one first encoded data set to the second terminal.
In some optional implementations of some embodiments, the generating unit 702 is further configured to: for each piece of confusion attribute information in the at least one confusion attribute information set, generate confusion encoded data based on that confusion attribute information; and, for each piece of node splitting attribute information in the at least one piece of node splitting attribute information, generate split encoded data based on that node splitting attribute information.
It will be appreciated that the units described in the data transmission apparatus 700 correspond to the respective steps of the method described with reference to fig. 5. Therefore, the operations, features and resulting beneficial effects described above for the method are equally applicable to the apparatus 700 and the units it contains, and are not repeated here.
Referring now to fig. 8, a schematic diagram of an electronic device 800 (e.g., electronic device 101 of fig. 1) suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 8 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing means (e.g., a central processor, a graphics processor, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 are also stored. The processing device 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
In general, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, etc.; storage 808 including, for example, magnetic tape, hard disk, etc.; communication means 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 shows an electronic device 800 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead. Each block shown in fig. 8 may represent one device or a plurality of devices as needed.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via communication device 809, or from storage device 808, or from ROM 802. The above-described functions defined in the methods of some embodiments of the present disclosure are performed when the computer program is executed by the processing device 801.
It should be noted that, in some embodiments of the present disclosure, the computer readable medium may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some implementations, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: for each piece of isolated tree initial information in the isolated tree initial information set, perform the following isolated tree generation step: in response to the node attribute information of at least one node to be split corresponding to the isolated tree initial information satisfying a confusion characteristic condition, send at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal; for each of the at least one node to be split, perform the following generating steps: receive a first encoded data set that is sent by the first terminal and corresponds to the node to be split; determine, based on the first encoded data set, a node data set for each next node in the next node group corresponding to the node to be split; and, in response to determining that the node data sets corresponding to the next nodes in the next node group satisfy a preset data condition, generate an isolated tree from the next node group; and generate an isolated forest model from the obtained isolated trees.
Alternatively, the one or more programs, when executed by the electronic device, cause the electronic device to: receive at least one confusion attribute information set and at least one piece of node splitting attribute information sent by a second terminal; generate at least one confusion encoded data set corresponding to the at least one confusion attribute information set and at least one piece of split encoded data corresponding to the at least one piece of node splitting attribute information; for each of the at least one confusion encoded data set, determine the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set; and send the determined at least one first encoded data set to the second terminal.
Computer program code for carrying out operations of some embodiments of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including an execution unit and a generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the generation unit may also be described as "a unit that generates an isolated forest model from the obtained isolated trees".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Some embodiments of the present disclosure further provide a computer program product including a computer program which, when executed by a processor, implements any of the isolated forest model generation methods or data transmission methods described above.
The foregoing description is merely of some preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (15)

1. An isolated forest model generation method, comprising:
for each piece of isolated tree initial information in an isolated tree initial information set, performing the following isolated tree generation step:
in response to the node attribute information of at least one node to be split corresponding to the isolated tree initial information satisfying a confusion characteristic condition, sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal;
for each of the at least one node to be split, performing the following generating steps:
receiving a first encoded data set that is sent by the first terminal and corresponds to the node to be split;
determining, based on the first encoded data set, a node data set for each next node in a next node group corresponding to the node to be split;
in response to determining that the node data sets corresponding to the next nodes in the next node group satisfy a preset data condition, generating an isolated tree from the next node group; and
generating an isolated forest model from the obtained isolated trees.
2. The method of claim 1, wherein the method further comprises:
in response to determining that the node data set corresponding to at least one next node in the next node group does not satisfy the preset data condition, determining each next node that does not satisfy the preset data condition as a node to be split, to obtain at least one node to be split, and performing the isolated tree generation step again.
3. The method of claim 1, wherein the method further comprises:
in response to determining that the node attribute information of a node to be split does not satisfy the confusion characteristic condition, determining each node to be split that does not satisfy the confusion characteristic condition as a target split node, to obtain at least one target split node;
for each of the at least one target split node, performing the following steps:
determining node splitting attribute information corresponding to the target split node; and
determining, based on the node splitting attribute information, a node data set for each next node in a next node group corresponding to the target split node.
4. The method of claim 1, wherein the confusion attribute information in the at least one confusion attribute information set is attribute information in a full attribute information set, the node splitting attribute information in the at least one piece of node splitting attribute information is attribute information in the full attribute information set, and the node data in the node data set is full sample data in a full sample data set; and
before performing the isolated tree generation step for each piece of isolated tree initial information in the isolated tree initial information set, the method further comprises:
receiving a second encoded data set sent by the first terminal and acquiring an unencoded data set, wherein the second encoded data set comprises an encoded attribute data set and an encoded sample data set, and the unencoded data set comprises an unencoded attribute data set and an unencoded sample data set;
determining the encoded attribute data set and the unencoded attribute data set as the full attribute information set; and
fusing the encoded sample data set with the unencoded sample data set to obtain the full sample data set.
5. The method of claim 1, wherein sending the at least one confusion attribute information set and the at least one piece of node splitting attribute information corresponding to the at least one node to be split to the first terminal comprises:
for each of the at least one node to be split, performing the following sending steps:
sorting the confusion attribute information set and the node splitting attribute information corresponding to the node to be split to obtain an attribute information sequence; and
sending the attribute information sequence to the first terminal.
6. The method of claim 5, wherein determining, based on the first encoded data set, the node data set for each next node in the next node group corresponding to the node to be split comprises:
determining, according to the attribute information sequence corresponding to the node to be split, target encoded data corresponding to the first encoded data set; and
for each next node in the next node group corresponding to the node to be split, in response to determining that the next node is a first node, determining an intersection of the target encoded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
7. The method of claim 6, wherein the method further comprises:
in response to determining that the next node is a second node, determining a difference set of the target encoded data and the node data set corresponding to the node to be split as the node data set corresponding to the next node.
8. The method of claim 4, wherein the method further comprises:
determining an abnormal sample data set corresponding to the full sample data set based on the isolated forest model;
controlling an associated display device to display the abnormal sample data set.
9. A data transmission method, comprising:
receiving at least one confusion attribute information set and at least one piece of node splitting attribute information sent by a second terminal;
generating at least one confusion encoded data set corresponding to the at least one confusion attribute information set and at least one piece of split encoded data corresponding to the at least one piece of node splitting attribute information;
for each of the at least one confusion encoded data set, determining the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set; and
sending the determined at least one first encoded data set to the second terminal.
10. The method of claim 9, wherein generating the at least one confusion encoded data set corresponding to the at least one confusion attribute information set and the at least one piece of split encoded data corresponding to the at least one piece of node splitting attribute information comprises:
for each piece of confusion attribute information in the at least one confusion attribute information set, generating confusion encoded data based on the confusion attribute information; and
for each piece of node splitting attribute information in the at least one piece of node splitting attribute information, generating split encoded data based on the node splitting attribute information.
11. An isolated forest model generation device, comprising:
an execution unit configured to perform, for each piece of isolated tree initial information in an isolated tree initial information set, the following isolated tree generation step:
in response to the node attribute information of at least one node to be split corresponding to the isolated tree initial information satisfying a confusion characteristic condition, sending at least one confusion attribute information set and at least one piece of node splitting attribute information corresponding to the at least one node to be split to a first terminal;
for each of the at least one node to be split, performing the following generating steps:
receiving a first encoded data set that is sent by the first terminal and corresponds to the node to be split;
determining, based on the first encoded data set, a node data set for each next node in a next node group corresponding to the node to be split;
in response to determining that the node data sets corresponding to the next nodes in the next node group satisfy a preset data condition, generating an isolated tree from the next node group; and
a generation unit configured to generate an isolated forest model from the obtained isolated trees.
12. A data transmission apparatus comprising:
a receiving unit configured to receive at least one confusion attribute information set and at least one piece of node splitting attribute information sent by a second terminal;
a generating unit configured to generate at least one confusion encoded data set corresponding to the at least one confusion attribute information set and at least one piece of split encoded data corresponding to the at least one piece of node splitting attribute information;
a determining unit configured to determine, for each of the at least one confusion encoded data set, the confusion encoded data set, together with the split encoded data among the at least one piece of split encoded data that satisfies a preset matching condition, as a first encoded data set; and
a transmitting unit configured to send the determined at least one first encoded data set to the second terminal.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
14. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-10.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-10.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311321674.7A (CN117474131A) | 2023-10-12 | 2023-10-12 | Isolated forest model generation method, data transmission method, device and electronic equipment

Publications (1)

Publication Number | Publication Date
CN117474131A | 2024-01-30

Family

ID=89626629

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202311321674.7A (CN117474131A) | Pending | 2023-10-12 | 2023-10-12 | Isolated forest model generation method, data transmission method, device and electronic equipment

Country Status (1)

Country | Link
CN (1) | CN117474131A

Similar Documents

Publication | Title
EP3965023A1 | Method and device for constructing decision trees
CN113240524A | Method and device for detecting abnormality of account in federal learning system and electronic equipment
CN112434620B | Scene text recognition method, device, equipment and computer readable medium
CN109495266B | Data encryption method and device based on random number
CN112182109A | Distributed data coding storage method based on block chain and electronic equipment
CN116167868A | Risk identification method, apparatus, device and storage medium based on privacy calculation
CN110705635B | Method and apparatus for generating an isolated forest
CN117408646A | Electronic signature signing method, electronic signature signing device, electronic equipment and computer readable medium
CN111610938B | Distributed data code storage method, electronic device and computer readable storage medium
CN117474131A | Isolated forest model generation method, data transmission method, device and electronic equipment
CN112507676B | Method and device for generating energy report, electronic equipment and computer readable medium
US20230418794A1 | Data processing method, and non-transitory medium and electronic device
CN111949627B | Method, device, electronic equipment and medium for tabulating log files
CN117633848B | User information joint processing method, device, equipment and computer readable medium
CN111949938B | Determination method and device of transaction information, electronic equipment and computer readable medium
CN117132245B | Method, device, equipment and readable medium for reorganizing online article acquisition business process
CN116760717B | Communication path query method, device, electronic equipment and computer readable medium
CN115495793B | Multi-set problem safety sending method, device, equipment and medium
CN116910630B | User identification information storage method, device, electronic equipment and medium
CN118573477B | Communication data transmission method, electronic device and computer readable medium
CN116934557B | Behavior prediction information generation method, device, electronic equipment and readable medium
CN116702168B | Method, device, electronic equipment and computer readable medium for detecting supply end information
CN114003188B | Information encryption display method, device, electronic equipment and computer readable medium
CN118627101A | Data distribution method, device, electronic equipment, medium and program product
CN117150455A | Power information digital watermark encryption authentication method, electronic equipment and computer medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination