CN105447525A - Data prediction classification method and device - Google Patents

Info

Publication number
CN105447525A
Authority
CN
China
Prior art keywords
attribute
classification
prediction
decision tree
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510932807.3A
Other languages
Chinese (zh)
Inventor
丁丽萍
穆海蓉
宋宇宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN201510932807.3A
Publication of CN105447525A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/24765 Rule-based classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention discloses a data prediction classification method and device in the technical field of data processing, which address the prior-art problem that the classification results themselves and the classification count values may leak users' private information. The method comprises: building a random forest, that is, multiple decision trees, from a training dataset; and using the decision trees in the random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy. The invention enables high-accuracy prediction classification of high-dimensional, large-scale data.

Description

A data prediction classification method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data prediction classification method and device.
Background
Classification is an important class of data mining methods. Its goal is to find a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class label of an object. The typical classification model is the decision tree, a tree-structured classifier in which each internal node represents a test on some attribute and each leaf node represents a class. However, both the classification results themselves and the classification count values may leak users' private information. Traditional privacy protection for decision tree classification mostly relies on data perturbation, such as adding random noise or k-anonymity, or on encrypting the raw data and intermediate computation results. When an attacker has some background knowledge, these traditional methods are vulnerable: the attacker can use re-identification attacks, background knowledge attacks, and similar methods to confirm a user's private information. In addition, traditional privacy protection models cannot quantify the level of privacy protection they provide.
Differential privacy, a newer privacy protection model, addresses these two major defects of traditional models: (1) it defines a very strict attack model that does not depend on how much background knowledge the attacker has; even if the attacker knows every record except one, the private information of that record still cannot be disclosed; (2) it gives a rigorous definition of the privacy protection level and a method to evaluate it quantitatively.
As one of the simplest classification models, decision tree classification under differential privacy has been studied fairly extensively. Representative existing methods that combine differential privacy with decision trees include SuLQ-based ID3, DiffP-C4.5, and DiffGen. Although these have made gradual progress in classification accuracy and practical applicability, each selection of a splitting attribute requires computing the entropy of every attribute. When the number of classification attributes is very large, selection based on the exponential mechanism becomes very inefficient and may exhaust the privacy budget, so existing methods cannot be applied effectively to classification involving large numbers of queries and high-dimensional attributes. Researchers and institutions in China and abroad have also studied boosting algorithms for decision tree classification under differential privacy, but these approaches generally cannot efficiently handle the problems that high-dimensional continuous attributes cause during classification.
Summary of the invention
The invention provides a data prediction classification method and device that address the problem that the classification results themselves and the classification count values may leak users' private information.
In a first aspect, the invention provides a data prediction classification method, comprising:
building a random forest, that is, multiple decision trees, from a training dataset;
using the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy.
In a second aspect, the invention provides a data prediction classification device, comprising:
a building unit, configured to build a random forest, that is, multiple decision trees, from a training dataset;
a prediction classification unit, configured to use the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy.
The data prediction classification method and device provided by the invention build a random forest from a training dataset and use the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy. Compared with the prior art, because random forests perform well on large datasets, can handle very high-dimensional data, and train quickly, high-accuracy prediction classification of high-dimensional, large-scale data can be achieved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the data prediction classification method provided by an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the data prediction classification device provided by an embodiment of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.
Some of the techniques used by the embodiments of the invention are introduced first.
Differential privacy is a privacy protection technique based on data distortion. Noise is added to query or analysis results so that inserting or deleting any single record in the dataset does not significantly affect the output of any query, which achieves privacy protection. The formal definition of differential privacy is as follows:
ε-differential privacy: let $D_1$ and $D_2$ be any two adjacent datasets that differ in at most one record, let $K$ be a given privacy algorithm, and let $\mathrm{Range}(K)$ denote the range of $K$. Algorithm $K$ provides ε-differential privacy if, for all $S \subseteq \mathrm{Range}(K)$,
$$\Pr[K(D_1) \in S] \le \exp(\varepsilon) \cdot \Pr[K(D_2) \in S]$$
where the probability $\Pr[\cdot]$ represents the risk of privacy disclosure and the privacy budget ε represents the level of privacy protection: the smaller ε is, the higher the level of protection.
The Laplace mechanism is one of the main techniques for achieving differential privacy. The amount of noise it adds is closely tied to the global sensitivity.
Global sensitivity: for any function $f: D \to \mathbb{R}^d$, the global sensitivity of $f$ is defined as
$$\Delta f = \max_{D_1, D_2} \lVert f(D_1) - f(D_2) \rVert_1$$
where $D_1$ and $D_2$ are adjacent datasets, $d$ is the query dimension of the function $f$, and $\mathbb{R}$ is the real number space that $f$ maps into.
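As an illustration of this definition, assume a counting query $f$ that returns the number of records in a dataset that have some property (the query itself is an assumed example, not taken from the source). Adding or removing one record changes the count by at most one, so its global sensitivity is 1:

```latex
f(D) = \bigl|\{\, x \in D : x \text{ has the property} \,\}\bigr|,
\qquad
\Delta f = \max_{D_1, D_2} \lvert f(D_1) - f(D_2) \rvert = 1
```

Instance counts and class counts at tree nodes are queries of this kind.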
Laplace mechanism: for any function $f: D \to \mathbb{R}^d$, if the output of algorithm $K$ satisfies the following equation, then $K$ satisfies ε-differential privacy:
$$K(D) = f(D) + \langle \mathrm{Lap}_1(\Delta f/\varepsilon), \ldots, \mathrm{Lap}_d(\Delta f/\varepsilon) \rangle$$
where the $\mathrm{Lap}_i(\Delta f/\varepsilon)$ $(1 \le i \le d)$ are mutually independent Laplace variables whose probability density function, with scale $b = \Delta f/\varepsilon$, is
$$p(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$$
The noise level is proportional to Δf and inversely proportional to ε: the larger the global sensitivity of $f$ and the smaller the privacy budget ε, the more noise is added. The Laplace mechanism mainly handles algorithms whose outputs are real-valued.
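The Laplace mechanism for a single counting query can be sketched as follows. This is a minimal sketch, not the patent's implementation: the function names `laplace_noise` and `noisy_count` are illustrative, the query is assumed to be one-dimensional, and the default sensitivity of 1 assumes a counting query.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a zero-mean Laplace distribution with scale b.

    The difference of two independent exponential variables with rate 1/b
    follows a Laplace(0, b) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism for one real-valued query (d = 1).

    The noise scale is sensitivity / epsilon, so a smaller epsilon
    (stronger protection) yields a noisier answer.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

For example, `noisy_count(120, epsilon=0.5)` perturbs the count 120 with Laplace noise of scale 2.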
Exponential mechanism: let the input of a randomized algorithm $M$ be a dataset $D$ and its output be an entity object $r \in \mathrm{Range}$; let $q(D, r)$ be the utility function and $\Delta q$ the sensitivity of $q(D, r)$. If algorithm $M$ selects and outputs $r$ from Range with probability proportional to $\exp\left(\frac{\varepsilon\, q(D, r)}{2\Delta q}\right)$, then $M$ provides ε-differential privacy.
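A selection step of this kind can be sketched as follows, using the standard form of the exponential mechanism in which each candidate is weighted by exp(ε·q/(2Δq)). The function name `exponential_mechanism` and the `score` callback are illustrative assumptions, and the candidates are assumed to form a finite list.

```python
import math
import random


def exponential_mechanism(candidates, score, epsilon, sensitivity, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * score(candidate) / (2 * sensitivity)).

    `candidates` is a non-empty list; `score` maps a candidate to a number.
    """
    scores = [score(r) for r in candidates]
    best = max(scores)
    # Subtracting the best score avoids overflow in exp() and leaves the
    # normalized selection probabilities unchanged.
    weights = [
        math.exp(epsilon * (s - best) / (2.0 * sensitivity)) for s in scores
    ]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With a large ε the highest-scoring candidate is chosen almost always; as ε approaches 0 the choice approaches a uniform draw, which is the usual trade-off between utility and privacy.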
Random forest: a classifier that uses many trees to train on samples and to make predictions. A random forest consists of many decision trees, and its output class is the mode of the classes output by the individual trees.
The training process can be summarized as follows:
(a) Given a training set S, a test set T, and an attribute set F.
Determine the parameters: the number t of decision trees to generate, the depth d of each tree, and the number f of attributes used at each node;
Termination condition: all records at a node have the same class attribute, or the maximum depth d is reached.
(b) Draw from S, with replacement, a training set S(i) of size |S| as the sample for the root node, and start training from the root node.
(c) If the current node meets the termination condition, mark it as a leaf node and continue training the other nodes. Otherwise, randomly select f attributes without replacement from the F attributes. Using these f attributes, find the attribute k with the best classification effect and its set of classification outcomes, and split the samples at the current node into child nodes according to their outcome on attribute k. Continue training the other nodes.
(d) Repeat (b) and (c) until all nodes have been trained or marked as leaf nodes.
(e) Repeat (b), (c), and (d) until all decision trees have been trained.
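The random ingredients of steps (b) and (c), together with the termination condition, can be sketched as follows. This is a minimal sketch; the helper names `bootstrap_sample`, `random_attribute_subset`, and `is_terminal` are illustrative and not taken from the source.

```python
import random


def bootstrap_sample(records, rng=random):
    """Step (b): draw |S| records from S with replacement."""
    return [rng.choice(records) for _ in range(len(records))]


def random_attribute_subset(attributes, f, rng=random):
    """Step (c): pick f distinct attributes without replacement."""
    return rng.sample(list(attributes), f)


def is_terminal(labels, depth, max_depth):
    """Termination condition: all records at the node share one class,
    or the maximum depth d has been reached."""
    return len(set(labels)) <= 1 or depth >= max_depth
```

Because each tree sees a different bootstrap sample and each node sees a different attribute subset, the trees in the forest differ from one another even though they are built by the same rules.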
The prediction process using the random forest is as follows:
For the k-th tree:
(a) Starting from the root node of the current tree, decide which child node to enter according to the classification outcome set of the current node, until a leaf node is reached, and output the predicted value.
(b) Repeat (a) until all t trees have output a predicted value. For a classification problem, the output is the class with the largest summed prediction probability over all trees, that is, the probabilities p of each class c(i) are accumulated.
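The aggregation in step (b) can be sketched as follows, assuming each tree reports a dictionary that maps class labels to probabilities; that data layout and the name `forest_predict` are assumptions made for illustration.

```python
from collections import defaultdict


def forest_predict(tree_outputs):
    """Sum the per-class probabilities over all trees and return the class
    with the largest total.

    `tree_outputs` is a non-empty list with one dict per tree, each mapping
    a class label to the probability p that the tree assigns to it.
    Ties are broken in favor of the class that was seen first.
    """
    totals = defaultdict(float)
    for probs in tree_outputs:
        for label, p in probs.items():
            totals[label] += p
    return max(totals, key=totals.get)


outputs = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.3, "no": 0.7},
]
winner = forest_predict(outputs)  # "yes": totals are about 1.6 versus 1.4
```

In this example two of the three trees individually favor "no", yet the summed probabilities favor "yes", which shows how summing probabilities can differ from a plain majority vote over predicted labels.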
An embodiment of the invention provides a data prediction classification method. As shown in Fig. 1, the method comprises:
S11: build a random forest, that is, multiple decision trees, from a training dataset.
Input: training dataset S, attribute set F, class attribute set C, privacy budget B, the number t of decision trees generated in the random forest, the depth d of each tree
Output: a random forest satisfying ε-differential privacy
Termination condition: all records at a node have the same class attribute, or the maximum depth d is reached
First, according to the number of trees set in the parameters, the privacy budget B is divided equally among the t trees; each decision tree is then generated recursively according to the same rules. The strategy for generating a decision tree is as follows:
From S, randomly draw a training set S(i) of size |S|. The privacy budget of each tree is divided equally among its layers (including the leaf layer), and the budget of each layer is split into two halves: one half is used to estimate the instance count, and the other half is used to estimate the class counts (at leaf nodes) or to evaluate attributes (at other nodes). The function that generates the decision tree is then called recursively. First, the Laplace mechanism is used to add noise to the instance count of the current node. Next, the termination condition is checked. If it is met, the node is a leaf and is labeled with a class; here the Laplace mechanism is applied to add noise to the class counts. If it is not met, f attributes are first selected at random from the attribute set F (f is generally taken as the square root of the total number of attributes). If the selected attributes include continuous attributes, a portion of the privacy budget is first allocated to each continuous attribute to select its split point; the splitting attribute is then selected from all the attributes. Both split points and splitting attributes are selected with the exponential mechanism. In this method, the scoring function q(S(i), F) of the exponential mechanism is either information gain or the sum of maximum class frequencies, whose sensitivities Δq are $\log_2|C|$ and 1 respectively, where |C| is the size of the class attribute set. A decision tree generated in this way satisfies ε-differential privacy.
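The budget allocation and the two scoring functions described in this step can be sketched as follows. This is a minimal sketch under stated assumptions: a tree of depth d is assumed to have d + 1 layers including the leaf layer, records are assumed to be dictionaries, information gain is computed in its textbook form, and the names `allocate_budget`, `information_gain`, and `max_class_frequency_sum` are illustrative rather than taken from the source.

```python
import math
from collections import Counter, defaultdict


def allocate_budget(total_budget, num_trees, max_depth):
    """Split budget B equally over t trees, then over the layers of a tree.

    Assumption: depth d gives d + 1 layers (leaf layer included). Each
    layer's share is halved: one half for the noisy instance count, one
    half for class counts (leaves) or attribute selection (other nodes).
    """
    per_tree = total_budget / num_trees
    per_layer = per_tree / (max_depth + 1)
    return {"per_tree": per_tree, "per_layer": per_layer,
            "instance_count": per_layer / 2, "class_or_split": per_layer / 2}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def _group_labels(records, attribute, label_key):
    """Group class labels by the value each record takes on `attribute`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[label_key])
    return groups


def information_gain(records, attribute, label_key):
    """Scoring function 1; the source states its sensitivity as log2|C|."""
    labels = [r[label_key] for r in records]
    groups = _group_labels(records, attribute, label_key)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def max_class_frequency_sum(records, attribute, label_key):
    """Scoring function 2: for each attribute value, count the records in
    its most frequent class, then sum. Its sensitivity is 1."""
    groups = _group_labels(records, attribute, label_key)
    return sum(max(Counter(g).values()) for g in groups.values())
```

Either scoring function can be passed to an exponential-mechanism selection together with its sensitivity and the `class_or_split` share of the layer's budget; how much of that share goes to split points of continuous attributes is not specified in the source and is left open here.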
The decision trees generated in this way make up a random forest that satisfies ε-differential privacy. Because the training sample of each tree is chosen at random and the attributes at each node are also chosen at random, the random forest is resistant to overfitting, so pruning is not needed. The number of attributes considered at each node is generally the square root of the total number of attributes, which also mitigates the problems caused by high dimensionality to some extent.
The process of building the random forest under differential privacy is described in Table 1.
Table 1
S12: use the decision trees in the built random forest to perform prediction classification on the test dataset, obtaining classification results that satisfy differential privacy.
Input: test set T, class attribute set C, the set of trees
Output: the classification result for each record in the test set
For each record in the test set, every tree in the forest is applied to predict its class. At each node, the classification outcome set of the current node determines which child node the record enters, until a leaf node is reached; the current leaf node yields a predicted value $C_b(x)$. From the predictions of all trees in the forest, the class with the highest probability among all predictions is taken as the result, and the classification results of all records are then output.
The process of classifying the test dataset with the random forest built in Table 1 is described in Table 2.
Table 2
The data prediction classification method provided by the embodiment of the invention builds a random forest from a training dataset and uses the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy. Compared with the prior art, because random forests perform well on large datasets, can handle very high-dimensional data, and train quickly, high-accuracy prediction classification of high-dimensional, large-scale data can be achieved.
An embodiment of the invention also provides a data prediction classification device. As shown in Fig. 2, the device comprises:
a building unit 11, configured to build a random forest, that is, multiple decision trees, from a training dataset;
a prediction classification unit 12, configured to use the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy.
Further, the building unit 11 is also configured to divide the privacy budget B equally among t trees, where t is the number of decision trees generated in the random forest, and to generate each decision tree recursively according to the same rules.
Further, the building unit 11 is also configured to: randomly draw from the training dataset S a training set S(i) of size |S|; divide the privacy budget of each tree equally among its layers, and split the budget of each layer into two halves, one half for estimating the instance count and the other half for estimating class counts or evaluating attributes; recursively call the function that generates the decision tree; first use the Laplace mechanism to add noise to the instance count of the current node; check whether the termination condition is met; if so, label the leaf node with a class and apply the Laplace mechanism to add noise to the class counts; if not, first select f attributes at random from the attribute set F, where f is taken as the square root of the total number of attributes; if the selected attributes include continuous attributes, first allocate a portion of the privacy budget to each continuous attribute to select its split point, and then select the splitting attribute from all the attributes; and generate, according to the above process, a decision tree that satisfies ε-differential privacy.
Optionally, both split points and splitting attributes are selected with the exponential mechanism. The scoring function of the exponential mechanism is either information gain or the sum of maximum class frequencies, whose sensitivities are $\log_2|C|$ and 1 respectively, where |C| is the size of the class attribute set.
Further, the prediction classification unit 12 is configured to: for each record in the test set, apply every tree in the forest to predict its class; at each node, decide which child node the record enters according to the classification outcome set of the current node, until a leaf node is reached; obtain a predicted value from the current leaf node; take, from the predictions of all trees in the forest, the class with the highest probability among all predictions; and output the classification results of all records.
The data prediction classification device provided by the embodiment of the invention builds a random forest from a training dataset and uses the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy. Compared with the prior art, because random forests perform well on large datasets, can handle very high-dimensional data, and train quickly, high-accuracy prediction classification of high-dimensional, large-scale data can be achieved.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are only specific embodiments of the invention, but the scope of protection of the invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of protection of the claims.

Claims (10)

1. A data prediction classification method, characterized by comprising:
building a random forest, that is, multiple decision trees, from a training dataset;
using the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy.
2. The data prediction classification method according to claim 1, characterized in that building the random forest from the training dataset comprises:
dividing the privacy budget B equally among t trees, where t is the number of decision trees generated in the random forest;
recursively generating each decision tree according to the same rules.
3. The data prediction classification method according to claim 2, characterized in that generating a decision tree comprises:
randomly drawing from the training dataset S a training set S(i) of size |S|;
dividing the privacy budget of each tree equally among its layers, and splitting the budget of each layer into two halves, one half for estimating the instance count and the other half for estimating class counts or evaluating attributes;
recursively calling the function that generates the decision tree;
first using the Laplace mechanism to add noise to the instance count of the current node;
checking whether the termination condition is met; if so, labeling the leaf node with a class and applying the Laplace mechanism to add noise to the class counts; if not, first selecting f attributes at random from the attribute set F, where f is taken as the square root of the total number of attributes; if the selected attributes include continuous attributes, first allocating a portion of the privacy budget to each continuous attribute to select its split point, and then selecting the splitting attribute from all the attributes;
generating, according to the above process, a decision tree that satisfies ε-differential privacy.
4. The data prediction classification method according to claim 3, characterized in that both split points and splitting attributes are selected with the exponential mechanism, and the scoring function of the exponential mechanism is either information gain or the sum of maximum class frequencies, whose sensitivities are $\log_2|C|$ and 1 respectively, where |C| is the size of the class attribute set.
5. The data prediction classification method according to claim 4, characterized in that using the decision trees in the built random forest to perform prediction classification on the test dataset, obtaining classification results that satisfy differential privacy, comprises:
for each record in the test set, applying every tree in the forest to predict its class; at each node, deciding which child node the record enters according to the classification outcome set of the current node, until a leaf node is reached; obtaining a predicted value from the current leaf node; taking, from the predictions of all trees in the forest, the class with the highest probability among all predictions; and outputting the classification results of all records.
6. A data prediction classification device, characterized by comprising:
a building unit, configured to build a random forest, that is, multiple decision trees, from a training dataset;
a prediction classification unit, configured to use the decision trees in the built random forest to perform prediction classification on a test dataset, obtaining classification results that satisfy differential privacy.
7. The data prediction classification device according to claim 6, characterized in that the building unit is also configured to divide the privacy budget B equally among t trees, where t is the number of decision trees generated in the random forest, and to generate each decision tree recursively according to the same rules.
8. The data prediction classification device according to claim 7, characterized in that the building unit is also configured to: randomly draw from the training dataset S a training set S(i) of size |S|; divide the privacy budget of each tree equally among its layers, and split the budget of each layer into two halves, one half for estimating the instance count and the other half for estimating class counts or evaluating attributes; recursively call the function that generates the decision tree; first use the Laplace mechanism to add noise to the instance count of the current node; check whether the termination condition is met; if so, label the leaf node with a class and apply the Laplace mechanism to add noise to the class counts; if not, first select f attributes at random from the attribute set F, where f is taken as the square root of the total number of attributes; if the selected attributes include continuous attributes, first allocate a portion of the privacy budget to each continuous attribute to select its split point, and then select the splitting attribute from all the attributes; and generate, according to the above process, a decision tree that satisfies ε-differential privacy.
9. The data prediction classification device according to claim 8, characterized in that both split points and splitting attributes are selected with the exponential mechanism, and the scoring function of the exponential mechanism is either information gain or the sum of maximum class frequencies, whose sensitivities are $\log_2|C|$ and 1 respectively, where |C| is the size of the class attribute set.
10. The data prediction classification device according to claim 9, characterized in that the prediction classification unit is configured to: for each record in the test set, apply every tree in the forest to predict its class; at each node, decide which child node the record enters according to the classification outcome set of the current node, until a leaf node is reached; obtain a predicted value from the current leaf node; take, from the predictions of all trees in the forest, the class with the highest probability among all predictions; and output the classification results of all records.
CN201510932807.3A 2015-12-15 2015-12-15 Data prediction classification method and device Pending CN105447525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510932807.3A CN105447525A (en) 2015-12-15 2015-12-15 Data prediction classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510932807.3A CN105447525A (en) 2015-12-15 2015-12-15 Data prediction classification method and device

Publications (1)

Publication Number Publication Date
CN105447525A true CN105447525A (en) 2016-03-30

Family

ID=55557684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510932807.3A Pending CN105447525A (en) 2015-12-15 2015-12-15 Data prediction classification method and device

Country Status (1)

Country Link
CN (1) CN105447525A (en)

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339714A (en) * 2016-08-10 2017-01-18 上海交通大学 Multi-layer differential privacy embedded decision tree model-based privacy risk control method
CN106529584A (en) * 2016-10-25 2017-03-22 福建农林大学 Flue-cured tobacco aroma type and quality judgment intelligent evaluation method
CN106570537A (en) * 2016-11-17 2017-04-19 天津大学 Random forest model selection method based on confusion matrix
CN106643722A (en) * 2016-10-28 2017-05-10 华南理工大学 Method for pet movement identification based on triaxial accelerometer
CN107423551A (en) * 2016-05-24 2017-12-01 西门子医疗有限公司 For performing the imaging method of medical inspection
CN107729555A (en) * 2017-11-07 2018-02-23 太原理工大学 A kind of magnanimity big data Distributed Predictive method and system
CN107895235A (en) * 2017-11-27 2018-04-10 安徽经邦软件技术有限公司 Financial modeling system based on decision tree
CN108364467A (en) * 2018-02-12 2018-08-03 北京工业大学 A kind of traffic information prediction technique based on modified decision Tree algorithms
CN108416368A (en) * 2018-02-08 2018-08-17 北京三快在线科技有限公司 The determination method and device of sample characteristics importance, electronic equipment
CN108830103A (en) * 2018-06-14 2018-11-16 西安交通大学 A kind of automation generates method and device thereof, the handheld device of privacy of user strategy
CN109284626A (en) * 2018-09-07 2019-01-29 中南大学 Random forests algorithm towards difference secret protection
CN109697447A (en) * 2017-10-20 2019-04-30 富士通株式会社 Disaggregated model construction device, method and electronic equipment based on random forest
CN109711428A (en) * 2018-11-20 2019-05-03 佛山科学技术学院 A kind of saturated gas pipeline internal corrosion speed predicting method and device
CN109784091A (en) * 2019-01-16 2019-05-21 福州大学 A kind of list data method for secret protection merging difference privacy GAN and PATE model
CN110084365A (en) * 2019-03-13 2019-08-02 西安电子科技大学 A kind of service provider system and method based on deep learning
CN110389952A (en) * 2019-06-06 2019-10-29 口碑(上海)信息技术有限公司 A kind of processing method and processing device of vegetable data
CN110413682A (en) * 2019-08-09 2019-11-05 云南电网有限责任公司 A kind of the classification methods of exhibiting and system of data
CN110874481A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 GBDT model-based prediction method and device
CN110968887A (en) * 2018-09-28 2020-04-07 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN110991651A (en) * 2019-11-30 2020-04-10 航天科技控股集团股份有限公司 Energy consumption prediction analysis system and method for user driving habits based on TBOX
CN111191628A (en) * 2020-01-06 2020-05-22 河海大学 Remote sensing image earthquake damage building identification method based on decision tree and feature optimization
CN111222570A (en) * 2020-01-06 2020-06-02 广西师范大学 Ensemble learning classification method based on difference privacy
CN111259442A (en) * 2020-01-15 2020-06-09 广西师范大学 Differential privacy protection method for decision tree under MapReduce framework
CN111275239A (en) * 2019-12-20 2020-06-12 西安电子科技大学 Multi-mode-based networked teaching data analysis method and system
CN112101403A (en) * 2020-07-24 2020-12-18 西安电子科技大学 Method and system for classification based on federate sample network model and electronic equipment
WO2021000561A1 (en) * 2019-07-01 2021-01-07 创新先进技术有限公司 Data processing method and device, and electronic apparatus
CN112699402A (en) * 2020-12-28 2021-04-23 广西师范大学 Wearable device activity prediction method based on federal personalized random forest
CN112822167A (en) * 2020-12-31 2021-05-18 杭州立思辰安科科技有限公司 Abnormal TLS encrypted traffic detection method and system
CN114118601A (en) * 2021-12-02 2022-03-01 安徽大学 Random forest traffic flow prediction method based on differential privacy protection
CN117034151A (en) * 2023-10-09 2023-11-10 山东中力高压阀门股份有限公司 Valve life prediction method based on big data analysis

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423551A (en) * 2016-05-24 2017-12-01 西门子医疗有限公司 For performing the imaging method of medical inspection
CN106339714A (en) * 2016-08-10 2017-01-18 上海交通大学 Multi-layer differential privacy embedded decision tree model-based privacy risk control method
CN106339714B (en) * 2016-08-10 2020-12-01 上海交通大学 Privacy risk control method for multilayer embedded differential privacy to decision tree model
CN106529584A (en) * 2016-10-25 2017-03-22 福建农林大学 Flue-cured tobacco aroma type and quality judgment intelligent evaluation method
CN106643722A (en) * 2016-10-28 2017-05-10 华南理工大学 Method for pet movement identification based on triaxial accelerometer
CN106570537A (en) * 2016-11-17 2017-04-19 天津大学 Random forest model selection method based on confusion matrix
CN109697447A (en) * 2017-10-20 2019-04-30 富士通株式会社 Disaggregated model construction device, method and electronic equipment based on random forest
CN107729555A (en) * 2017-11-07 2018-02-23 太原理工大学 A kind of magnanimity big data Distributed Predictive method and system
CN107729555B (en) * 2017-11-07 2020-10-09 太原理工大学 Mass big data distributed prediction method and system
CN107895235A (en) * 2017-11-27 2018-04-10 安徽经邦软件技术有限公司 Financial modeling system based on decision tree
CN108416368A (en) * 2018-02-08 2018-08-17 北京三快在线科技有限公司 The determination method and device of sample characteristics importance, electronic equipment
CN108364467A (en) * 2018-02-12 2018-08-03 北京工业大学 A kind of traffic information prediction technique based on modified decision Tree algorithms
CN108830103A (en) * 2018-06-14 2018-11-16 西安交通大学 A kind of automation generates method and device thereof, the handheld device of privacy of user strategy
CN108830103B (en) * 2018-06-14 2020-07-28 西安交通大学 Method and device for automatically generating user privacy policy and handheld device
CN110874481B (en) * 2018-08-31 2023-06-20 创新先进技术有限公司 GBDT model-based prediction method and GBDT model-based prediction device
CN110874481A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 GBDT model-based prediction method and device
CN109284626A (en) * 2018-09-07 2019-01-29 中南大学 Random forests algorithm towards difference secret protection
CN110968887B (en) * 2018-09-28 2022-04-05 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN110968887A (en) * 2018-09-28 2020-04-07 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN109711428A (en) * 2018-11-20 2019-05-03 佛山科学技术学院 A kind of saturated gas pipeline internal corrosion speed predicting method and device
CN109784091A (en) * 2019-01-16 2019-05-21 福州大学 A kind of list data method for secret protection merging difference privacy GAN and PATE model
CN110084365A (en) * 2019-03-13 2019-08-02 西安电子科技大学 A kind of service provider system and method based on deep learning
CN110084365B (en) * 2019-03-13 2023-08-11 西安电子科技大学 Service providing system and method based on deep learning
CN110389952A (en) * 2019-06-06 2019-10-29 口碑(上海)信息技术有限公司 A kind of processing method and processing device of vegetable data
WO2021000561A1 (en) * 2019-07-01 2021-01-07 创新先进技术有限公司 Data processing method and device, and electronic apparatus
CN110413682A (en) * 2019-08-09 2019-11-05 云南电网有限责任公司 A kind of the classification methods of exhibiting and system of data
CN110991651A (en) * 2019-11-30 2020-04-10 航天科技控股集团股份有限公司 Energy consumption prediction analysis system and method for user driving habits based on TBOX
CN110991651B (en) * 2019-11-30 2023-04-28 航天科技控股集团股份有限公司 Energy consumption predictive analysis system and method for user driving habit based on TBOX
CN111275239A (en) * 2019-12-20 2020-06-12 西安电子科技大学 Multi-mode-based networked teaching data analysis method and system
CN111275239B (en) * 2019-12-20 2023-09-29 西安电子科技大学 Multi-mode-based networked teaching data analysis method and system
CN111191628A (en) * 2020-01-06 2020-05-22 河海大学 Remote sensing image earthquake damage building identification method based on decision tree and feature optimization
CN111222570B (en) * 2020-01-06 2022-08-26 广西师范大学 Ensemble learning classification method based on difference privacy
CN111222570A (en) * 2020-01-06 2020-06-02 广西师范大学 Ensemble learning classification method based on difference privacy
CN111259442A (en) * 2020-01-15 2020-06-09 广西师范大学 Differential privacy protection method for decision tree under MapReduce framework
CN112101403A (en) * 2020-07-24 2020-12-18 西安电子科技大学 Method and system for classification based on federate sample network model and electronic equipment
CN112101403B (en) * 2020-07-24 2023-12-15 西安电子科技大学 Classification method and system based on federal few-sample network model and electronic equipment
CN112699402B (en) * 2020-12-28 2022-06-17 广西师范大学 Wearable device activity prediction method based on federal personalized random forest
CN112699402A (en) * 2020-12-28 2021-04-23 广西师范大学 Wearable device activity prediction method based on federal personalized random forest
CN112822167A (en) * 2020-12-31 2021-05-18 杭州立思辰安科科技有限公司 Abnormal TLS encrypted traffic detection method and system
CN114118601A (en) * 2021-12-02 2022-03-01 安徽大学 Random forest traffic flow prediction method based on differential privacy protection
CN114118601B (en) * 2021-12-02 2024-02-13 安徽大学 Random forest traffic prediction method based on differential privacy protection
CN117034151A (en) * 2023-10-09 2023-11-10 山东中力高压阀门股份有限公司 Valve life prediction method based on big data analysis
CN117034151B (en) * 2023-10-09 2023-12-19 山东中力高压阀门股份有限公司 Valve life prediction method based on big data analysis

Similar Documents

Publication Publication Date Title
CN105447525A (en) Data prediction classification method and device
Goh et al. Incorporating the rough sets theory into travel demand analysis
Jain et al. Data mining techniques: a survey paper
Mohamed et al. History matching and uncertainty quantification: multiobjective particle swarm optimisation approach
Liu et al. Data mining feature selection for credit scoring models
CN110346831B (en) Intelligent seismic fluid identification method based on random forest algorithm
CN110570111A (en) Enterprise risk prediction method, model training method, device and equipment
CN105378714A (en) Fast grouping of time series
Righi et al. The AI techno-economic complex System: Worldwide landscape, thematic subdomains and technological collaborations
CN107330464A (en) Data processing method and device
CN104573130A (en) Entity resolution method based on group calculation and entity resolution device based on group calculation
CN106934410A (en) Data classification method and system
Gu et al. Application of fuzzy decision tree algorithm based on mobile computing in sports fitness member management
Joshi et al. Statistical downscaling of precipitation and temperature using sparse Bayesian learning, multiple linear regression and genetic programming frameworks
CN110705045A (en) Link prediction method for constructing weighting network by using network topological characteristics
CN103942604A (en) Prediction method and system based on forest discrimination model
Rana et al. A review of popular decision tree algorithms in data mining
Alyahyan et al. Decision Trees for Very Early Prediction of Student's Achievement
Ullah et al. Adaptive data balancing method using stacking ensemble model and its application to non-technical loss detection in smart grids
Elhebir et al. A novel ensemble approach to enhance the performance of web server logs classification
KR20170030016A (en) Method for analyzing promising technology using patent data
Ahlawat et al. Analysis of factors affecting enrollment pattern in Indian universities using k-means clustering
Ma The Research of Stock Predictive Model based on the Combination of CART and DBSCAN
CN104468276A (en) Network traffic identification method based on random sampling multiple classifiers
Sawant et al. Educational data mining prediction model using decision tree algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160330