CN104063230A - Rough set parallel reduction method, device and system based on MapReduce - Google Patents


Info

Publication number
CN104063230A
Authority
CN
China
Prior art keywords
decision table
decision
attribute
mapreduce
reduced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410325508.9A
Other languages
Chinese (zh)
Other versions
CN104063230B (en)
Inventor
席大超
王国胤
张学睿
张帆
封雷
李广砥
邓伟辉
郭义帅
谢亮
董建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201410325508.9A priority Critical patent/CN104063230B/en
Publication of CN104063230A publication Critical patent/CN104063230A/en
Application granted granted Critical
Publication of CN104063230B publication Critical patent/CN104063230B/en
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a rough set parallel reduction method, device and system based on MapReduce. In the method, after the decision table to be reduced is read, it is first simplified; attribute importance is then computed in parallel on the simplified decision table; and finally parallel reduction by attribute importance is performed to obtain the final reduction result. With the method, the importance of all attributes can be worked out with a single MapReduce pass, and after each reduction result is obtained the redundant information of the simplified decision table is deleted again, making the table ever more compact, so the computation speed can be further improved. The rough set parallel reduction device and system likewise solve the problems that existing knowledge reduction methods carry certain limiting conditions and cannot perform parallel reduction efficiently, and they further optimize the storage space.

Description

MapReduce-based rough set parallel reduction method, device and system
Technical Field
The invention relates to the field of knowledge reduction, in particular to a rough set parallel reduction method, device and system based on MapReduce.
Background
With the advent of the big data era, classical reduction methods cannot load the data into memory at one time and therefore cannot meet the requirements of big data. Accordingly, a main objective of those skilled in the art is how to perform data mining accurately and rapidly on big data.
Google's distributed file system GFS (Google File System), its parallel programming model MapReduce, and its distributed data storage system BigTable provide a foundation for processing big data. In general, the classical approaches to data mining in this setting primarily involve the following.
The rough set, a classical tool for dealing with vagueness and uncertainty, is widely used in the fields of machine learning and data mining. Knowledge reduction is one of the important research topics in rough set theory and a key step of knowledge acquisition. In rough set theory, "knowledge" is regarded as an ability to classify: human behavior rests on the ability to distinguish real or abstract objects. For example, in ancient times people had to distinguish what could be eaten from what could not in order to survive, and a doctor diagnosing a patient must distinguish which disease the patient is suffering from. These abilities to classify things according to their characteristic differences can all be regarded as some kind of "knowledge". Knowledge reduction deletes unnecessary knowledge from the knowledge base while maintaining its classification ability; by deleting redundant knowledge, the clarity of the latent knowledge of the information system can be greatly improved.
MapReduce is a programming model (i.e., software framework) in the Hadoop distributed file system; applications written on it can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner. A MapReduce job typically splits the input data set into several independent data blocks, which are processed in a completely parallel manner by the map tasks. The framework first sorts the outputs of the maps and then feeds the results to the reduce tasks. Typically both the input and the output of a job are stored in a file system. The framework is responsible for scheduling and monitoring the tasks and for re-executing tasks that fail.
Typically, the MapReduce framework and the Hadoop distributed file system run on the same set of nodes, i.e., the compute nodes and the storage nodes are usually the same. This configuration allows the framework to schedule tasks efficiently on the nodes where the data already resides, so the network bandwidth of the whole cluster can be utilized very efficiently. The map function and the reduce function are left to the user to implement, and these two functions define the task itself.
In the existing theory, see the literature for details:
1) Zhang J, Li T, Ruan D, et al. A parallel method for computing rough set approximations [J]. Information Sciences, 2012, 194: 209-223;
2) Zhang J, Wong J-S, Li T, Pan Y. A comparison of parallel large-scale knowledge acquisition using rough set theory on different MapReduce runtime systems [J]. International Journal of Approximate Reasoning, 2013.
In the above documents, a rough set parallel approximation model and a rough set knowledge acquisition parallel model based on it are proposed. The model gives a good theoretical demonstration and proves the feasibility of the rough set parallel model, but it only parallelizes the most basic rough set operations and does not involve the rough set reduction method.
In addition, in the literature:
3) A knowledge reduction algorithm in a cloud computing environment [J]. 2011, 34(12): 2332–;
4) Research on a discernibility-matrix knowledge reduction algorithm in a cloud computing environment [J]. Computer Science, 2011, 38(8).
These works propose a rough set parallel reduction model, but the method has many limitations: a consistent (compatible) decision table is required to carry out reduction under big data, which greatly limits its practical application.
In brief, the above prior knowledge reduction methods mainly have the following drawbacks:
First, although rough set computations can be processed in parallel, reduction itself cannot be performed.
Second, although there is a method capable of parallelizing rough set reduction, it is limited to consistent decision tables and is therefore very restricted in practical application.
Finally, the operation efficiency of the existing parallel reduction model is not high and needs to be improved.
Disclosure of Invention
In view of the above disadvantages or shortcomings of the prior art, an object of the present invention is to provide a rough set parallel reduction method, apparatus and system based on MapReduce, which are used to solve the problems that the knowledge reduction method in the prior art has certain limitations and cannot efficiently perform parallelization reduction.
In order to achieve the above objects and other related objects, the present invention provides the following technical solutions:
a rough set parallel reduction method based on MapReduce comprises the following steps:
reading a decision table to be reduced;
initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel computing processing on the decision table to be reduced to obtain a simplified decision table with a mark:
if the simplified decision table is empty, it is taken as the final reduction result of the decision table to be reduced and output;
if the simplified decision table is not empty, initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel calculation and write the result into a Hadoop distributed file system;
and reading a decision table with the highest attribute importance in the Hadoop distributed file system, deleting redundant information in the decision table to obtain a new decision table to be reduced, and enabling the new decision table to be reduced to be used as an input value of the first MapReduce model to be reduced again.
In addition, the invention also provides a rough set parallel reduction device based on MapReduce, which comprises:
the operation configuration module is used for reading a decision table to be reduced;
the task parallel simplification module is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing module is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction module is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
In addition, the invention also provides a rough set parallel reduction system based on MapReduce, which comprises:
the operation configuration unit is used for reading a decision table to be reduced;
the task parallel simplification unit is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing unit is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction unit is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
In summary, compared with the prior art, the invention has the following advantages:
Firstly, the invention simplifies the decision table before computing attribute importance in parallel, and then selects the attribute with the highest importance from the computed results to carry out the reduction, so that the obtained reduction result is more accurate.
Second, the present invention places no restriction on the tables to be reduced, and has a wider application range than the prior art, which can only reduce consistent (compatible) decision tables.
Thirdly, other existing methods can obtain the importance of only one condition attribute per MapReduce round when computing attribute importance, whereas the present method obtains the importance of all condition attributes with one MapReduce round plus a simple text-reading computation, completing one reduction step per round and thereby improving efficiency.
Fourthly, after the decision table is reduced, the storage space can be effectively optimized, and at the same time computations that use the reduced result become more efficient.
Drawings
FIG. 1 is a flow chart of the working principle of MapReduce.
FIG. 2 is a flowchart illustrating the operation of the rough set parallel reduction method based on MapReduce according to the present invention.
FIG. 3 is a simplified schematic diagram of the rough set parallel reduction method based on MapReduce according to the present invention.
Description of the reference numerals
S10-S50 steps
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
The present invention is implemented based on MapReduce and rough sets; in order to make the technical solution clear and understandable to those skilled in the art, MapReduce and rough sets are explained and illustrated below.
MapReduce related overview
MapReduce is a programming model for parallel operations on large-scale data sets. The concepts "Map" and "Reduce" and their main ideas are borrowed from functional programming languages, along with features borrowed from vector programming languages. The model greatly eases running programs on distributed systems for programmers without experience in distributed parallel programming. Current software implementations specify a Map function that maps a set of key-value pairs into a new set of key-value pairs, and a concurrent Reduce function that merges all mapped values sharing the same key.
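For illustration, the following minimal Python sketch simulates the model locally (a hypothetical word-count example, not an actual Hadoop job; the function names and sample data are assumptions made purely for illustration): the Map function emits one <key, value> pair per word, the shuffle is simulated by sorting on the key, and the Reduce function merges all values that share the same key.

from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # Map: emit one <word, 1> pair per word of the input line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: merge all counts that share the same key (word).
    return (key, sum(values))

def run_job(lines):
    # The shuffle phase is simulated by sorting all intermediate pairs on the
    # key and grouping them, so each key reaches the reducer exactly once.
    intermediate = sorted(pair for line in lines for pair in map_fn(line))
    return [reduce_fn(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

print(run_job(["big data rough set", "rough set reduction"]))
# [('big', 1), ('data', 1), ('reduction', 1), ('rough', 2), ('set', 2)]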
Referring to fig. 1, the working principle of MapReduce is briefly described.
Map side
First, each input split is processed by a Map task; by default the size of one HDFS block (e.g., 64 MB) is one split, although the block size can be configured. The results output by Map are temporarily placed in a ring memory buffer; when the buffer is about to overflow, a spill file is created in the local file system and the data in the buffer is written into it.
Secondly, before writing to disk, the thread first divides the data into partitions equal in number to the Reduce tasks, i.e., one Reduce task corresponds to the data of one partition. This avoids the situation where some Reduce tasks are assigned large amounts of data while others are assigned little or none. In fact, partitioning is a process of hashing the data; the data within each partition is then sorted, so that as little data as possible is written to disk.
Third, when the Map task outputs its last record, there may be many spill files that need to be merged. The purpose is twofold: to minimize the amount of data written to disk each time, and to minimize the amount of data transmitted over the network in the following copy phase; finally the files are merged into one partitioned and sorted file. The data may also be compressed in order to reduce the amount transmitted over the network.
Fourth, the data in the partition is copied to the corresponding Reduce task.
Reduce side
First, Reduce receives data from different Map tasks, and the data from each Map is sorted. If the amount of data received by the Reduce side is small enough, it is kept directly in memory; once it exceeds a certain proportion of the buffer size, the data is merged and spilled to disk.
Second, as the number of spill files increases, a background thread merges them into a larger sorted file in order to save time in later merges. In fact, MapReduce repeatedly executes sort and merge operations, on both the Map side and the Reduce side.
Third, many intermediate files are written to disk during the merging process, but MapReduce writes as little data to disk as possible, and the result of the last merge is not written to disk but is fed directly to the Reduce function.
Rough set related overview
First, the basic concepts related to rough sets and some explanations of the MapReduce model will be described.
Definition 1: a decision table is an information table knowledge expression system S ═<U,R,V,f>R ═ C ═ D attribute set, subsets C and D become condition attribute set and result attribute set, respectively, V ═ u @r∈RVrIs a collection of attribute values, VrThe attribute range representing the attribute R ∈ R, i.e., the value range of the attribute R, f: U × R → V is an information function that specifies the attribute value of each object x in U. For each attribute subsetWe define an unresolvable binary relationship IND (B), i.e.
<math> <mrow> <mi>IND</mi> <mrow> <mo>(</mo> <mi>B</mi> <mo>)</mo> </mrow> <mo>=</mo> <mo>{</mo> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>|</mo> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>&Element;</mo> <msup> <mi>U</mi> <mn>2</mn> </msup> <mo>,</mo> <mo>&ForAll;</mo> <mi>b</mi> <mo>&Element;</mo> <mi>B</mi> <mrow> <mo>(</mo> <mi>b</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>)</mo> </mrow> <mo>=</mo> <mi>b</mi> <mrow> <mo>(</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mo>}</mo> </mrow> </math>
Definition 2: given a knowledge expression system S ═<U,R,V,f>For each subsetAnd the ambiguous relationship B, the upper approximation set of X and the lower approximation set, respectively, can be defined by the basis of B as follows:
definition 3: set BNB(X)=B-(X) \ B _ (X) is referred to as the B boundary of X. POS (Point of sale)B(X) ═ B _ (X) the B positive domain referred to as X; NEGBThe (X) ═ U \ B _ (X) is referred to as the negative field of X.
Definition 4: in the decision table S ═ (U, C ═ D, V, f), P and Q are two equivalence clusters defined on U, if POSP(Q)=POS(P\{r})(Q), then r is (unnecessary) to Q can be omitted in P, Q can be omitted in P for short; otherwise, let r be non-omissible (essential) in P relative to Q.
Definition 5: in the decision table S ═ (U, C ═ D, V, f), P and Q are two equivalence clusters defined on U, if P is a Q independent subset of PWith POSs(Q)=POSP(Q), S is called Q reduction of P.
Definition 6: in the decision table S { (U, C { [ U { ]) } { [ U'1]c,[u′2]c,…[u′m]cIs a division of the corpus U into attribute sets C, U '═ U'1,u′2,…,u′m},Wherein <math> <mrow> <mo>&ForAll;</mo> <msubsup> <mi>u</mi> <msub> <mi>i</mi> <mi>s</mi> </msub> <mo>&prime;</mo> </msubsup> <mo>&Element;</mo> <msup> <mi>U</mi> <mo>&prime;</mo> </msup> </mrow> </math> And is <math> <mrow> <mo>|</mo> <msub> <mrow> <mo>[</mo> <msub> <msup> <mi>u</mi> <mo>&prime;</mo> </msup> <msub> <mi>i</mi> <mi>s</mi> </msub> </msub> <mo>]</mo> </mrow> <mi>c</mi> </msub> <mo>\</mo> <mi>D</mi> <mo>|</mo> <mo>=</mo> <mn>1</mn> <mo>,</mo> <mrow> <mo>(</mo> <mi>s</mi> <mo>=</mo> <mn>1,2</mn> <mo>,</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>,</mo> <mi>t</mi> <mo>)</mo> </mrow> <mo>;</mo> </mrow> </math> Note the book <math> <mrow> <msubsup> <mi>U</mi> <mi>pos</mi> <mo>&prime;</mo> </msubsup> <mo>=</mo> <mo>{</mo> <mo>[</mo> <msub> <msup> <mi>u</mi> <mo>&prime;</mo> </msup> <msub> <mi>i</mi> <mn>1</mn> </msub> </msub> <mo>,</mo> <msub> <msup> <mi>u</mi> <mo>&prime;</mo> </msup> <msub> <mi>i</mi> <mn>2</mn> </msub> </msub> <mo>,</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mo>,</mo> <msub> <msup> <mi>u</mi> <mo>&prime;</mo> </msup> <msub> <mi>i</mi> <mi>t</mi> </msub> </msub> <mo>}</mo> <mo>,</mo> </mrow> </math> U′neg=U′-U′pos,U′=U′pos∪U′neg. The simplified decision table is called S ═ (U', C ═ D, V, f).
Definition 7: in the decision table S ═ (U, C ═ D, V, f), S ═ U', C ═ D, V, f) is a simplified decision table, is defined as
sigp(a)=|U′P∪{a}-U′p|
Wherein,
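For illustration, the following minimal Python sketch makes Definitions 1-6 concrete (the data layout — each object a tuple of condition attribute values with the decision value last — and all function names are assumptions made purely for illustration):

from collections import defaultdict

def partition(U, B):
    # U/IND(B): group objects by their values on attribute subset B (Definition 1).
    classes = defaultdict(list)
    for x in U:
        classes[tuple(x[b] for b in B)].append(x)
    return list(classes.values())

def approximations(U, B, X):
    # Definition 2: lower and upper approximations of X with respect to B;
    # Definition 3 then gives POS_B(X) = lower and NEG_B(X) = U minus upper.
    X = set(X)
    lower = [x for cls in partition(U, B) if set(cls) <= X for x in cls]
    upper = [x for cls in partition(U, B) if set(cls) & X for x in cls]
    return lower, upper

def simplify(U, C):
    # Definition 6: keep one representative per class of U/C; a representative
    # goes to U'_pos when its whole class agrees on the decision value.
    U_pos, U_neg = [], []
    for cls in partition(U, C):
        (U_pos if len({x[-1] for x in cls}) == 1 else U_neg).append(cls[0])
    return U_pos, U_neg

# A toy table: two condition attributes plus one decision attribute.
U = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
print(simplify(U, [0, 1]))  # ([(0, 1, 1), (0, 0, 1)], [(1, 0, 1)])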
Based on the above summary of MapReduce and rough sets, the detailed implementation of the MapReduce-based rough set parallel reduction method is described below in conjunction with embodiments.
In the present invention, a "decision table" refers to data that carries a decision attribute; that is, data with a decision attribute is the object of reduction in the present invention.
Fig. 2 shows a schematic flow diagram of the rough set parallel reduction method based on MapReduce of the present invention, where the rough set parallel reduction method based on MapReduce includes:
S10, reading the decision table to be reduced: before parallel reduction is performed, the decision table to be reduced is read first; it may be read directly from local storage (for example, the Hadoop distributed file system), or read from a network node to local storage.
S30, obtaining a simplified decision table: initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel computing processing on the decision table to be reduced to obtain a simplified decision table with marks.
S31: if the simplified decision table is empty, taking it as the final reduction result of the decision table to be reduced and outputting the final reduction result;
S32, parallel computing of attribute importance: if the simplified decision table is not empty, initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel calculation and write the result into a Hadoop distributed file system;
S50, parallel reduction by attribute importance: reading the decision table with the highest attribute importance in the Hadoop distributed file system, deleting redundant information in the decision table to obtain a new decision table to be reduced, and enabling the new decision table to be reduced to be used as an input value of the first MapReduce model to be reduced again.
Firstly, compared with existing reduction methods, the rough set parallel reduction method based on MapReduce first obtains a simplified decision table, and reduction on the simplified decision table greatly reduces the amount of computation, thereby improving efficiency. In addition, existing reduction methods can obtain the importance of only one condition attribute per MapReduce round when computing attribute importance, whereas the present method obtains the importance of all condition attributes with one MapReduce round plus a simple text-reading computation and finishes one reduction step per round, thereby improving reduction efficiency.
Secondly, since the rough set parallel reduction method based on MapReduce is mainly an improvement on the prior art, it is necessary to briefly introduce the traditional attribute reduction method first.
Based on the explanation of rough sets given above, a traditional method for rapid attribute reduction is as follows: it takes the importance of attributes as the reduction index and, in each round, takes the attribute with the highest importance as part of the reduction result; when the set U′ becomes empty the method stops, i.e., an optimal reduction result has been found, and the result is output.
Specifically, the implementation of the traditional rough set reduction method is given below:
Method 1
Input: decision table S = (U, C ∪ D, V, f)
Output: attribute reduct R
First step: compute U/C to obtain U′, U′_pos and U′_neg;
Second step: let R = ∅;
Third step: for every a ∈ C − R, do the following:
compute the significance sig_R(a) of each attribute a in the set, together with B_R(a), NB_R(a) and U′/(R ∪ {a}), where B_R(a) denotes the union of the equivalence classes whose elements all lie in U′_pos and take the same value on the decision attribute, and NB_R(a) denotes the union of the equivalence classes whose elements all lie in U′_neg;
Fourth step: let sig_R(a′) = max_a sig_R(a); if more than one attribute attains the maximum, take any one of them;
Fifth step: R = R ∪ {a′}; U′ = U′ − B_R(a′) − NB_R(a′);
Sixth step: if U′ = ∅, output R; otherwise, go to the next step;
Seventh step: U′_pos = U′_pos − B_R(a′); U′_neg = U′_neg − NB_R(a′);
Eighth step: compute U′/(R ∪ {a′}) and go to the third step.
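For illustration, the following minimal sequential Python sketch follows the steps of method 1 (the tuple-based data layout and all names are assumptions made purely for illustration; U′ is given as the output of a simplification step such as Definition 6):

def partition(objs, attrs):
    # Group objects by their values on the given attribute indices.
    classes = {}
    for x in objs:
        classes.setdefault(tuple(x[a] for a in attrs), []).append(x)
    return classes.values()

def quick_reduction(U_prime, U_pos, C):
    # Method 1: greedily add the attribute of maximal significance and shrink
    # the simplified universe U' until it is exhausted.
    R, U_prime, U_pos = [], list(U_prime), set(U_pos)
    while U_prime:
        best = None
        for a in C:
            if a in R:
                continue
            B, NB = [], []
            for cls in partition(U_prime, R + [a]):
                if all(x in U_pos for x in cls) and len({x[-1] for x in cls}) == 1:
                    B += cls   # B_R(a): consistent classes wholly inside U'_pos
                elif all(x not in U_pos for x in cls):
                    NB += cls  # NB_R(a): classes wholly inside U'_neg
            sig = len(B) + len(NB)  # sig_R(a), in the sense of Definition 7
            if best is None or sig > best[0]:
                best = (sig, a, B, NB)
        if best is None:
            break  # no candidate attribute left
        _, a_best, B, NB = best
        R.append(a_best)            # fifth step: R = R ∪ {a'}
        removed = set(B) | set(NB)  # fifth and seventh steps: shrink U', U'_pos
        U_prime = [x for x in U_prime if x not in removed]
        U_pos -= removed
    return R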
On the basis of the above, how to implement the rough set parallel reduction method based on MapReduce of the present invention will be described in detail below.
Specifically, how to implement the parallel computation of the simplified decision table in step S30 is as follows:
definition 8: giving a decision table S ═ (U, C ^ D, V, f), and makingSi=(UiC ^ D, V, f) is a sub-decision table of S, which satisfies the following condition: <math> <mrow> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mi>U</mi> <mo>=</mo> <msubsup> <mo>&cup;</mo> <mrow> <mi>i</mi> <mo>-</mo> <mn>1</mn> </mrow> <mi>m</mi> </msubsup> <msub> <mi>U</mi> <mi>i</mi> </msub> <mo>;</mo> </mrow> </math> this means that we can split a decision table into many sub-decision tables that are not related to each other.
Theorem 1: giving a decision table S ═ (U, C ^ D, V, f), and makingSi=(UiAnd C ^ D, V, f) is a sub-decision table of S. Given an arbitrary subset of conditional attributesHaving equivalence relation U/B ═ E1,E2,…EiFor sub-decision table SiThe following conclusions can be drawn: the equivalence class of the decision table is required, and the equivalence class of each sub-decision table can be solved firstly. And then merging the same equivalence classes with the same attribute in the sub-decision tables to obtain the equivalent equivalence class.
According to the theorem 1, MapReduce can meet the requirement of obtaining equivalence class and obtain U 'of the equivalence class'pos,U′negAnd can be obtained simultaneously, so that the simplified decision table U' can be obtained by MapReduce. The parallel method PACSDT (parallel Algorithm for computing of a Simplified Decision Table) for computing the Simplified Decision table S' is given below, and the method PACSDT consists of two parts, PACSDT-Map and PACSDT-Reduce. The description is as follows:
method 2, PACSDT-Map (key, value)
Inputting: decision table Si=(Ui,C∪D,V,f),
And (3) outputting: < x _ C, x _ D > x _ C: condition attribute corresponding to object x, x _ D: the decision attribute corresponding to object x.
For example, the PACSDT-Map (key, value) input format provided by MapReduce is as follows:
after the calculation is finished by the method 2, sorting is carried out according to the key values output by the Map, and the sorted keys and values are transmitted to Reduce for further calculation, so that one key of < key, value > transmitted to Reduce comprises a plurality of values. Thus, each key is actually an equivalent class of the decision table. value is the set of decision attributes taken on the equivalence class.
Method 3: PACSDT-Reduce(key, value)
Input: <x_C, x_D>, where x_C is the vector of condition attribute values of object x and x_D is the decision attribute value of object x;
Output: <x_C, x_D + POS_C(D)_flag + x_No>, where the value carries the decision attribute of object x, a POS_C(D) flag, and the object number.
Through methods 2 and 3, a new simplified decision table is obtained; besides the usual features of an ordinary decision table, it carries one extra POS_C(D) flag, which plays an important role in the subsequent computation of attribute importance. If the simplified decision table is empty, it is taken as the final reduction result of the decision table to be reduced and output.
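For illustration, the following minimal Python sketch simulates the PACSDT pair locally (the record layout — a list of condition values followed by the decision value — and all names are assumptions made purely for illustration; in a real job the two functions would run as Hadoop Map and Reduce tasks and the shuffle phase would perform the sort):

from itertools import groupby
from operator import itemgetter

def pacsdt_map(record, n_cond):
    # PACSDT-Map: emit <x_C, x_D>; the key is the tuple of condition attribute
    # values, the value is the decision attribute value.
    return (tuple(record[:n_cond]), record[n_cond])

def pacsdt_reduce(key, decisions, no):
    # PACSDT-Reduce: one call per equivalence class; emit one representative
    # with POS_C(D)_flag = 1 if the class is decision-consistent, else 0.
    flag = 1 if len(set(decisions)) == 1 else 0
    return (key, (decisions[0], flag, no))

def pacsdt(records, n_cond):
    # Shuffle: sort the map output on the key so that each equivalence class
    # is grouped together before it reaches the reducer.
    pairs = sorted((pacsdt_map(r, n_cond) for r in records), key=itemgetter(0))
    return [pacsdt_reduce(key, [v for _, v in grp], no)
            for no, (key, grp) in enumerate(groupby(pairs, key=itemgetter(0)), 1)]

print(pacsdt([[1, 1, 1, 2, 1], [1, 1, 1, 2, 0], [0, 1, 2, 1, 1]], n_cond=4))
# [((0, 1, 2, 1), (1, 1, 1)), ((1, 1, 1, 2), (1, 0, 2))]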
Specifically, how to implement the parallel computation of attribute importance in step S32 is as follows:
The attribute-importance-based reduction method is widely applied in traditional rough sets and works well. Because the importance of each attribute can be calculated in parallel, attribute importance can serve as the parallel mode of attribute reduction. However, one MapReduce pass can normally obtain the importance of only one attribute, which is inefficient. The present invention improves the attribute importance computation of method 1 so that the importance of all attributes can be calculated with a single MapReduce pass, thereby improving efficiency.
A parallel attribute importance calculation method PACAS (Parallel Algorithm for Computing Attribute Significance) is given below; it consists of three parts, namely PACAS-Map, PACAS-Reduce and PACAS, described as follows:
method 4, PACAS-Map (key, value)
Inputting: simplified decision list S'i=(U′i,C∪D,V,f)
And (3) outputting:<c+x_c∪R,x_D+POSC(D)_flag+x_No>c + x _ C is the combination of each attribute C ∈ C and the value of the object x on the attribute set C ≧ R in the decision table, x _ D + POSc∪R(D) The _flag + x _ No is the decision attribute corresponding to the object x and POSc∪R(D) Flag, and object number.
For example, the PACAS-Map (key, value) input format provided by MapReduce is as follows:
the decision value corresponding to each category of each attribute in each decision table can be obtained by the method 4. And after Map is finished, all the < key, value > pairs are sorted, and each classification of each attribute is output together and is used as the input of Reduce.
Method 5: PACAS-Reduce(key, value)
Input: <c + x_{c∪R}, x_D + POS_{c∪R}(D)_flag + x_No>, where the key combines each attribute c ∈ C with the values of object x on {c} ∪ R, and the value carries the decision attribute of object x, the POS_{c∪R}(D) flag, and the object number;
Output: <c, sig(c) + B_R(c) + NB_R(c)>, where c ∈ C is an attribute of the decision table and the value carries the importance sig(c) of the attribute together with the sets B_R(c) and NB_R(c) obtained while computing it.
Method 5 yields the B_R(c) and NB_R(c) taken by each equivalence class of each attribute, together with |B_R(c)| and |NB_R(c)|, and saves the result in a text file in HDFS. The importance of each attribute can then be computed from this text, and the most important attribute is selected as the reduction result. The complete attribute importance computation is described by method 6.
Method 6: PACAS
Input: simplified decision table S′_i = (U′_i, C ∪ D, V, f)
Output: reduction result reduction
begin
let reduction ← ∅;
initialize a MapReduce job and compute sig(c) over each equivalence class of each attribute by methods 4 and 5;
read the results from HDFS and let reduction be the attribute with the maximum sig(c);
end
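For illustration, the following minimal Python sketch simulates methods 4-6 locally (it consumes the flagged simplified table produced by the pacsdt sketch above; the grouping dictionary stands in for the MapReduce shuffle, and all names are assumptions made purely for illustration):

from collections import defaultdict

def pacas(simplified, C, R):
    # PACAS-Map: for each candidate attribute c, key every object by c plus
    # its values on {c} ∪ R, carrying <x_D, POS_flag, x_No> as the value.
    groups = defaultdict(list)
    for cond, (d, flag, no) in simplified:
        for c in C:
            if c not in R:
                key = (c, tuple(cond[a] for a in sorted(R + [c])))
                groups[key].append((d, flag, no))
    # PACAS-Reduce: per equivalence class, accumulate B_R(c) (all flags 1 and
    # a single decision value) and NB_R(c) (all flags 0); sig(c) = |B| + |NB|.
    sig = defaultdict(int)
    B, NB = defaultdict(list), defaultdict(list)
    for (c, _), vals in groups.items():
        nos = [no for _, _, no in vals]
        if all(f == 1 for _, f, _ in vals) and len({d for d, _, _ in vals}) == 1:
            B[c] += nos
        elif all(f == 0 for _, f, _ in vals):
            NB[c] += nos
        sig[c] = len(B[c]) + len(NB[c])
    # Method 6: select the attribute of maximal significance as the reduction.
    best = max(sig, key=sig.get)
    return best, sig, B[best], NB[best]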
Specifically, how to implement the parallel reduction of the attribute importance in step S50 is as follows:
Through method 6, the reduction result of one round is obtained and added to the reduction set; before the attribute importance is computed for the next round, the simplified decision table must be adjusted again to remove redundant information. This step can also use a parallelized method PACDT (Parallel Algorithm for Computing the Decision Table), which consists of one part only, PACDT-Map, described as follows:
Method 7: PACDT-Map(key, value)
Input: simplified decision table S′_i = (U′_i, C ∪ D, V, f);
Output: new simplified decision table S′_i = (U′_i, C ∪ D, V, f).
Through method 7, a new simplified decision table is obtained, and this table serves as the input decision table for the next round of attribute importance computation. The complete attribute-importance-based parallel reduction method PACARBAS (Parallel Algorithm for Computing Attribute Reduction Based on Attribute Significance) is given below and described as follows:
Method 8: PACARBAS
Input: decision table S_i = (U_i, C ∪ D, V, f)
Output: reduction set Reductions
begin
let Reductions ← ∅;
obtain the simplified decision table S′ by methods 2 and 3;
while S′ is not empty do
compute reduction by method 6;
let Reductions ← Reductions ∪ {reduction};
recompute the simplified decision table by method 7;
end while
output Reductions;
end
Method 8 gives the complete reduction procedure: the simplified decision table is adjusted through multiple iterations, and the final reduction result is obtained.
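For illustration, the following minimal Python driver mirrors method 8 (it reuses the pacsdt and pacas sketches above, and the pacdt helper follows method 7; all names remain assumptions made purely for illustration):

def pacdt(simplified, B_best, NB_best):
    # PACDT-Map (method 7): delete the redundant rows covered by B_R(a') and
    # NB_R(a'), leaving a new simplified decision table.
    removed = set(B_best) | set(NB_best)
    return [(cond, (d, flag, no)) for cond, (d, flag, no) in simplified
            if no not in removed]

def pacarbas(records, n_cond):
    # PACARBAS (method 8): obtain the simplified table (methods 2 and 3), then
    # iterate method 6 and method 7 until the simplified table is empty.
    reductions = []
    simplified = pacsdt(records, n_cond)
    C = list(range(n_cond))
    while simplified:
        best, sig, B, NB = pacas(simplified, C, reductions)
        reductions.append(best)
        simplified = pacdt(simplified, B, NB)
    return reductions

print(pacarbas([[1, 1, 1, 2, 1], [1, 1, 1, 2, 0], [0, 1, 2, 1, 1]], n_cond=4))
# [0]  (attribute 0 alone already separates every equivalence class here)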
By introducing the above description of the methods 1 to 8, the implementation of the present invention can be summarized as the execution flow shown in fig. 3.
Specifically, the following example shows how the reduction is realized by the above methods on a concrete decision table, so that those skilled in the art can understand the technical solution more clearly.
Examples
First, a decision table S = (U, C ∪ D, V, f) is given; this table can be divided into two sub-decision tables, S_1 = (U_1, C ∪ D, V, f) and S_2 = (U_2, C ∪ D, V, f), as shown in Tables 1 and 2:
TABLE 1 sub-decision Table S1
TABLE 2 sub-decision Table S2
Second, how to compute the simplified decision table and the importance of the attributes in parallel
Table 3 simplified decision table U'
Map stage: separating condition attributes and decision attributes, <x_C, x_D>
Examples are:
Key={1,1,1,2}
Value={1}
Reduce stage: adding the POS_C(D) flag and line number, <x_C, x_D + POS_C(D)_flag + x_No>:
Examples are:
Key={1,1,1,2}
Value={1_1_1}
parallel computing attribute importance
Map: input: simplified decision table S′_i = (U′_i, C ∪ D, V, f)
Output: <c + x_{c∪R}, x_D + POS_C(D)_flag + x_No>
examples are:
For object No. 1:
Output <key, value> = {a_1, 1_1_1}
{b_1, 1_1_1}
{c_1, 1_1_1}
{d_1, 1_1_1}
Reduce: input: <c + x_c, x_D + POS_{c∪R}(D)_flag + x_No>
Output: <c, sig(c) + B_R(c) + NB_R(c)>
The importance of each attribute is then calculated. When the Map output is collected at Reduce, it is sorted by attribute, so the keys of the same attribute are grouped together. Reduce can therefore calculate the importance of all attributes at once, whereas existing methods need one MapReduce round to calculate the importance of a single attribute.
Examples are:
after calculation:
sig_R(a) = 1
sig_R(b) = 0
sig_R(c) = 0
sig_R(d) = 0
method 6PACAS
Reading the result from HDFS, calculating the most important one as a reduction, and selecting attribute A as output
Finally, parallel reduction based on attribute importance
According to B_R(a) and NB_R(a) of attribute a, the simplified decision table is recalculated and the redundant information is deleted; thus the record with No. 1 is deleted.
The reduced decision table becomes:
then recalculate attribute importance:
B_R(b) = {X3, X4, X5}, NB_R(b) = {X2, X9}, sig_R(b) = 5
B_R(c) = {X3, X5}, NB_R(c) = {X2}, sig_R(c) = 3
B_R(d) = {X3, X4, X5}, NB_R(d) = {X2, X9}, sig_R(d) = 5
Attributes with the same importance occur, and one of them is selected as the reduction; here b is selected as the output. The simplified decision table is then recalculated; the result is empty, so the reduction finishes and the result Reductions = {a, b} is obtained.
In addition, the invention also provides a rough set parallel reduction device based on MapReduce, which comprises:
the operation configuration module is used for reading a decision table to be reduced;
the task parallel simplification module is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing module is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction module is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
Specifically, the task parallel simplification module is specifically configured to perform job configuration on the decision table to be simplified to obtain a plurality of sub-decision tables; enabling a Map function of the first MapReduce model to perform parallel calculation on the plurality of sub-decision tables to obtain condition attributes and decision attributes in the decision table to be reduced, and outputting the condition attributes and the decision attributes; and calculating the condition attribute and the decision attribute by using a Reduce function of the first MapReduce model to obtain a simplified decision table with marks.
Specifically, the attribute importance parallel computing module is specifically configured to initialize a second MapReduce model; enabling a Map function of the second MapReduce model to respond to the simplified decision table with the marks, and obtaining a decision value corresponding to each classification of each attribute in the simplified decision table with the marks through parallel calculation; and enabling a Reduce function of the second MapReduce model to respond to the decision value to obtain the attribute importance degree obtained by each equivalent class of each attribute, and writing the result into a Hadoop distributed file system.
Specifically, the attribute importance parallel reduction module is further configured to, when there are a plurality of decision tables with the highest attribute importance read in the Hadoop distributed file system, randomly select one of the decision tables with the highest attribute importance and delete redundant information therein to obtain a new decision table to be reduced.
Further, the invention also provides a rough set parallel reduction system based on MapReduce, which comprises:
the operation configuration unit is used for reading a decision table to be reduced;
the task parallel simplification unit is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing unit is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction unit is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
Specifically, the task parallel simplification unit is specifically configured to perform job configuration on the decision table to be simplified to obtain a plurality of sub-decision tables; enabling a Map function of the first MapReduce model to perform parallel calculation on the plurality of sub-decision tables to obtain condition attributes and decision attributes in the decision table to be reduced, and outputting the condition attributes and the decision attributes; calculating the condition attribute and the decision attribute by a Reduce function of the first MapReduce model to obtain a simplified decision table with marks;
specifically, the attribute importance parallel computing unit is specifically configured to initialize a second MapReduce model; enabling a Map function of the second MapReduce model to respond to the simplified decision table with the marks, and obtaining a decision value corresponding to each classification of each attribute in the simplified decision table with the marks through parallel calculation; enabling a Reduce function of the second MapReduce model to respond to the decision value to obtain the attribute importance degree obtained by each equivalent class of each attribute, and writing the result into a Hadoop distributed file system;
specifically, the attribute importance parallel reduction unit is further configured to, when there are a plurality of decision tables with the highest attribute importance read in the Hadoop distributed file system, randomly select one of the decision tables with the highest attribute importance and delete redundant information therein to obtain a new decision table to be reduced.
In summary, compared with the prior art, the invention has the following advantages:
First, the existing parallel reduction methods cannot obtain the parallel reduction result accurately, because they directly reduce the sub-decision tables cut in the Map phase and then merge the reduction results, whereas reduction requires complete equivalence classes. Such methods therefore actually obtain the reduction result from partial data, and the result is inaccurate and unreliable.
Secondly, the existing parallel reduction methods have limitations: the currently proposed parallel reduction methods require the decision table to be a consistent decision table. A definition of the consistent decision table is as follows: for a decision table S, if all objects lie in POS_C(D), the decision table is consistent; if some object lies in U − POS_C(D), it is an inconsistent decision table. POS_C(D) here is exactly the POS_C(D) flag given in method 3: a consistent decision table is one in which every POS_C(D) flag equals 1. The method of the present invention has no such limitation and fits all decision tables.
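For illustration, under the record layout assumed in the earlier sketches, this consistency test reduces to checking the flags produced by method 3 (a hypothetical helper, not part of the patent):

def is_consistent(simplified):
    # A decision table is consistent exactly when every object lies in
    # POS_C(D), i.e. every POS_C(D)_flag emitted by method 3 equals 1.
    return all(flag == 1 for _, (_, flag, _) in simplified)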
Thirdly, the present invention is highly efficient. Firstly, the existing methods mainly reduce on the original decision table, whereas the present method first obtains a simplified decision table, and reduction on the simplified decision table greatly reduces the amount of computation, thereby improving efficiency. Secondly, other methods can obtain the importance of only one condition attribute per MapReduce round when computing attribute importance, whereas the present method obtains the importance of all condition attributes with one MapReduce round plus a simple text-reading computation, completing one reduction step per round and improving efficiency.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (10)

1. A rough set parallel reduction method based on MapReduce is characterized by comprising the following steps:
reading a decision table to be reduced;
initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel computing processing on the decision table to be reduced to obtain a simplified decision table with a mark:
if the simplified decision table is empty, it is taken as a final reduction result of the decision table to be reduced and output;
if the simplified decision table is not empty, initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel calculation and write the result into a Hadoop distributed file system;
and reading a decision table with the highest attribute importance in the Hadoop distributed file system, deleting redundant information in the decision table to obtain a new decision table to be reduced, and enabling the new decision table to be reduced to be used as an input value of the first MapReduce model to be reduced again.
2. The MapReduce-based rough set parallel reduction method as recited in claim 1, wherein the specific method for performing parallel computation processing on the decision table to be reduced by using a first MapReduce model to obtain the simplified decision table comprises the following steps:
performing operation configuration on the decision table to be reduced to obtain a plurality of sub-decision tables;
enabling a Map function of the first MapReduce model to perform parallel calculation on the plurality of sub-decision tables to obtain condition attributes and decision attributes in the decision table to be reduced, and outputting the condition attributes and the decision attributes;
and calculating the condition attribute and the decision attribute by using a Reduce function of the first MapReduce model to obtain a simplified decision table with marks.
3. The MapReduce-based rough set parallel reduction method as set forth in claim 1 or 2, wherein the specific method for calculating the importance of each attribute in the simplified decision table in parallel by using a second MapReduce model comprises:
initializing a second MapReduce model;
enabling a Map function of the second MapReduce model to respond to the simplified decision table with the marks, and obtaining a decision value corresponding to each classification of each attribute in the simplified decision table with the marks through parallel calculation;
and enabling a Reduce function of the second MapReduce model to respond to the decision value to obtain the attribute importance degree obtained by each equivalent class of each attribute, and writing the result into a Hadoop distributed file system.
4. The MapReduce-based rough set parallel reduction method as claimed in claim 1 or 3, wherein if there are a plurality of decision tables with the highest attribute importance read in the Hadoop distributed file system, one decision table with the highest attribute importance is randomly selected and redundant information in the decision table is deleted to obtain a new decision table to be reduced.
5. A rough set parallel reduction device based on MapReduce is characterized by comprising:
the operation configuration module is used for reading a decision table to be reduced;
the task parallel simplification module is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing module is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction module is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
6. The MapReduce-based rough set parallel reduction device of claim 5, wherein:
the task parallel simplification module is specifically used for performing operation configuration on the decision table to be reduced to obtain a plurality of sub-decision tables; enabling a Map function of the first MapReduce model to perform parallel calculation on the plurality of sub-decision tables to obtain condition attributes and decision attributes in the decision table to be reduced, and outputting the condition attributes and the decision attributes; and calculating the condition attribute and the decision attribute by using a Reduce function of the first MapReduce model to obtain a simplified decision table with marks.
7. The MapReduce-based rough set parallel reduction device of claim 5, wherein:
the attribute importance parallel computing module is specifically used for initializing a second MapReduce model; enabling a Map function of the second MapReduce model to respond to the simplified decision table with the marks, and obtaining a decision value corresponding to each classification of each attribute in the simplified decision table with the marks through parallel calculation; and enabling a Reduce function of the second MapReduce model to respond to the decision value to obtain the attribute importance degree obtained by each equivalent class of each attribute, and writing the result into a Hadoop distributed file system.
8. The MapReduce-based rough set parallel reduction device of claim 5, wherein: the attribute importance degree parallel reduction module is further used for randomly selecting one decision table with the highest attribute importance degree and deleting redundant information in the decision table with the highest attribute importance degree to obtain a new decision table to be reduced when a plurality of decision tables with the highest attribute importance degree are read in the Hadoop distributed file system.
9. A rough set parallel reduction system based on MapReduce is characterized by comprising:
the operation configuration unit is used for reading a decision table to be reduced;
the task parallel simplification unit is used for initializing a first MapReduce model and enabling the first MapReduce model to respond to the decision table to be reduced so as to perform parallel calculation processing on the decision table to be reduced to obtain a simplified decision table with marks, and if the simplified decision table is empty, enabling the simplified decision table to be used as a final reduction result of the decision table to be reduced and outputting the final reduction result;
the attribute importance parallel computing unit is used for initializing a second MapReduce model and enabling the second MapReduce model to respond to the simplified decision table with the marks if the simplified decision table is non-empty, so as to obtain the importance of each attribute in the simplified decision table with the marks through parallel computing and write the result into a Hadoop distributed file system;
and the attribute importance degree parallel reduction unit is used for reading the decision table with the highest attribute importance degree in the Hadoop distributed file system and deleting redundant information in the decision table to obtain a new decision table to be reduced, and the new decision table to be reduced is used as an input value of the first MapReduce model to be reduced again.
10. The MapReduce-based rough set parallel reduction system of claim 9, wherein:
the task parallel simplification unit is specifically configured to split the decision table to be reduced, through job configuration, into a plurality of sub-decision tables; to have the Map function of the first MapReduce model process the sub-decision tables in parallel, extracting and outputting the condition attributes and decision attributes of the decision table to be reduced; and to have the Reduce function of the first MapReduce model compute the marked simplified decision table from those condition and decision attributes (sketched below, after this claim);
the attribute importance parallel computing unit is specifically configured to initialize a second MapReduce model; to have its Map function take the marked simplified decision table as input and compute in parallel the decision value corresponding to each classification of each attribute in that table; and to have its Reduce function derive each attribute's importance from its equivalence classes and write the result to the Hadoop distributed file system; and
the attribute importance parallel reduction unit is further configured, when several decision tables read from the Hadoop distributed file system tie for the highest attribute importance, to select one of them at random and delete its redundant information, yielding a new decision table to be reduced.
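
A matching sketch of the first MapReduce model recited by the task parallel simplification unit above: each input split stands in for one sub-decision table, the Map side keys every object by its condition-attribute part, and the Reduce side collapses duplicate objects into a single marked record. The consistent/inconsistent marks and the record layout are, again, assumptions rather than details from the patent.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Assumed input layout as before: "a1,...,an,d".
    public class DecisionTableSimplify {

        // Map: condition-attribute part of the object as the key, decision
        // value as the value; each input split acts as one sub-decision table.
        public static class SimplifyMapper
                extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String row = line.toString();
                int cut = row.lastIndexOf(',');
                if (cut < 0) return; // skip malformed lines
                ctx.write(new Text(row.substring(0, cut)),   // condition attributes
                          new Text(row.substring(cut + 1))); // decision attribute
            }
        }

        // Reduce: one call per distinct condition vector; duplicate objects
        // collapse to a single record, marked consistent when all of them
        // agree on the decision value and inconsistent otherwise.
        public static class SimplifyReducer
                extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text cond, Iterable<Text> decisions, Context ctx)
                    throws IOException, InterruptedException {
                Set<String> distinct = new HashSet<>();
                for (Text d : decisions) distinct.add(d.toString());
                String mark = (distinct.size() == 1) ? "consistent" : "inconsistent";
                ctx.write(cond, new Text(String.join("|", distinct) + "#" + mark));
            }
        }
    }
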
CN201410325508.9A 2014-07-09 2014-07-09 MapReduce-based rough set parallel reduction method, device and system Expired - Fee Related CN104063230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410325508.9A CN104063230B (en) 2014-07-09 2014-07-09 MapReduce-based rough set parallel reduction method, device and system

Publications (2)

Publication Number Publication Date
CN104063230A 2014-09-24
CN104063230B CN104063230B (en) 2017-03-01

Family

ID=51550954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410325508.9A Expired - Fee Related CN104063230B (en) MapReduce-based rough set parallel reduction method, device and system

Country Status (1)

Country Link
CN (1) CN104063230B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336791A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast rough set attribute reduction method
CN103336790A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast neighborhood rough set attribute reduction method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIU Qiong et al., "Improved rough set algorithm for spatial data mining under the Map/Reduce framework", Science of Surveying and Mapping *
XIAO Dawei et al., "A fast parallel attribute reduction algorithm based on rough set theory", Computer Science *
CHEN Xinying et al., "Parallel reduction algorithm based on rough set theory", Journal of Computer Applications *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598567A (en) * 2015-01-12 2015-05-06 北京中交兴路车联网科技有限公司 Data statistics and de-duplication method based on the Hadoop MapReduce programming framework
CN104598567B (en) * 2015-01-12 2018-01-09 北京中交兴路车联网科技有限公司 Data statistics and de-duplication method based on the Hadoop MapReduce programming framework
CN106202278A (en) * 2016-07-01 2016-12-07 武汉泰迪智慧科技有限公司 Public opinion monitoring system based on data mining technology
CN106202278B (en) * 2016-07-01 2019-08-13 武汉泰迪智慧科技有限公司 Public opinion monitoring system based on data mining technology
CN109992587A (en) * 2019-04-09 2019-07-09 中南大学 Blast furnace molten iron silicon content prediction key attribute judgment method based on big data
CN109992587B (en) * 2019-04-09 2021-04-13 中南大学 Blast furnace molten iron silicon content prediction key attribute judgment method based on big data
CN115392582A (en) * 2022-09-01 2022-11-25 广东工业大学 Crop yield prediction method based on incremental fuzzy rough set attribute reduction
CN115392582B (en) * 2022-09-01 2023-11-14 广东工业大学 Crop yield prediction method based on incremental fuzzy rough set attribute reduction

Also Published As

Publication number Publication date
CN104063230B (en) 2017-03-01

Similar Documents

Publication Publication Date Title
US9053067B2 (en) Distributed data scalable adaptive map-reduce framework
JP2021517295A (en) High-efficiency convolutional network for recommender systems
CN104881466B Data fragment processing and garbage file deletion method and device
Wen et al. Exploiting GPUs for efficient gradient boosting decision tree training
JP2018522343A (en) Method, computer device and storage device for building a decision model
CN104063230B (en) MapReduce-based rough set parallel reduction method, device and system
CN104820708B (en) Big data clustering method and device based on a cloud computing platform
CN104809244B (en) Data mining method and device in a big data environment
JP6598996B2 (en) Signature-based cache optimization for data preparation
Prasad et al. GPU-based Parallel R-tree Construction and Querying
CN106598743A (en) Attribute reduction method for information system based on MPI parallel solving
CN103440246A (en) Intermediate result data sorting method and system for MapReduce
JP2011170774A (en) Device and method for generation of decision tree, and program
CN103996216A (en) Power efficient attribute handling for tessellation and geometry shaders
CN105808582A (en) Parallel decision tree generation method and device based on a layered strategy
Daoudi et al. Parallel differential evolution clustering algorithm based on MapReduce
Lin et al. A parallel Cop-Kmeans clustering algorithm based on MapReduce framework
Jensen et al. Feature grouping-based fuzzy-rough feature selection
JP2013242675A (en) Dispersion information control device, dispersion information search method, data dispersion arrangement method and program
Raj et al. PartEclat: an improved Eclat-based frequent itemset mining algorithm on spark clusters using partition technique
CN110325984B (en) System and method for hierarchical community detection in graphics
CN107291541A (en) Coarse-grained process-level parallel optimization method and system for compaction in Key-Value systems
Lee et al. Primitives for dynamic big model parallelism
Keswani et al. Enhanced approach to attain competent Big Data pre-processing
Tzacheva et al. MR-Apriori count distribution algorithm for parallel Action Rules discovery

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20170301