CN106033425A - A data processing device and a data processing method - Google Patents
- Publication number
- CN106033425A (application CN201510106455.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- denoising
- sub
- training
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data processing device and a data processing method. The data processing device comprises an acquiring device and a training dataset selecting device. The acquiring device is used for selecting historical data as training datasets according to preset rules and dividing the historical data into sub-training-datasets and sub-testing-datasets, acquiring the information related to data types according to the attributes or the combination of the attributes of data in the sub-training-datasets, performing, according to each data type, prediction on the sub-testing-datasets by using a classifier trained by the sub-training datasets denoised in the data type and verifying the prediction result to acquire an optimal data type with the optimal prediction result. The training dataset selecting device is used for denoising data in the training datasets under the optimal data type to acquire training datasets with type proportions meeting a preset condition and classifying testing datasets by using a classifier trained by the training datasets meeting the preset condition.
Description
Technical field
The present invention relates to data processing technology, and more particularly to a data processing device and a data processing method for use in data prediction.
Background
With the development of the Internet and the growing demand for big-data applications, data of all kinds has increased sharply. One such kind is time-series data: from data that has already occurred (historical data), related algorithms (classification, prediction, and recommendation algorithms) can be used to predict and analyze future data.
Although such algorithms can use all historical data as the training dataset, not all of the training data is necessarily useful, because part of the data contains a certain amount of noise.
Summary of the invention
The present invention has been made in view of the above problem. According to one embodiment of the disclosure, a data processing device is provided. The data processing device includes: an acquiring device for selecting historical data as a training dataset according to a predetermined rule and dividing the historical data into a sub-training dataset and a sub-test dataset, obtaining information about data types from the attributes, or combinations of attributes, of the data in the sub-training dataset, and, for each data type, using a classifier trained on the sub-training dataset denoised under that data type to predict the sub-test dataset and verifying the prediction result, so as to obtain the optimal data type having the best prediction result; and a training dataset selecting device for denoising the data in the training dataset under the optimal data type to obtain a training dataset whose class ratio satisfies a predetermined condition, so that a classifier trained on that training dataset can classify the test dataset.
According to another embodiment of the disclosure, a data processing method is also provided, including the following steps: selecting historical data as a training dataset according to a predetermined rule and dividing the historical data into a sub-training dataset and a sub-test dataset; obtaining information about data types from the attributes, or combinations of attributes, of the data in the sub-training dataset, and, for each data type, using a classifier trained on the sub-training dataset denoised under that data type to predict the sub-test dataset and verifying the prediction result, so as to obtain the optimal data type having the best prediction result; and denoising the data in the training dataset under the optimal data type to obtain a training dataset whose class ratio satisfies a predetermined condition, so that a classifier trained on that training dataset can classify the test dataset.
The disclosure achieves at least the following advantageous technical effect: the classes of future data are obtained more accurately than with the prior art.
Brief description of the drawings
Embodiments of the present invention are described below with reference to the drawings, from which the above and other objects, features, and advantages of the invention will be more readily understood. The parts in the drawings are not drawn to scale; they only illustrate the principle of the invention. To make some parts of the invention easier to show and describe, the corresponding parts in the drawings may be enlarged, that is, made larger relative to other parts than they would be in an exemplary device actually manufactured according to the invention. In the drawings, the same or similar technical features or parts are denoted by the same or similar reference numerals.
Fig. 1 is a schematic block diagram of a data processing device according to an embodiment of the disclosure.
Fig. 2 illustrates an example of a training dataset and a test dataset.
Fig. 3 illustrates a further configuration of the acquiring device in the data processing device.
Fig. 4 illustrates an example of a specific configuration of the classifier training unit.
Fig. 5(a) and Fig. 5(b) illustrate a concrete example of the operations performed by the denoising sub-unit and the prediction sub-unit.
Fig. 6 illustrates an example of a more specific configuration of the optimal data type acquiring unit.
Fig. 7 is a flowchart of a data processing method according to an embodiment of the invention.
Fig. 8 illustrates the detailed sub-steps of the denoising and optimal type acquisition step.
Fig. 9 illustrates the detailed sub-steps of the denoising and prediction operation step.
Fig. 10 illustrates the detailed sub-steps of the optimal type acquisition step.
Fig. 11 illustrates the detailed sub-steps of the test dataset classification step.
Fig. 12 is a structural diagram of a general-purpose computer system that can serve as a data processing device for implementing the data processing method according to an embodiment of the invention.
Detailed description
Embodiments of the disclosure are described below with reference to the drawings. Elements and features described in one drawing or one embodiment of the disclosure may be combined with elements and features shown in one or more other drawings or embodiments. Note that, for clarity, representations and descriptions of parts and processes that are unrelated to the disclosure and known to those of ordinary skill in the art are omitted from the drawings and the description.
Fig. 1 is a schematic block diagram of a data processing device 100 according to an embodiment of the disclosure. The data processing device 100 includes an acquiring device 110 and a training dataset selecting device 120. The acquiring device 110 selects historical data as a training dataset according to a predetermined rule and divides the historical data into a sub-training dataset and a sub-test dataset; it obtains information about data types from the attributes, or combinations of attributes, of the data in the sub-training dataset; and, for each data type, it uses a classifier trained on the sub-training dataset denoised under that data type to predict the sub-test dataset and verifies the prediction result, so as to obtain the optimal data type having the best prediction result. An example of the predetermined rule is selection according to the ratio of the amount of data in the training dataset to that in the test dataset. The training dataset selecting device 120 denoises the data in the training dataset under the optimal data type to obtain a training dataset whose class ratio satisfies a predetermined condition, so that a classifier trained on that training dataset can classify the test dataset. The present invention places no restriction on which classifier is used; for example, a naive Bayes classifier or a decision tree may be used.
An example of the training dataset and the test dataset is described below with reference to Fig. 2. In the example shown in Fig. 2, the test dataset may be a time-series dataset.
Fig. 2 shows an example test dataset: the data from January 2014 to May 2014, whose class ratio, predicted in advance, is drawn as a dark dashed line. Note that this dashed line shows the predicted ratio Ry, which is used later to determine the training dataset; it is not the finally determined class ratio. In the example of Fig. 2, data from before January 2014, for example from January 2010 to December 2013, is taken as the historical data, that is, the training dataset. As shown in Fig. 2, the training dataset is divided into two parts, a sub-training dataset and a sub-test dataset, whose class ratios are drawn as a solid line and a dashed line respectively. The sub-training dataset runs from January 2010 to August 2013, and the sub-test dataset from September 2013 to December 2013. The sub-training dataset and the sub-test dataset may be divided, for example, according to the ratio of the data volume of the historical data (training dataset) to that of the future data (test dataset). Other methods may also be used, for example division based on empirical data or on a limited number of experiments; the present invention is not limited in this respect.
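A time-based split of this kind can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the record layout `(date, label)`, the function name `split_by_time`, and the generated monthly history are assumptions, and the dates simply mirror the Fig. 2 example.

```python
from datetime import date

def split_by_time(records, split_date):
    """Split (date, label) records into sub-training and sub-test parts.

    Records dated before split_date go to the sub-training dataset;
    records dated on or after split_date go to the sub-test dataset.
    """
    sub_train = [r for r in records if r[0] < split_date]
    sub_test = [r for r in records if r[0] >= split_date]
    return sub_train, sub_test

# Hypothetical monthly history from January 2010 to December 2013.
history = [(date(y, m, 1), 1) for y in range(2010, 2014) for m in range(1, 13)]

# Sub-training: January 2010 to August 2013; sub-test: September to December 2013.
sub_train, sub_test = split_by_time(history, date(2013, 9, 1))
```

In practice the split date would be chosen by whichever rule is in use, such as the data-volume ratio mentioned above.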
A further configuration of the acquiring device 110 is described below with reference to Fig. 3. As shown in Fig. 3, the acquiring device 110 includes a classifier training unit 310 and an optimal data type acquiring unit 320.
The classifier training unit 310, for each data type, denoises the data in the sub-training dataset, trains a classifier on the denoised sub-training dataset, and uses the trained classifier to predict the sub-test dataset.
The optimal data type acquiring unit 320 compares the prediction results obtained for the sub-test dataset under each data type and, based on the comparison, selects the data type corresponding to the best prediction result as the optimal data type. The comparison may be based on evaluation metrics such as AUC (Area Under the ROC Curve), accuracy, or recall.
An example of a specific configuration of the classifier training unit 310 is explained below with reference to Fig. 4.
As shown in Fig. 4, the classifier training unit 310 includes a denoising sub-unit 410 and a prediction sub-unit 420. The denoising sub-unit 410 performs, under a given data type, a denoising operation on each of the multiple data groups included in the sub-training dataset. The denoising operation includes clustering the data in the data group, removing noise data from the data group using a range far from the cluster center, and merging the data of all data groups from which noise data has been removed into a denoised dataset, where the data in one data group have the same data attribute for the same data type. The prediction sub-unit 420 performs a prediction operation, which includes training a classifier on the denoised dataset and using the trained classifier to predict the data of the sub-test dataset.
A concrete example of the operations performed by the denoising sub-unit 410 and the prediction sub-unit 420 is described below with reference to Fig. 5(a) and Fig. 5(b). In this example, the learning algorithm is assumed to be a binary classification algorithm and, for simplicity of description, the data are assumed to have only two labels, label 1 and label 2. In the historical data obtained, the class labels are imbalanced. The task here is to predict the label information of future data from the label information of the historical data.
The data type targeted by the denoising sub-unit 410 may be an attribute in the dataset. Such an attribute may be called a group, for example a company name, an item name, or a periodic attribute such as date or hour. Because the features of the data depend on the data type, the features carried by the data differ from one data type to another. Under each data type or combination of data types, the features of the sample points are extracted, so that the data are divided into multiple data groups under each data type. The denoising operation of the denoising sub-unit 410 may target the multiple data groups so formed under each data type.
A concrete example of the denoising operation is described below. First, the data in a data group are clustered into 2 clusters. The clustering method may be partition-based k-means, hierarchical clustering, density-based clustering such as DBSCAN, and so on; the present invention is not limited in this respect. After clustering, the data group is divided into two clusters, denoted cluster 1 and cluster 2 in Fig. 5(a) and Fig. 5(b). For ease of description, Fig. 5(a) and Fig. 5(b) show the features as two-dimensional, but it should be understood that the invention also applies to multidimensional features.
As described above, the present invention is illustrated using imbalanced binary classification as an example. In imbalanced binary classification there are two different labels, label 1 and label 2, and the number of label 1 samples is much larger than the number of label 2 samples. In Fig. 5(a) and Fig. 5(b), for clarity of illustration, solid circles represent label 1 and triangles represent label 2. In the clustering step above, if the samples labelled label 1 are the majority in a cluster, that cluster is called the positive cluster and the other is called the negative cluster. In Fig. 5, cluster 1 is the positive cluster and cluster 2 is the negative cluster. Because label 1 samples far outnumber label 2 samples, and in order to find the optimal class ratio for training the classifier, samples labelled label 1 are referred to as noise sample points, and samples labelled label 2 are not considered further, in the positive and negative clusters alike.
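The naming of the positive and negative cluster can be sketched in a few lines. This is only an illustration under stated assumptions: cluster assignments are taken as given (from any clustering method), the function name `label_clusters` is hypothetical, and choosing the cluster with the larger share of label 1 as the positive one is an assumed tie-breaking rule for the case where label 1 is the majority in both clusters, which the text does not address.

```python
def label_clusters(cluster_ids, labels, majority_label=1):
    """Return (positive_cluster, negative_cluster) for a two-cluster assignment.

    cluster_ids[i] is the cluster of sample i and labels[i] is its class label.
    The positive cluster is the one with the larger share of majority_label.
    """
    clusters = sorted(set(cluster_ids))
    if len(clusters) != 2:
        raise ValueError("expected exactly two clusters")

    def share(c):
        members = [y for cid, y in zip(cluster_ids, labels) if cid == c]
        return sum(1 for y in members if y == majority_label) / len(members)

    positive = max(clusters, key=share)
    negative = clusters[1] if positive == clusters[0] else clusters[0]
    return positive, negative

# Toy data: cluster 1 is mostly label 1, cluster 2 is mostly label 2.
cluster_ids = [1, 1, 1, 1, 2, 2, 2]
labels = [1, 1, 1, 2, 1, 2, 2]
positive, negative = label_clusters(cluster_ids, labels)
```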
The clustered data may be denoised according to the distance from the center point of each cluster: after clustering, a range far from the cluster center is used to remove noise data from the data group. A concrete example of removing noise is described below with reference to Fig. 5(a) and Fig. 5(b). As shown in Fig. 5(a), taking cluster 1, the positive cluster, as an example, the center of the cluster is found, the distance from each noise point to the center is computed, and noise points are removed according to that distance. Let L be the maximum distance from the cluster center to a noise point. A step t is set, ranging from 0 to 100%, and a new maximum distance Lnew is obtained from it: Lnew = L * (1 - t). Points lying between radius Lnew and radius L, measured from the cluster center, are marked as noise points to be removed and are removed. In Fig. 5(b), the removed noise points are shown as open circles.
Note that the distance formula is not restricted. For example, the Euclidean distance between sample points i and j may be used:
$$d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{in} - x_{jn})^2}$$
where x_{i1} denotes the first-dimension feature value of sample point i, x_{j1} denotes the first-dimension feature value of sample point j, and so on up to the n-th feature dimension.
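The distance-based removal described above can be sketched as follows. This is a minimal sketch under assumptions, not the patent's implementation: the cluster center is taken to be the mean of the cluster's points, which samples count as noise candidates is passed in by the caller, and the names `euclidean`, `centroid`, and `remove_far_noise` are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two feature tuples of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Mean of the points, used here as the cluster center (an assumption)."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def remove_far_noise(points, is_noise, t):
    """Return indices of the points kept after one denoising pass.

    L is the largest center-to-noise distance and L_new = L * (1 - t).
    Noise candidates farther than L_new from the center are dropped.
    """
    center = centroid(points)
    dists = [euclidean(p, center) for p in points]
    noise_dists = [d for d, flag in zip(dists, is_noise) if flag]
    if not noise_dists:
        return list(range(len(points)))
    l_max = max(noise_dists)
    l_new = l_max * (1 - t)
    return [i for i, (d, flag) in enumerate(zip(dists, is_noise))
            if not (flag and d > l_new)]

# Toy cluster centered on the origin; indices 3 and 4 are not noise candidates.
points = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (0, 10), (0, -10)]
is_noise = [True, True, True, False, False, True, True]
kept = remove_far_noise(points, is_noise, t=0.5)
```

With t = 0 nothing is removed, and larger values of t remove a wider outer ring, so repeated passes with a fixed t shrink the cluster step by step.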
The denoising sub-unit 410 merges the data of all data groups from which noise data has been removed into a denoised dataset, where the data in each data group have the same data attribute for the same data type.
An example of the prediction operation performed by the prediction sub-unit 420 is described below. The prediction sub-unit 420 builds a classification model on the denoised dataset to train a classifier, and predicts on the sub-test dataset to obtain evaluation scores such as AUC (Area Under the ROC Curve), accuracy, and recall.
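The three evaluation metrics named above can be computed with a short standard-library sketch. The function names, the 0/1 label encoding, and the 0.5 decision threshold in the example are assumptions for illustration; the patent does not prescribe how the scores are computed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive sample is scored
    higher than a random negative one (ties count as one half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(preds, labels):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall(preds, labels):
    """Fraction of true positives among all actual positives."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    return tp / (tp + fn)

# Hypothetical classifier scores on a four-sample sub-test dataset.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
preds = [1 if s >= 0.5 else 0 for s in scores]
auc_value = auc(scores, labels)
```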
The comparison, carried out by the optimal data type acquiring unit 320, of the prediction results for the sub-test dataset under each data type may be based on evaluation metrics such as the above AUC, accuracy, or recall. As described above, the training dataset selecting device 120 may, under the optimal data type, denoise again the data in the training dataset, which includes both the sub-training dataset and the sub-test dataset, to obtain a training dataset whose class ratio satisfies the predetermined condition, so that a classifier trained on that training dataset can classify the test dataset.
According to one embodiment, the optimal data type need not be supplied to the training dataset selecting device 120 directly after a single denoising and prediction pass. Instead, the classifier training unit 310 continues, for each data type, to perform the above denoising operation and prediction operation on the sub-training dataset, that is, the denoising and prediction operations are performed multiple times. For example, the denoising sub-unit 410 performs a 2nd denoising operation on each data group included in the denoised dataset obtained from the 1st denoising operation, to obtain the denoised dataset corresponding to the 2nd denoising operation. Specifically, the denoising sub-unit 410 clusters again, identifies the positive and negative clusters, computes the maximum distance L' from the current center point to a noise point, obtains a new distance L'new = L' * (1 - t) from the step t, and removes noise according to this new distance. The prediction sub-unit 420 performs the 2nd prediction operation by training a classifier on the denoised dataset corresponding to the 2nd denoising operation and predicting the data of the sub-test dataset with the trained classifier, obtaining evaluation scores such as AUC, accuracy, and recall. The classifier training unit 310 may perform the above denoising and prediction operations more times and is not limited to twice.
The optimal data type acquiring unit 320 then selects, from the multiple denoising and prediction operations, the prediction result of one round as the best prediction result under the current data type. A more specific configuration example of the optimal data type acquiring unit 320 for carrying out this operation is described below with reference to Fig. 6.
As shown in Fig. 6, the optimal data type acquiring unit 320 includes a prediction result verifying sub-unit 610 and a data type selecting sub-unit 620. For each data type, the prediction result verifying sub-unit 610 compares the n prediction results obtained from the n (n ≥ 2) denoising and prediction operations performed for that data type and, based on the comparison, selects the prediction result corresponding to one of those denoising and prediction operations as the prediction result under that data type; the selected prediction result is the best of the n prediction results under that data type. The data type selecting sub-unit 620 compares the prediction results under each data type and selects the data type corresponding to the best prediction result as the optimal data type.
A concrete example of how the prediction result verifying sub-unit 610 selects the best prediction result under each data type is described below. In the iterative clustering and denoising process described above, the prediction result verifying sub-unit 610 computes the difference between prediction results using the same sub-test dataset and the same evaluation metric. When the difference satisfies |Eva - Eva_last| < D, either the current trained model or the previous trained model may be selected as optimal, where Eva is the evaluation score of the current trained model, Eva_last is the evaluation score of the previous trained model, and D is a preset upper bound on the difference. The way the prediction result verifying sub-unit 610 selects the best prediction result is not limited to this; for example, a prediction result that satisfies a preset threshold may be selected as the best prediction result. For example, with a preset threshold of 0.8, the current prediction result may be considered optimal when the predicted label 1/label 2 class ratio equals 0.8 or when the absolute value of its difference from 0.8 lies within a certain range.
The above describes examples of selecting the best prediction result. The selection operation is not limited to the examples disclosed above, nor is the evaluation metric used. Iteration may stop when |Eva - Eva_last| < D or another set condition is met, or iteration may continue without stopping; the present invention is not limited in this respect.
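The stopping test on successive evaluation scores can be sketched as follows. The function name `first_converged_round` and the sample AUC values are hypothetical; the sketch only checks the |Eva - Eva_last| < D criterion and leaves the choice between the current and the previous model to the caller, as the text allows either.

```python
def first_converged_round(scores, d):
    """Return the index of the first round whose score differs from the
    previous round's score by less than d, or None if no round qualifies."""
    for i in range(1, len(scores)):
        if abs(scores[i] - scores[i - 1]) < d:
            return i
    return None

# Hypothetical AUC after each denoising and prediction round.
auc_per_round = [0.70, 0.78, 0.785, 0.80]
stop_round = first_converged_round(auc_per_round, d=0.01)
```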
An example of the operation by which the data type selecting sub-unit 620 selects the optimal data type is described below. The data type selecting sub-unit 620 may record the data file used under the optimal model and the positive/negative class ratio in the sub-training dataset used under the optimal model. It may also record the noise data deleted during iteration and compute, for the optimal model, the minimum distance Lmin from the center point to the noise sample points that still remain. The data type selecting sub-unit 620 compares the best prediction results under each data type and selects the data type corresponding to the best prediction result as the optimal data type G*, together with the optimal class ratio R*(G*) under the optimal data type. The data type selecting sub-unit 620 may also obtain an optimal combination of data types G* and the optimal class ratio R*(G*) under that combination.
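The two-level selection (best round within each data type, then best data type overall) can be sketched as below. The data layout, the data type names, and all scores and class ratios are invented for illustration; `select_optimal_type` is a hypothetical helper, not part of the patent.

```python
def select_optimal_type(results):
    """Pick the optimal data type G* and its class ratio R*(G*).

    results maps a data type name to a list of (score, class_ratio)
    pairs, one pair per denoising and prediction round.
    """
    # Best round within each data type, judged by evaluation score.
    best_per_type = {g: max(rounds, key=lambda r: r[0])
                     for g, rounds in results.items()}
    # Data type whose best round has the highest score overall.
    g_star = max(best_per_type, key=lambda g: best_per_type[g][0])
    score, r_star = best_per_type[g_star]
    return g_star, r_star, score

# Hypothetical per-round results for three candidate data types.
results = {
    "company": [(0.71, 6.0), (0.76, 4.5), (0.75, 3.8)],
    "item": [(0.69, 6.0), (0.81, 4.0)],
    "hour": [(0.74, 6.0), (0.78, 5.0)],
}
g_star, r_star, best_score = select_optimal_type(results)
```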
After the optimal data type has been selected in this way, the training dataset selecting device 120 operates as follows: under the optimal data type, it clusters and denoises all data included in the training dataset (including the sub-training dataset and the sub-test dataset). The clustering and denoising operation may be performed once or multiple times, until the class ratio Rt of the data in the resulting training dataset satisfies the predetermined condition.
During the clustering and denoising that the training dataset selecting device 120 performs for the optimal data type, Lmin/Lmax may be used to choose an optimized step t, so that the iteration process is shortened once the optimized step t is known. This is because every iteration requires building a classification model, which takes time, and an optimized step t helps speed up the computation.
The predetermined condition may be: |Rt/Ry - R*/Rx| < D'
where Rx is the class ratio of the data in the sub-test dataset, Ry is the class ratio of the data in the test dataset predicted in advance using an algorithm such as ARMA or SVR, and R* is the optimal class ratio under the optimal combination of data types described above. D' is a preset upper bound on the error; it may be an empirical value, determined by those skilled in the art according to the required number of iterations, the requirements on the class ratio, and so on. The present invention is not limited in this respect.
The predetermined condition may also be chosen differently depending on the circumstances; for example, it may be Rt/Ry ≥ R*/Rx. Those skilled in the art can set it according to actual needs.
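Both forms of the predetermined condition translate directly into code. The function names and the numeric values below are assumptions chosen only to show the arithmetic.

```python
def within_tolerance(rt, ry, r_star, rx, d_prime):
    """Condition |Rt/Ry - R*/Rx| < D'."""
    return abs(rt / ry - r_star / rx) < d_prime

def at_least_target(rt, ry, r_star, rx):
    """Alternative condition Rt/Ry >= R*/Rx."""
    return rt / ry >= r_star / rx

# Hypothetical ratios: Rt/Ry = 0.8 and R*/Rx = 0.75, so the gap is about 0.05.
ok_loose = within_tolerance(4.0, 5.0, 3.0, 4.0, d_prime=0.1)
ok_tight = within_tolerance(4.0, 5.0, 3.0, 4.0, d_prime=0.01)
ok_alt = at_least_target(4.0, 5.0, 3.0, 4.0)
```

Here the looser tolerance accepts the current training dataset while the tighter one would require further denoising rounds.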
According to another embodiment of the present invention, a data processing method S700 is also provided. As shown in Fig. 7, the data processing method includes the following. In step S710, historical data is selected as a training dataset according to a predetermined rule, and the historical data is divided into a sub-training dataset and a sub-test dataset. The sub-training dataset and the sub-test dataset may be divided using the method described above for the data processing device 100 with reference to Fig. 2. In step S720, information about data types is obtained from the attributes, or combinations of attributes, of the data in the sub-training dataset and, for each data type, a classifier trained on the sub-training dataset denoised under that data type is used to predict the sub-test dataset and the prediction result is verified, so as to obtain the optimal data type having the best prediction result. In step S730, the data in the training dataset are denoised under the optimal data type to obtain a training dataset whose class ratio satisfies a predetermined condition, so that a classifier trained on that training dataset can classify the test dataset.
In step S710, the test dataset may be a time-series dataset, and the historical data may be selected based on the time period covered by the data in the test dataset.
As shown in Fig. 8, step S720, which performs denoising and optimal type acquisition, may include the following. In step S810, for each data type, the data in the sub-training dataset are denoised, a classifier is trained on the denoised sub-training dataset, and the trained classifier is used to predict the sub-test dataset. In step S820, the prediction results obtained for the sub-test dataset under each data type are compared and, based on the comparison, the data type corresponding to the best prediction result is selected as the optimal data type. The comparison in step S820 may be based on evaluation metrics such as the above AUC, accuracy, or recall.
As shown in Fig. 9, the denoising operation of step S810 is performed, under the given data type, on each of the multiple data groups included in the sub-training dataset, and may further include the following. In step S910, the data in the data group are clustered. In step S920, noise data are removed from the data group using a range far from the cluster center. For the specific denoising operation of this step, refer to the description of Fig. 5(a) and Fig. 5(b) above. In step S930, the data of all data groups from which noise data has been removed are merged into a denoised dataset, where the data in one data group have the same data attribute for the data type. In step S940, a classifier is trained on the denoised dataset, and the trained classifier is used to predict the data of the sub-test dataset. This prediction yields evaluation scores such as AUC, accuracy, or recall.
The denoising and prediction operations of steps S920 to S940 may be performed multiple times, until a predetermined condition is met. For the specific operations performed on each repetition and for the iteration stopping condition, refer to the description of the data processing device 100 above.
As shown in Fig. 10, the optimal type acquisition step S820 may further include the following. In step S1010, for each data type, the multiple prediction results obtained from the multiple denoising and prediction operations performed for that data type are compared and, based on the comparison, the prediction result corresponding to one of those denoising and prediction operations is selected as the prediction result under that data type; the selected prediction result is the best of the multiple prediction results under that data type. For the specific operation of selecting the best prediction result, refer to the description of the prediction result verifying sub-unit 610 above. In step S1020, the prediction results under each data type are compared, and the data type corresponding to the best prediction result is selected as the optimal data type. For the specific operation, refer to the description of the data type selecting sub-unit 620 above.
As shown in Fig. 11, step S730 may further include the following training dataset selection sub-steps. In step S1110, under the optimal data type, all data included in the training dataset are clustered and denoised. In step S1120, it is determined whether the class ratio Rt of the data in the training dataset satisfies the predetermined condition. If it does, the iteration ends; if it does not, the method returns to step S1110.
The predetermined condition may be Rt/Ry ≥ R*/Rx, where Rx is the class ratio of the data in the sub-test dataset, Ry is the class ratio of the data in the test dataset predicted using an algorithm such as ARMA or SVR, and R* is the optimal class ratio under the optimal combination of data types described above. Accordingly, step S1020 may also include the following sub-step: obtaining the class ratio of the data corresponding to the best prediction result.
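The loop formed by steps S1110 and S1120 can be sketched generically. Everything below is an assumption made for illustration: the denoising step is a stand-in that drops one majority-class sample per round, the class ratio is taken as the count of label 1 divided by the count of label 2, the absolute-difference form of the predetermined condition is used, and a `max_rounds` guard is added so the sketch always terminates.

```python
def select_training_set(data, denoise_step, class_ratio, condition, max_rounds=100):
    """Repeat denoising (S1110) until the class ratio check (S1120) passes."""
    rounds = 0
    while rounds < max_rounds:
        data = denoise_step(data)          # step S1110: cluster and denoise
        rounds += 1
        if condition(class_ratio(data)):   # step S1120: check class ratio Rt
            break
    return data, rounds

def drop_one_majority(data):
    """Stand-in for clustering and denoising: remove one label 1 sample."""
    out = list(data)
    out.remove(1)
    return out

def ratio(data):
    """Assumed class ratio: count of label 1 over count of label 2."""
    return data.count(1) / data.count(2)

# Hypothetical values: Ry = 2.0, R* = 3.0, Rx = 2.0, D' = 0.01.
ry, r_star, rx, d_prime = 2.0, 3.0, 2.0, 0.01
training = [1] * 10 + [2] * 2
selected, rounds = select_training_set(
    training, drop_one_majority, ratio,
    lambda rt: abs(rt / ry - r_star / rx) < d_prime,
)
```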
Note that the specific steps of the method described in the embodiments of the invention may be configured similarly to the operations of the components of the data processing device above. For parts not detailed in the method embodiments, refer to the corresponding descriptions in the device embodiments; they are not repeated here.
According to other embodiments of the invention, a computer, a server, or the like may be equipped with the data processing device of the above embodiments, so that it possesses the various data processing functions described above.
The foregoing detailed description has set forth various embodiments of the device and/or method according to embodiments of the invention by means of block diagrams, flowcharts, and/or examples. Where such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation in them can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described in this specification may be implemented by an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), or another integrated form. However, those skilled in the art will recognize that some aspects of the embodiments described in this specification can be equivalently implemented, in whole or in part, in integrated circuits, as one or more computer programs running on one or more computers (for example, on one or more computer systems), as one or more programs running on one or more processors (for example, on one or more microprocessors), as firmware, or as virtually any combination thereof, and that, in light of this disclosure, designing the circuitry and/or writing the code for the software and/or firmware is well within the ability of those skilled in the art.
In the case of implementation by software or firmware, a program constituting the software may be installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1200 shown in Figure 12). When various programs are installed, the computer is capable of performing various functions.
Figure 12 shows a structural diagram of a general-purpose computer system that can serve as a data processing device implementing the data processing method according to embodiments of the present invention. The computer system 1200 is only an example and does not imply any limitation on the scope of use or functionality of the method and device of the present invention. Nor should the computer system 1200 be interpreted as having any dependency on, or requirement for, any component or combination of components shown in the exemplary computer system 1200.
In Figure 12, a central processing unit (CPU) 1201 performs various processes according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage section 1208 into a random access memory (RAM) 1203. The RAM 1203 also stores, as needed, the data required when the CPU 1201 performs the various processes. The CPU 1201, the ROM 1202, and the RAM 1203 are connected to one another via a bus 1204. An input/output interface 1205 is also connected to the bus 1204.
The following components are also connected to the input/output interface 1205: an input section 1206 (including a keyboard, a mouse, etc.), an output section 1207 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker, etc.), the storage section 1208 (including a hard disk, etc.), and a communication section 1209 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 may also be connected to the input/output interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, may be mounted on the drive 1210 as needed, so that a computer program read from it can be installed into the storage section 1208 as needed.
In the case where the above series of processes is implemented by software, the program constituting the software may be installed from a network such as the Internet or from a storage medium such as the removable medium 1211.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1211 shown in Figure 12, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1211 include a magnetic disk (including a floppy disk), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1202, a hard disk included in the storage section 1208, or the like, which stores the program and is distributed to the user together with the device containing it.
Therefore, the present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above data processing method according to embodiments of the present invention can be performed. Accordingly, the various storage media listed above for carrying such a program product are also included in the disclosure of the present invention.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasized that the term "comprises/comprising", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components. Terms involving ordinal numbers, such as "first" and "second", do not indicate an order of implementation or a degree of importance of the features, elements, steps, or components they qualify; they are used only to distinguish between these features, elements, steps, or components for clarity of description.
Additionally, the method for various embodiments of the present invention be not limited to specifications described in or accompanying drawing
Shown in time sequencing perform, it is also possible to according to other time sequencing, concurrently or independently
Perform.Therefore, the execution sequence of the method described in this specification not technical scope structure to the present invention
Become to limit.
As can be seen from the above disclosure, the solutions of the present invention include, but are not limited to:
1. A data processing device, comprising:
an acquisition device for selecting historical data as a training data set according to a predetermined rule and dividing the historical data into a sub-training data set and a sub-test data set; obtaining information about data types according to attributes, or combinations of attributes, of the data in the sub-training data set; and, for each data type, predicting the sub-test data set using a classifier trained on the sub-training data set denoised under that data type and verifying the prediction result, so as to obtain the optimal data type having the optimal prediction result; and
a training data set selection device for denoising the data in the training data set under the optimal data type to obtain a training data set whose class ratio satisfies a predetermined condition, so that a test data set is classified using a classifier trained on the training data set satisfying the predetermined condition.
2. The data processing device according to solution 1, wherein the test data set is a time-series data set, and the acquisition device selects the historical data based on the time period in which the data in the test data set lies and divides the historical data into the sub-training data set and the sub-test data set.
3. The data processing device according to solution 1 or 2, wherein the acquisition device comprises:
a classifier training unit for, for each data type, denoising the data in the sub-training data set, training a classifier on the denoised sub-training data set, and predicting the sub-test data set using the trained classifier; and
an optimal data type acquisition unit for verifying, using the sub-test data set, the prediction results obtained for the sub-test data set under each data type and, based on the verification results, selecting the data type corresponding to the optimal prediction result as the optimal data type.
4. The data processing device according to solution 3, wherein the classifier training unit comprises:
a denoising subunit for performing, under the data type, a denoising operation on each of a plurality of data groups included in the sub-training data set, the denoising operation comprising clustering the data in the data group, removing noise data from the data group using a distance range from the cluster center, and merging the data of the data groups from which the noise data has been removed into a denoised data set, wherein the data in a data group have the same data attribute with respect to the data type; and
a prediction subunit for performing a prediction operation, the prediction operation comprising training a classifier on the denoised data set and predicting the data of the sub-test data set using the trained classifier.
5. The data processing device according to solution 4, wherein the classifier training unit is configured to perform, for each data type, the denoising operation and the prediction operation n times on the sub-training data set, wherein:
the denoising subunit performs the n-th denoising operation on each data group included in the denoised data set obtained after the (n-1)-th denoising operation, to obtain the denoised data set corresponding to the n-th denoising operation; and
the prediction subunit performs the n-th prediction operation by training a classifier on the denoised data set corresponding to the n-th denoising operation and predicting the data of the sub-test data set using the trained classifier, where n is an integer greater than or equal to 2.
6. The data processing device according to solution 5, wherein the optimal data type acquisition unit comprises:
a prediction result verification subunit for, for each data type, verifying each of the n prediction results obtained by the n denoising operations and prediction operations performed for that data type and, based on the verification results, selecting the prediction result corresponding to the x-th denoising operation and prediction operation as the prediction result under that data type, wherein the prediction result corresponding to the x-th denoising operation and prediction operation is the optimal one among the n prediction results under that data type, and x is an integer greater than or equal to 2 and less than or equal to n; and
a data type selection subunit for comparing the prediction results under the respective data types and selecting the data type corresponding to the optimal prediction result as the optimal data type.
7. The data processing device according to any one of solutions 1-6, wherein the training data set selection device is configured to perform, under the optimal data type, clustering and denoising on all data included in the training data set; and
the training data set selection unit is configured to perform the clustering and denoising m times, until the class ratio Rt of the data in the training data set obtained after the m-th clustering and denoising satisfies the predetermined condition, where m is an integer greater than or equal to 1.
8. The data processing device according to solution 7, wherein the predetermined condition is:
|Rt/Ry - R*/Rx| < D'
where Rx is the class ratio of the data in the sub-test data set, Ry is the predicted class ratio of the data in the test data set, R* is the data class ratio corresponding to the optimal prediction result, and D' is a preset error upper-bound parameter.
9. The data processing device according to solution 8, wherein the optimal data type acquisition unit obtains the data class ratio corresponding to the optimal prediction result when obtaining the optimal prediction result.
10. The data processing device according to any one of solutions 4-9, wherein the historical data and the test set data are imbalanced data, and the denoising subunit is configured to cluster the data in each data group under the data type into 2 classes.
11. A data processing method, comprising the following steps:
selecting historical data as a training data set according to a predetermined rule and dividing the historical data into a sub-training data set and a sub-test data set; obtaining information about data types according to attributes, or combinations of attributes, of the data in the sub-training data set; and, for each data type, predicting the sub-test data set using a classifier trained on the sub-training data set denoised under that data type and verifying the prediction result, so as to obtain the optimal data type having the optimal prediction result; and
denoising the data in the training data set under the optimal data type to obtain a training data set whose class ratio satisfies a predetermined condition, so that a test data set is classified using a classifier trained on the training data set satisfying the predetermined condition.
12. The data processing method according to solution 11, wherein the test data set is a time-series data set, the historical data is selected based on the time period in which the data in the test data set lies, and the historical data is divided into the sub-training data set and the sub-test data set.
13. The data processing method according to solution 11 or 12, wherein, for each data type, the data in the sub-training data set are denoised, a classifier is trained on the denoised sub-training data set, and the sub-test data set is predicted using the trained classifier; and
the prediction results obtained for the sub-test data set under each data type are verified using the sub-test data set and, based on the verification results, the data type corresponding to the optimal prediction result is selected as the optimal data type.
14. The data processing method according to solution 13, wherein the denoising, training, and prediction operations comprise:
performing, under the data type, a denoising operation on each of a plurality of data groups included in the sub-training data set, the denoising operation comprising clustering the data in the data group, removing noise data from the data group using a distance range from the cluster center, and merging the data of the data groups from which the noise data has been removed into a denoised data set, wherein the data in a data group have the same data attribute with respect to the data type; and
performing a prediction operation, the prediction operation comprising training a classifier on the denoised data set and predicting the data of the sub-test data set using the trained classifier.
15. The data processing method according to solution 14, wherein, for each data type, the denoising operation and the prediction operation are performed n times on the sub-training data set, comprising:
performing the n-th denoising operation on each data group included in the denoised data set obtained after the (n-1)-th denoising operation, to obtain the denoised data set corresponding to the n-th denoising operation; and
performing the n-th prediction operation by training a classifier on the denoised data set corresponding to the n-th denoising operation and predicting the data of the sub-test data set using the trained classifier, where n is an integer greater than or equal to 2.
16. The data processing method according to solution 15, wherein selecting the optimal data type comprises:
for each data type, verifying each of the n prediction results obtained by the n denoising operations and prediction operations performed for that data type and, based on the verification results, selecting the prediction result corresponding to the x-th denoising operation and prediction operation as the prediction result under that data type, wherein the prediction result corresponding to the x-th denoising operation and prediction operation is the optimal one among the n prediction results under that data type, and x is an integer greater than or equal to 2 and less than or equal to n; and
comparing the prediction results under the respective data types and selecting the data type corresponding to the optimal prediction result as the optimal data type.
17. The data processing method according to any one of solutions 11-16, wherein, under the optimal data type, clustering and denoising are performed on all data included in the training data set; and
the clustering and denoising are performed m times, until the class ratio Rt of the data in the training data set obtained after the m-th clustering and denoising satisfies the predetermined condition, where m is an integer greater than or equal to 1.
18. The data processing method according to solution 17, wherein the predetermined condition is:
|Rt/Ry - R*/Rx| < D'
where Rx is the class ratio of the data in the sub-test data set, Ry is the predicted class ratio of the data in the test data set, R* is the data class ratio corresponding to the optimal prediction result, and D' is a preset error upper-bound parameter.
19. The data processing method according to solution 18, wherein the data class ratio corresponding to the optimal prediction result is obtained when the optimal prediction result is obtained.
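The per-group denoising operation and its repetition over n rounds, as enumerated above, can be illustrated with the following sketch. It rests on several assumptions not found in the source: a tiny two-cluster k-means stands in for the unspecified clustering algorithm, noise is defined by a fixed `max_distance` from the nearest cluster center, and samples are dictionaries with hypothetical keys `attrs`, `x`, and `y`. Classifier training and prediction are left out; each round's denoised data set would be handed to a classifier of the reader's choice.

```python
import math
from collections import defaultdict


def two_means(points, iterations=10):
    """Tiny 2-cluster k-means; the first and last points seed the centers (arbitrary choice)."""
    centers = [points[0], points[-1]]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = min((0, 1), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep the old center if empty.
        centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return centers


def denoise(samples, data_type, max_distance):
    """One denoising operation: group by the data type's attributes, cluster, drop far points, merge."""
    groups = defaultdict(list)
    for s in samples:
        groups[tuple(s["attrs"][a] for a in data_type)].append(s)
    kept = []
    for members in groups.values():
        if len(members) < 2:          # too small to cluster; keep unchanged
            kept.extend(members)
            continue
        centers = two_means([s["x"] for s in members])
        kept.extend(
            s for s in members
            if min(math.dist(s["x"], c) for c in centers) <= max_distance
        )
    return kept


def denoise_rounds(samples, data_type, max_distance, n):
    """Apply the denoising operation n times, each round starting from the previous result."""
    current, history = samples, []
    for _ in range(n):
        current = denoise(current, data_type, max_distance)
        history.append(current)
    return history
```

Clustering into two classes follows the imbalanced-data case mentioned above; a production implementation would likely choose the distance range from the data rather than pass a constant.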
Although the present invention has been disclosed above through the description of specific embodiments thereof, it should be understood that those skilled in the art can devise various modifications, improvements, or equivalents of the present invention within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents should also be considered to fall within the protection scope of the present invention.
Claims (10)
1. A data processing device, comprising:
an acquisition device for selecting historical data as a training data set according to a predetermined rule and dividing the historical data into a sub-training data set and a sub-test data set; obtaining information about data types according to attributes, or combinations of attributes, of the data in the sub-training data set; and, for each data type, predicting the sub-test data set using a classifier trained on the sub-training data set denoised under that data type and verifying the prediction result, so as to obtain the optimal data type having the optimal prediction result; and
a training data set selection device for denoising the data in the training data set under the optimal data type to obtain a training data set whose class ratio satisfies a predetermined condition, so that a test data set is classified using a classifier trained on the training data set satisfying the predetermined condition.
2. The data processing device according to claim 1, wherein the test data set is a time-series data set, and the acquisition device selects the historical data based on the time period in which the data in the test data set lies and divides the historical data into the sub-training data set and the sub-test data set.
3. The data processing device according to claim 1 or 2, wherein the acquisition device comprises:
a classifier training unit for, for each data type, denoising the data in the sub-training data set, training a classifier on the denoised sub-training data set, and predicting the sub-test data set using the trained classifier; and
an optimal data type acquisition unit for verifying, using the sub-test data set, the prediction results obtained for the sub-test data set under each data type and, based on the verification results, selecting the data type corresponding to the optimal prediction result as the optimal data type.
4. The data processing device according to claim 3, wherein the classifier training unit comprises:
a denoising subunit for performing, under the data type, a denoising operation on each of a plurality of data groups included in the sub-training data set, the denoising operation comprising clustering the data in the data group, removing noise data from the data group using a distance range from the cluster center, and merging the data of the data groups from which the noise data has been removed into a denoised data set, wherein the data in a data group have the same data attribute with respect to the data type; and
a prediction subunit for performing a prediction operation, the prediction operation comprising training a classifier on the denoised data set and predicting the data of the sub-test data set using the trained classifier.
5. The data processing device according to claim 4, wherein the classifier training unit is configured to perform, for each data type, the denoising operation and the prediction operation n times on the sub-training data set, wherein:
the denoising subunit performs the n-th denoising operation on each data group included in the denoised data set obtained after the (n-1)-th denoising operation, to obtain the denoised data set corresponding to the n-th denoising operation; and
the prediction subunit performs the n-th prediction operation by training a classifier on the denoised data set corresponding to the n-th denoising operation and predicting the data of the sub-test data set using the trained classifier, where n is an integer greater than or equal to 2.
6. The data processing device according to claim 5, wherein the optimal data type acquisition unit comprises:
a prediction result verification subunit for, for each data type, verifying each of the n prediction results obtained by the n denoising operations and prediction operations performed for that data type and, based on the verification results, selecting the prediction result corresponding to the x-th denoising operation and prediction operation as the prediction result under that data type, wherein the prediction result corresponding to the x-th denoising operation and prediction operation is the optimal one among the n prediction results under that data type, and x is an integer greater than or equal to 2 and less than or equal to n; and
a data type selection subunit for comparing the prediction results under the respective data types and selecting the data type corresponding to the optimal prediction result as the optimal data type.
7. The data processing device according to any one of claims 1-6, wherein the training data set selection device is configured to perform, under the optimal data type, clustering and denoising on all data included in the training data set; and
the training data set selection unit is configured to perform the clustering and denoising m times, until the class ratio Rt of the data in the training data set obtained after the m-th clustering and denoising satisfies the predetermined condition, where m is an integer greater than or equal to 1.
8. The data processing device according to claim 7, wherein the predetermined condition is:
|Rt/Ry - R*/Rx| < D'
where Rx is the class ratio of the data in the sub-test data set, Ry is the predicted class ratio of the data in the test data set, R* is the data class ratio corresponding to the optimal prediction result, and D' is a preset error upper-bound parameter.
9. The data processing device according to claim 8, wherein the optimal data type acquisition unit obtains the data class ratio corresponding to the optimal prediction result when obtaining the optimal prediction result.
10. A data processing method, comprising the following steps:
selecting historical data as a training data set according to a predetermined rule and dividing the historical data into a sub-training data set and a sub-test data set; obtaining information about data types according to attributes, or combinations of attributes, of the data in the sub-training data set; and, for each data type, predicting the sub-test data set using a classifier trained on the sub-training data set denoised under that data type and verifying the prediction result, so as to obtain the optimal data type having the optimal prediction result; and
denoising the data in the training data set under the optimal data type to obtain a training data set whose class ratio satisfies a predetermined condition, so that a test data set is classified using a classifier trained on the training data set satisfying the predetermined condition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510106455.6A CN106033425A (en) | 2015-03-11 | 2015-03-11 | A data processing device and a data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106033425A true CN106033425A (en) | 2016-10-19 |
Family
ID=57149786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510106455.6A Pending CN106033425A (en) | 2015-03-11 | 2015-03-11 | A data processing device and a data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106033425A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ma et al. | High performance graph convolutional networks with applications in testability analysis | |
CN106033425A (en) | A data processing device and a data processing method | |
US12034747B2 (en) | Unsupervised learning to simplify distributed systems management | |
US20200401939A1 (en) | Systems and methods for preparing data for use by machine learning algorithms | |
US11488055B2 (en) | Training corpus refinement and incremental updating | |
CN107430611B (en) | Filtering data lineage graph | |
US20230139783A1 (en) | Schema-adaptable data enrichment and retrieval | |
CN107251021B (en) | Filtering data lineage graph | |
CN110008259A (en) | Method and terminal device for visualized data analysis | |
CN114722746B (en) | Chip aided design method, device and equipment and readable medium | |
US11775610B2 (en) | Flexible imputation of missing data | |
CN110995459A (en) | Abnormal object identification method, device, medium and electronic equipment | |
CN109189876B (en) | Data processing method and device | |
CN109948680B (en) | Classification method and system for medical record data | |
CN104516879A (en) | Method and system for managing database containing record with missing value | |
KR101909420B1 (en) | Device and method for constructing monolithic application as microservice based unit | |
WO2015180340A1 (en) | Data mining method and device | |
CN111552509A (en) | Method and device for determining dependency relationship between interfaces | |
CN106919380A (en) | Data flow programming of computing devices using graph partitioning based on vector estimation | |
CN106326904A (en) | Device and method of acquiring feature ranking model and feature ranking method | |
CN112420125A (en) | Molecular attribute prediction method and device, intelligent equipment and terminal | |
CN114139636B (en) | Abnormal operation processing method and device | |
KR102039244B1 (en) | Data clustering method using firefly algorithm and the system thereof | |
US12079214B2 (en) | Estimating computational cost for database queries | |
KR101953479B1 (en) | Group search optimization data clustering method and system using the relative ratio of distance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20161019 |