CN110427351A - Active data modeling - Google Patents
- Publication number: CN110427351A
- Application number: CN201810395943.7A
- Authority: CN (China)
- Prior art keywords
- model
- data
- variable
- target model
- subset
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Abstract
Embodiments of the present disclosure provide a method, a device, and a computer program product for active data modeling of a data set. For a given data set, a first subset is actively selected to generate a first model that takes at least a first variable as an independent variable, and a second subset is actively selected to generate a second model that takes at least a second variable as an independent variable. The first model and the second model are then merged to generate a target model that indicates a data constraint in the data set and that is used for making predictions based on the data set. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the multiple models are merged to generate the final target model. Embodiments of the present disclosure can therefore reduce the number of independent variables in the modeling process, which effectively improves the efficiency of modeling the data set.
Description
Background
Data modeling refers to generating a model from data: the data objects in a given data set are analyzed to determine the relationships or constraints among them, and the model that best fits the given data set is then generated. Data modeling methods include regression analysis, statistical analysis, machine learning, deep learning, grey prediction, principal component analysis, neural networks, and time series analysis, among others.
Regression analysis, one of the most common modeling methods, is used to find the relationship between a dependent variable and independent variables. According to the number of independent variables involved, regression analysis can be divided into univariate regression and multivariate regression; according to the number of dependent variables, it can be divided into simple regression analysis and multiple regression analysis; and according to the type of relationship between the independent and dependent variables, it can be divided into linear regression analysis and nonlinear regression analysis. Symbolic regression is a type of regression analysis that finds the model (such as a function) best suited to a given data set through evolutionary search (such as genetic programming). The goal of symbolic regression is to automatically discover patterns, constraints, or rules in a data set.
Summary of the invention
Embodiments of the present disclosure provide a method, a device, and a computer program product for active data modeling of a data set. For a given data set, a first subset is actively selected to generate a first model that takes at least a first variable as an independent variable, and a second subset is actively selected to generate a second model that takes at least a second variable as an independent variable. The first model and the second model are then merged to generate a target model that indicates a data constraint in the data set and that is used for making predictions based on the data set. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the final target model is generated by merging the multiple models. Embodiments of the present disclosure can therefore reduce the number of independent variables in the modeling process, which effectively improves the efficiency of modeling the data set.
This Summary is provided to introduce, in simplified form, a selection of concepts that are further described in the detailed description below. This Summary is not intended to identify key features or essential features of the present disclosure, nor is it intended to limit the scope of the present disclosure.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a block diagram of a computing device/server in which one or more embodiments of the present disclosure can be implemented;
Fig. 2 shows a flowchart of a method for active data modeling according to an embodiment of the present disclosure;
Fig. 3A shows a flowchart of a method for generating a first model according to an embodiment of the present disclosure;
Fig. 3B shows a flowchart of a method for generating a second model according to an embodiment of the present disclosure;
Fig. 3C shows a flowchart of a method for generating a target model through tree matching according to an embodiment of the present disclosure;
Fig. 4A shows a schematic diagram of uniformly accelerated linear motion according to an embodiment of the present disclosure;
Fig. 4B shows a schematic diagram of a data set related to uniformly accelerated linear motion according to an embodiment of the present disclosure;
Fig. 4C shows a schematic diagram of a data subset of the data set shown in Fig. 4B;
Fig. 4D shows a schematic diagram of generating trees that represent the respective models according to an embodiment of the present disclosure;
Fig. 4E shows a schematic diagram of a target tree generated by matching the trees in Fig. 4D; and
Fig. 5 shows a schematic diagram comparing the experimental results of the active modeling method according to the present disclosure with those of a deep learning method.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
Traditionally, for a given data set, a single model is generated by evolutionary search so that it best fits the data set. For a single independent variable, a target model is generally easy to obtain through univariate regression analysis. However, a data set may contain many independent variables, and solving a multiple linear regression is usually more complex: multiple parameters must be solved using, for example, the least squares method, and it may even be impossible to find an accurate target model that satisfies the given data set. One improvement over the traditional approach is to perform function fitting with deep learning, learning and training with a complex neural network structure. However, deep learning consumes a large amount of time and computing resources, and the resulting model may still not be accurate enough. Because traditional modeling methods only use the data in a data set passively, their data modeling efficiency is low.
To this end, embodiments of the present disclosure propose an active data modeling method for data sets. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the final target model is generated by merging the multiple models. Embodiments of the present disclosure can therefore reduce the number of independent variables in the modeling process, which effectively improves the efficiency of modeling the data set.
The basic principles and several example implementations of the present disclosure are described below with reference to Figs. 1 to 5. Fig. 1 shows a block diagram of a computing device/server 100 in which one or more embodiments of the present disclosure can be implemented. It should be understood that the computing device/server 100 shown in Fig. 1 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein.
As shown in Fig. 1, the computing device/server 100 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. The processing unit 110 may be a real or virtual processor and can perform various processing according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 100.
The computing device/server 100 typically includes multiple computer storage media. Such media can be any available media accessible to the computing device/server 100, including but not limited to volatile and non-volatile media and removable and non-removable media. The memory 120 may be volatile memory (for example, registers, cache, random access memory (RAM)), non-volatile memory (for example, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium, that can be used to store information and/or data (such as the data set 180) and that can be accessed within the computing device/server 100.
The computing device/server 100 may further include additional removable/non-removable, volatile/non-volatile storage media. Although not shown in Fig. 1, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The memory 120 may include a modeling engine 125 having a set of one or more program modules that are configured to perform the methods or functions of the various embodiments described herein.
The communication unit 140 communicates with other computing devices via communication media. Additionally, the functions of the components of the computing device/server 100 may be implemented by a single computing cluster or by multiple computing machines that are able to communicate over communication connections. The computing device/server 100 can therefore operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
The input device 150 may be one or more of various input devices, such as a mouse, a keyboard, or a trackball. The output device 160 may be one or more output devices, such as a display, a loudspeaker, or a printer. As needed, the computing device/server 100 may also communicate, through the communication unit 140, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with the computing device/server 100, or with any device (for example, a network card or a modem) that enables the computing device/server 100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
As shown in Fig. 1, the storage device 130 stores the data set 180, which contains a large amount of data involving multiple variables. According to embodiments of the subject matter described herein, the modeling engine 125 actively selects multiple subsets of the data set 180 to generate multiple models and merges the multiple models to build a model suited to the data set 180. Example embodiments in which the modeling engine 125 generates a model from the data set 180 are described in detail below with reference to Figs. 2 to 5.
Fig. 2 shows a flowchart of a method 200 for active data modeling according to an embodiment of the present disclosure. It should be understood that the method 200 may be performed by the computing device/server 100 described with reference to Fig. 1.
At 202, based on a first subset of the data set, a first model that takes at least a first variable as an independent variable is generated. A data set is a collection of data and may include various types of data, such as traffic data, medical data, financial data, and industrial data. As an example, in a scenario where an object undergoes uniformly accelerated linear motion, the data set may include data related to the object itself or to its motion. The data set may involve multiple variables, including the mass m of the object, the external force F applied to the object, the initial velocity v_0 of the object, and the motion time t of the object. An example of the data set 180 is described below with reference to Fig. 4B.
In some embodiments, the first subset may include the portion of the data in the data set that satisfies a predetermined rule. For example, the first subset may include multiple groups of data, where within each group the value of only one variable (for example, the first variable) or of only some of the variables changes while the values of the other variables are fixed. Continuing with the scenario of uniformly accelerated linear motion, each group of data in the first subset may be a group in which only the time t changes (the other variables in the group, such as the initial velocity v_0, remain unchanged). An example of the data subset 410 is described below with reference to Fig. 4C.
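The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosure's implementation: the function name `select_groups`, the dictionary row layout, and the synthetic rows (generated from the standard constant-acceleration displacement law) are all hypothetical.

```python
from collections import defaultdict

def select_groups(rows, free_var, independent_vars):
    """Group rows so that, within each group, only `free_var` changes."""
    fixed = [v for v in independent_vars if v != free_var]
    groups = defaultdict(list)
    for row in rows:
        # Rows sharing the same values for all fixed variables form one group.
        key = tuple(row[v] for v in fixed)
        groups[key].append(row)
    # Keep only groups in which the free variable actually varies.
    return {k: g for k, g in groups.items()
            if len({r[free_var] for r in g}) >= 2}

def displacement(t, v0, F, m):
    # Synthetic ground truth used only to fabricate example rows.
    return v0 * t + 0.5 * (F / m) * t ** 2

rows = [
    {"v0": v0, "F": 1.5, "m": 2.0, "t": t, "d": displacement(t, v0, 1.5, 2.0)}
    for v0 in (0.5, 1.0) for t in (1.0, 2.0, 3.0)
]
rows.append({"v0": 0.1, "F": 0.5, "m": 1.0, "t": 1.0, "d": 0.35})

groups = select_groups(rows, "t", ["v0", "F", "m", "t"])
for key, group in groups.items():
    print(key, [r["t"] for r in group])
```

Each key is the tuple of fixed values (v0, F, m). The lone row with v0 = 0.1 is dropped because t takes only one value there, so no one-variable relationship can be learned from it.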
According to embodiments of the present disclosure, the first model represents a first constraint satisfied by the data in the first subset. For example, in the scenario of uniformly accelerated linear motion, the first model generated from the first subset, whose groups of data vary only in the time t, may be the constraint (for example, a functional relationship) satisfied between the displacement d of the object and the time t. The first model can thus represent the functional relationship between the dependent variable, displacement d, and the independent variable, time t, when the other independent variables (such as the initial velocity v_0) remain unchanged.
At 204, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable is generated. For example, the second subset may likewise include multiple groups of data, where within each group the value of only one variable (for example, the second variable) or of only some of the variables changes while the values of the other variables are fixed. The second model represents a second constraint satisfied by the data in the second subset.
For example, in the scenario of uniformly accelerated linear motion, the second model generated from the second subset, whose groups of data vary only in the initial velocity v_0, may be the constraint (for example, a functional relationship) satisfied between the displacement d and the initial velocity v_0. The second model can thus represent the functional relationship between the dependent variable, displacement d, and the independent variable, initial velocity v_0, when the other independent variables (such as the time t) remain unchanged. Those skilled in the art will appreciate that blocks 202 and 204 may be performed sequentially or in parallel.
At 206, a target model indicating a third constraint satisfied by the data in the data set is generated by merging at least the first model and the second model, where the target model is used for making predictions based on the data set. Because the first subset and the second subset are each part of the data set, the constraint found in each subset conforms to the constraint of the entire data set. In some embodiments, a corresponding model is generated for each variable or each group of variables, and the generated models are merged through pattern matching or tree alignment to produce a target model for the entire data set. The target model can be expressed as a function, a formula, and so on. The generated target model can be used for prediction; for example, a target model generated from a collected traffic data set can predict the traffic conditions at some future time.
For example, in the scenario of uniformly accelerated linear motion, the first model (for example, a first functional relationship) satisfied between the displacement d and the time t, the second model (for example, a second functional relationship) satisfied between the displacement d and the initial velocity v_0, and so on are combined to generate the target model (for example, a target functional relationship) satisfied among the displacement d, the time t, the initial velocity v_0, and so on. An example implementation of merging multiple models 455 to generate the target model 495 is described below with reference to Figs. 4D and 4E.
In the method 200 according to embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models respectively, and the multiple models are merged to generate a target model for the data set. The method 200 thus replaces traditional passive model fitting with active data modeling, actively seeking out the data from which to generate models. Example implementations of the method 200 in Fig. 2 are described below in conjunction with Figs. 3A to 3C.
Fig. 3 A shows according to an embodiment of the present disclosure for generating the flow chart of the method 300 of the first model.Method
300 can be executed by the calculating device/server 100 with reference to described in Fig. 1, it is the example implementation of the frame 202 in Fig. 2.
For convenience's sake, one is described by taking the related data set of uniformly accelrated rectilinear motion as an example below with reference to Fig. 4 A-4E and Fig. 5
A little example embodiments.
At 302, based on a first group of data in the first subset of the data set, a first sub-model that takes at least the first variable as an independent variable is generated. At 304, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable is generated. The value of the second variable is a first value in the first group of data and a second value in the second group of data, and the first value differs from the second value.
In some embodiments, only the value of the first variable changes within the first group of data, so the generated first sub-model takes only the first variable as an independent variable, which ensures that each sub-model can be generated easily, quickly, and accurately. It should be understood that the first group of data can be a group selected from the data set in which only the first variable changes, or it can be actively collected by fixing the values of the variables other than the first variable, which guarantees that a group of data in which only the first variable changes can be obtained. Alternatively, the values of several variables (but not all the variables in the data set) may change within the first group of data, in which case the generated first sub-model takes only those variables (for example, the first variable and the third variable) as independent variables. It should be understood that even when the first sub-model involves multiple independent variables, the efficiency of data modeling still improves because the number of variables in the first group of data is reduced.
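Actively collecting a group of data, as opposed to selecting one from existing records, can be pictured as sweeping one input of a measurement procedure while pinning the others. The sketch below assumes a callable `measure` stands in for whatever experiment or sensor produces the displacement; `collect_group` and `simulated_measure` are hypothetical names, and the simulation uses the standard constant-acceleration law only to produce example numbers.

```python
def collect_group(measure, fixed, free_var, values):
    """Query `measure` while holding `fixed` constant and sweeping `free_var`."""
    group = []
    for value in values:
        inputs = dict(fixed, **{free_var: value})
        # Record the inputs together with the observed dependent variable d.
        group.append({**inputs, "d": measure(**inputs)})
    return group

def simulated_measure(t, v0, F, m):
    # Stand-in for a real experiment; assumed here for illustration only.
    return v0 * t + 0.5 * (F / m) * t ** 2

group = collect_group(
    simulated_measure,
    fixed={"v0": 0.5, "F": 1.5, "m": 2.0},
    free_var="t",
    values=[1.0, 2.0, 3.0],
)
for record in group:
    print(record)
```

Because every record shares the same v0, F, and m, any relationship fitted to this group involves t alone.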
For example, the description below refers to the schematic diagram 400 of uniformly accelerated linear motion shown in Fig. 4A. It should be understood that Figs. 4A-4E are merely examples provided to aid understanding of the embodiments of the present disclosure and do not limit its scope. As shown, an object 405 is travelling along a road 402 in a straight line with uniform acceleration, and Fig. 4A shows the displacement of the object 405 at different times. For example, the object 405 has been displaced 2 meters at the 1st second, 8 meters at the 2nd second, and so on. In uniformly accelerated linear motion, the displacement of the object 405 is associated with multiple factors.
Fig. 4 B shows the data of the Multiple factors of acquisition uniformly accelrated rectilinear motion and forms data set 180.Such as Fig. 4 B institute
Show, in data set 180, the initial velocity v including object0, external force F that object is subject to, object quality m, run duration t with
And the value of displacement d under different conditions, wherein time t can be referred to as the first variable, initial velocity v0It can be referred to as
Second variable, external force F can be referred to as third variable, and quality m can be referred to as the 4th variable.For example, the data set in Fig. 4 B
The physical meaning of the first data in 180 can be with are as follows: as initial velocity v0It is 0.5 newton, quality m for 0.1 meter per second, external force F
For 1.0 kilograms, time t be 1.0 seconds when, the corresponding displacement d of object is 0.35 meter.
Rather than passively performing regression analysis or deep learning on the entire data set 180, embodiments of the present disclosure use an active modeling method: data subsets (that is, portions of the data) are actively selected, a model is built for each data subset, and the models are combined to generate a model suited to the data set 180.
For example, Fig. 4 C shows the first subset 410 in the data set 180 in Fig. 4 B comprising first group of 411 He of data
Second group of data 412.In first group of data 411, only the first variable t changes in multiple independents variable, and its dependent variable
Value remains unchanged, wherein the second variable v0Value is 0.5.It is also only the first change in second group of data 411, in multiple independents variable
Amount t changes, and the value of its dependent variable remains unchanged, wherein the second variable v0Value is 1.0.
Because the first group of data 411 contains only one independent variable, t, and one dependent variable, d, while the values of the other variables v_0, F, and m remain unchanged (v_0 = 0.5, F = 1.5, m = 2.0), simple univariate symbolic regression can determine that the first sub-model satisfied by the first group of data 411 is the following formula (1):
f_t(t) = 0.5t + 0.375t^2 (1)
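The coefficients in formula (1) can be checked against the standard displacement law for constant acceleration, with the acceleration given by Newton's second law as F/m. Substituting the fixed values v_0 = 0.5, F = 1.5, and m = 2.0 gives:

```latex
\begin{aligned}
d &= v_0\,t + \frac{F}{2m}\,t^2 \\
  &= 0.5\,t + \frac{1.5}{2 \times 2.0}\,t^2 \\
  &= 0.5\,t + 0.375\,t^2
\end{aligned}
```

The linear coefficient is therefore the initial velocity, and the quadratic coefficient is F/(2m).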
In general, when a sub-model is generated by univariate symbolic regression, the model with the best goodness of fit is selected; if several candidate sub-models fit equally well, the shortest formula is selected as the sub-model. Similarly, because the second group of data 412 also contains only one independent variable, t, and one dependent variable, d, while the values of the other variables v_0, F, and m remain unchanged (v_0 = 1.0, F = 1.5, m = 2.0), univariate symbolic regression can determine that the second sub-model satisfied by the second group of data 412 is the following formula (2):
f_t(t) = 1.0t + 0.375t^2 (2)
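The two fits can be reproduced numerically. The following is a minimal sketch and not the symbolic regression the disclosure describes: it assumes the functional form d = a·t + b·t² is already known and only fits the coefficients by least squares. The function name and the sample points (generated from formulas (1) and (2) at t = 1 to 4) are illustrative assumptions.

```python
def fit_linear_quadratic(ts, ds):
    """Least-squares fit of d = a*t + b*t**2 via the 2x2 normal equations."""
    s11 = sum(t ** 2 for t in ts)
    s12 = sum(t ** 3 for t in ts)
    s22 = sum(t ** 4 for t in ts)
    r1 = sum(t * d for t, d in zip(ts, ds))
    r2 = sum(t ** 2 * d for t, d in zip(ts, ds))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [a, b] = [r1, r2].
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b

ts = [1.0, 2.0, 3.0, 4.0]
group_411 = [0.5 * t + 0.375 * t ** 2 for t in ts]   # v_0 = 0.5
group_412 = [1.0 * t + 0.375 * t ** 2 for t in ts]   # v_0 = 1.0

a1, b1 = fit_linear_quadratic(ts, group_411)
a2, b2 = fit_linear_quadratic(ts, group_412)
print(a1, b1)
print(a2, b2)
```

Only the linear coefficient differs between the two groups, which matches the fact that v_0 is the only fixed variable whose value changed.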
Fig. 3 A is returned, 306, the first submodel and the second submodel is based on, generates the first model.For example, showing in Fig. 4 C
In data subset 410 out, based on the model (such as formula (1)) generated according to first group of data 411 and according to second group of data
412 models (such as formula (2)) generated, can determine that the data in data subset 410 meet following formula (3) namely the first mould
Type.That is, the first mould for the first variable can be obtained by selecting the more wheels of multi-group data operation in the first subset
Type (such as function).
ft(t)=X0t+X1t2 (3)
Wherein, ft(t) it indicates using time t as the displacement function of independent variable, X0And X1For unknown parameter.
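One way to picture the step from formulas (1) and (2) to formula (3) is sketched below, assuming each sub-model is stored as a mapping from term to fitted coefficient. When every sub-model has the same terms, the coefficients are replaced by unknown parameters. The dictionary representation and the helper names `generalize` and `render` are hypothetical and are not taken from the disclosure.

```python
def generalize(submodels):
    """Return a parameterized template if all sub-models share the same terms."""
    terms = list(submodels[0])
    if any(set(m) != set(terms) for m in submodels[1:]):
        return None  # Structures differ, so no common template exists.
    # Replace each numeric coefficient with a named unknown parameter.
    return {term: f"X{i}" for i, term in enumerate(terms)}

def render(template):
    """Format a template as a readable formula string."""
    return " + ".join(f"{param}*{term}" for term, param in template.items())

sub_model_1 = {"t": 0.5, "t^2": 0.375}   # formula (1)
sub_model_2 = {"t": 1.0, "t^2": 0.375}   # formula (2)

template = generalize([sub_model_1, sub_model_2])
print(render(template))
```

The printed string corresponds to formula (3). Both coefficients become parameters, including the one that happened to be equal in the two groups, because it may still depend on variables that were fixed in both.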
Fig. 3 B shows according to an embodiment of the present disclosure for generating the flow chart of the method 350 of the second model.It should
Understand, method 350 can be executed by the calculating device/server 100 with reference to described in Fig. 1, it is the frame 204 in Fig. 2
Example implementation.
At 308, based on a third group of data in the second subset of the data set, a third sub-model that takes at least the second variable as an independent variable is generated. At 310, based on a fourth group of data in the second subset, a fourth sub-model that takes at least the second variable as an independent variable is generated. At 312, the second model is generated based on the third sub-model and the fourth sub-model. As with generating the first model in acts 302-306, a second subset in which only the second variable changes can be actively selected, and the second model satisfied by the second subset can be generated. For example, from the data set 180 described in Fig. 4B, a second model with the initial velocity v_0 as the independent variable is generated, such as the following formula (4):
f_{v_0}(v_0) = X_2 + X_3 v_0 (4)
where f_{v_0}(v_0) denotes the displacement function with the initial velocity v_0 as the independent variable, and X_2 and X_3 are unknown parameters.
Optionally, in some embodiments, in addition to the first model and the second model, a third model that takes at least a third variable as an independent variable can be generated based on a third subset of the data set. For example, from the data set 180 described in Fig. 4B, a third model with the external force F as the independent variable can be generated, such as the following formula (5):
f_F(F) = X_4 + X_5 F (5)
where f_F(F) denotes the displacement function with the external force F as the independent variable, and X_4 and X_5 are unknown parameters.
In some embodiments, a fourth model that takes at least a fourth variable as an independent variable can also be generated based on a fourth subset of the data set. For example, from the data set 180 described in Fig. 4B, a fourth model with the mass m as the independent variable can be generated, such as the following formula (6):
f_m(m) = X_6 + X_7 / m (6)
where f_m(m) denotes the displacement function with the mass m as the independent variable, and X_6 and X_7 are unknown parameters.
Fig. 3 C shows the process according to an embodiment of the present disclosure for matching by tree and generating the method 390 of object module
Figure.It should be appreciated that method 390 can be executed by the calculating device/server 100 with reference to described in Fig. 1, it is in Fig. 2
The example implementation of frame 206.
At 314, a first tree representing the first model and a second tree representing the second model are generated. For example, Fig. 4D shows a schematic diagram 450 of generating trees that represent the respective models according to an embodiment of the present disclosure. After the multiple models 455 (for example, formulas (3)-(6)) have been generated, a corresponding tree structure can be generated for each model, such as a tree 460 representing the first model, a tree 470 representing the second model, a tree 480 representing the third model, and a tree 490 representing the fourth model. Matching on tree structures allows the multiple models to be merged quickly and efficiently.
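A formula can be held as an expression tree in many ways. The sketch below uses nested tuples of the form (operator, left, right), with strings as named leaves and numbers as constants; this encoding and the helper names `evaluate` and `symbols` are assumptions made for illustration and do not describe the tree layout of Fig. 4D.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

def evaluate(node, env):
    """Recursively evaluate an expression tree given values for its symbols."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]      # named variable or unknown parameter
    return node               # numeric constant

def symbols(node):
    """Collect every named leaf (variables and parameters) in the tree."""
    if isinstance(node, tuple):
        return symbols(node[1]) | symbols(node[2])
    return {node} if isinstance(node, str) else set()

# Tree for X0*t + X1*t^2, the form of the first model.
tree_t = ("+", ("*", "X0", "t"), ("*", "X1", ("^", "t", 2)))

value = evaluate(tree_t, {"X0": 0.5, "X1": 0.375, "t": 2.0})
print(value, sorted(symbols(tree_t)))
```

With trees in this shape, shared substructure such as the product involving t can be located by walking two trees in parallel, which is the kind of comparison that tree matching relies on.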
At 316, the target model is generated by matching at least the first tree and the second tree. For example, a model template that includes unknown parameters can be generated based at least on the first model and the second model. Fig. 4E shows a target tree 499 generated by matching the trees in Fig. 4D; it is obtained by performing subgraph matching on the trees of the respective models (for example, trees 460, 470, 480, and 490). Next, a target template 495 that includes unknown parameters can be generated based on the target tree 499, as shown in the following formula (7):
f(t, v_0, F, m) = Y_0 v_0 t + Y_1 F t^2 / m (7)
where f(t, v_0, F, m) denotes the displacement function with the time t, the initial velocity v_0, the external force F, and the mass m as independent variables, and Y_0 and Y_1 are unknown parameters.
After the model template that includes the unknown parameters Y_0 and Y_1 (formula (7)) has been obtained, the data in the data set 180 can be used to solve for Y_0 and Y_1. In this example, the value of Y_0 can be determined to be 1 and the value of Y_1 to be 1/2. Once the values of the unknown parameters Y_0 and Y_1 have been determined, the final target model satisfied by the data set 180 can be determined to be formula (8):
f(t, v_0, F, m) = v_0 t + F t^2 / (2m) (8)
Formula (8) is in fact the displacement formula for uniformly accelerated linear motion. The active modeling method applied to the data set 180 can therefore quickly and efficiently discover the rule governing the data in the data set and generate the most suitable target model.
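The parameter-solving step can be sketched with ordinary least squares. This is a minimal sketch, assuming the template has the form Y0·v0·t + Y1·F·t²/m, which is consistent with the values Y_0 = 1 and Y_1 = 1/2 given above. The function name `solve_two_params` is hypothetical, the second and third sample rows are synthetic, and least squares is one possible solver, not necessarily the one the disclosure uses.

```python
def solve_two_params(x1, x2, y):
    """Least-squares solution of y = p*x1 + q*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det

# Rows are (v0, F, m, t, d). The first row is the record described for Fig. 4B.
rows = [
    (0.1, 0.5, 1.0, 1.0, 0.35),
    (0.5, 1.5, 2.0, 2.0, 2.5),
    (1.0, 1.5, 2.0, 1.0, 1.375),
]

# Each template term becomes one feature column.
x1 = [v0 * t for v0, F, m, t, d in rows]            # multiplies Y0
x2 = [F * t ** 2 / m for v0, F, m, t, d in rows]    # multiplies Y1
y = [d for *_, d in rows]

y0, y1 = solve_two_params(x1, x2, y)
print(y0, y1)
```

On these rows the solver returns values that agree with Y_0 = 1 and Y_1 = 1/2 up to floating-point rounding. Because the template is linear in its unknown parameters, a closed-form solve suffices and no iterative search is needed at this stage.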
During the merging of the models, multiple candidate models may be generated, each of which has at least the first variable and the second variable as independent variables. In some embodiments, the data set can be used to evaluate each candidate model in order to determine how well each model fits the data set, and the final target model is selected based on the evaluation. In this way, when the models can be merged into multiple possible target models, the one with the highest accuracy on the data set can be selected as the target model.
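Candidate selection can be sketched as follows. The disclosure only says that candidates are evaluated for accuracy against the data set; using mean squared error as the score, and the names `mean_squared_error` and `select_target_model`, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[float, float, float]   # (first variable, second variable, observed output)
Model = Callable[[float, float], float]


def mean_squared_error(model: Model, data: List[Sample]) -> float:
    """Average squared difference between model output and observations."""
    return sum((model(x1, x2) - y) ** 2 for x1, x2, y in data) / len(data)


def select_target_model(candidates: Dict[str, Model], data: List[Sample]) -> str:
    """Return the name of the candidate with the lowest error on the data set."""
    return min(candidates, key=lambda name: mean_squared_error(candidates[name], data))


# Hypothetical candidates produced by merging; both depend on x1 and x2.
candidates = {
    "product": lambda x1, x2: x1 * x2,
    "sum": lambda x1, x2: x1 + x2,
}
data = [(1.0, 2.0, 2.0), (2.0, 3.0, 6.0), (3.0, 4.0, 12.0)]
best = select_target_model(candidates, data)
```

Any other goodness-of-fit measure could be substituted for the error function without changing the selection step.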
Fig. 5 shows a schematic diagram 500 comparing experimental results of the active modeling method according to the present disclosure with those of a deep learning method. The experimental result 510 of the active modeling method shows that accuracy improves in jumps, and that the error at point 511, at the 21st second, is 0; that is, the most accurate target model has been found. By contrast, with a machine learning algorithm optimized using a GPU that has more cores than a CPU, the experimental result 520 of the deep learning method shows that accuracy improves smoothly, with an error of 0.18 at point 521 (the 100th second) and an error of 0.09 at point 522 (the 181st second). Compared with the deep learning method, the active modeling method of embodiments of the present disclosure therefore not only models faster but is also more accurate.
The methods and functions described herein can be performed at least in part by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code can execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or a server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
In one aspect, a computer-implemented method is provided. The method comprises: generating, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset; generating, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generating, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value; generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value different from the first value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable changes within the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model having only the first variable as an independent variable.
In some embodiments, the method further comprises: acquiring the first group of data by fixing the values of the variables in the data set other than the first variable.
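The idea of acquiring a group of data by holding the other variables fixed can be sketched as below. The sketch assumes the group is obtained by filtering existing records, which is one possible reading; the disclosure does not say how the acquisition is carried out, and `acquire_group` and the sample records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, float]


def acquire_group(data_set: List[Record], free_var: str, fixed: Dict[str, float]) -> List[Record]:
    """Keep the records whose variables other than free_var equal the fixed values."""
    return [
        r for r in data_set
        if all(r[name] == value for name, value in fixed.items() if name != free_var)
    ]


# Made-up records with variables t and v0 and an observed output s.
data_set = [
    {"t": 1.0, "v0": 2.0, "s": 2.0},
    {"t": 2.0, "v0": 2.0, "s": 4.0},
    {"t": 1.0, "v0": 3.0, "s": 3.0},
    {"t": 3.0, "v0": 2.0, "s": 6.0},
]
# Only t varies within the resulting group; v0 is held at 2.0.
first_group = acquire_group(data_set, free_var="t", fixed={"v0": 2.0})
```

A sub-model fitted on such a group depends only on the free variable, which matches the single-variable sub-model described above.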
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating, by merging the first model and the second model, a first candidate model and a second candidate model for the data set, the first candidate model and the second candidate model each having at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the method further comprises: generating, based on a third subset of the data set, a third model having at least a third variable as an independent variable, the third model indicating a fourth constraint satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
In another aspect, an electronic device is provided. The device comprises a processing unit and a memory, the memory being coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following acts: generating, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset; generating, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generating, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value; generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value different from the first value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable changes within the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model having only the first variable as an independent variable.
In some embodiments, the acts further comprise: acquiring the first group of data by fixing the values of the variables in the data set other than the first variable.
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating, by merging the first model and the second model, a first candidate model and a second candidate model for the data set, the first candidate model and the second candidate model each having at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the acts further comprise: generating, based on a third subset of the data set, a third model having at least a third variable as an independent variable, the third model indicating a fourth constraint satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
In yet another aspect, a computer program product is provided. The computer program product is stored in a non-transitory computer storage medium and includes machine-executable instructions that, when run in a device, cause the device to: generate, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset; generate, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generate, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value; generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value different from the first value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable changes within the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model having only the first variable as an independent variable.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: acquire the first group of data by fixing the values of the variables in the data set other than the first variable.
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating, by merging the first model and the second model, a first candidate model and a second candidate model for the data set, the first candidate model and the second candidate model each having at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: generate, based on a third subset of the data set, a third model having at least a third variable as an independent variable, the third model indicating a fourth constraint satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.
Claims (20)
1. A computer-implemented method, comprising:
generating, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset;
generating, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generating, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
2. The method according to claim 1, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value;
generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
3. The method according to claim 2, wherein only the value of the first variable changes within the first group of data, and generating the first sub-model comprises:
generating, based on the first group of data, the first sub-model having only the first variable as an independent variable.
4. The method according to claim 3, further comprising:
acquiring the first group of data by fixing the values of the variables in the data set other than the first variable.
5. The method according to claim 1, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
6. The method according to claim 1, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
7. The method according to claim 1, wherein generating the target model comprises:
generating, by merging the first model and the second model, a first candidate model and a second candidate model for the data set, the first candidate model and the second candidate model each having at least the first variable and the second variable as independent variables;
evaluating the first candidate model and the second candidate model using the data set; and
determining the target model based on the evaluation of the first candidate model and the second candidate model.
8. The method according to claim 1, further comprising:
generating, based on a third subset of the data set, a third model having at least a third variable as an independent variable, the third model indicating a fourth constraint satisfied by the data in the third subset,
and wherein generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
9. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following acts:
generating, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset;
generating, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generating, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
10. The device according to claim 9, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value;
generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
11. The device according to claim 10, wherein only the value of the first variable changes within the first group of data, and generating the first sub-model comprises:
generating, based on the first group of data, the first sub-model having only the first variable as an independent variable.
12. The device according to claim 11, wherein the acts further comprise:
acquiring the first group of data by fixing the values of the variables in the data set other than the first variable.
13. The device according to claim 9, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
14. The device according to claim 9, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
15. The device according to claim 9, wherein generating the target model comprises:
generating, by merging the first model and the second model, a first candidate model and a second candidate model for the data set, the first candidate model and the second candidate model each having at least the first variable and the second variable as independent variables;
evaluating the first candidate model and the second candidate model using the data set; and
determining the target model based on the evaluation of the first candidate model and the second candidate model.
16. The device according to claim 9, wherein the acts further comprise:
generating, based on a third subset of the data set, a third model having at least a third variable as an independent variable, the third model indicating a fourth constraint satisfied by the data in the third subset,
and wherein generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
17. A computer program product, stored in a non-transitory computer storage medium and including machine-executable instructions that, when run in a device, cause the device to:
generate, based on a first subset of a data set, a first model having at least a first variable as an independent variable, the first model indicating a first constraint satisfied by the data in the first subset;
generate, based on a second subset of the data set, a second model having at least a second variable as an independent variable, the second model indicating a second constraint satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generate, by at least merging the first model and the second model, a target model indicating a third constraint satisfied by the data in the data set, the target model being used for prediction based on the data set.
18. The computer program product according to claim 17, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model having at least the first variable as an independent variable, the value of the second variable in the first group of data being a first value;
generating, based on a second group of data in the first subset, a second sub-model having at least the first variable as an independent variable, the value of the second variable in the second group of data being a second value, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
19. The computer program product according to claim 17, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
20. The computer program product according to claim 17, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810395943.7A CN110427351A (en) | 2018-04-27 | 2018-04-27 | Active data modeling |
PCT/US2019/027570 WO2019209571A1 (en) | 2018-04-27 | 2019-04-16 | Proactive data modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810395943.7A CN110427351A (en) | 2018-04-27 | 2018-04-27 | Active data modeling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110427351A true CN110427351A (en) | 2019-11-08 |
Family
ID=66323996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810395943.7A Pending CN110427351A (en) | 2018-04-27 | 2018-04-27 | Active data modeling |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110427351A (en) |
WO (1) | WO2019209571A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220027775A1 (en) * | 2020-07-21 | 2022-01-27 | International Business Machines Corporation | Symbolic model discovery based on a combination of numerical learning methods and reasoning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090018982A1 (en) * | 2007-07-13 | 2009-01-15 | Is Technologies, Llc | Segmented modeling of large data sets |
-
2018
- 2018-04-27 CN CN201810395943.7A patent/CN110427351A/en active Pending
-
2019
- 2019-04-16 WO PCT/US2019/027570 patent/WO2019209571A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2019209571A1 (en) | 2019-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10032114B2 (en) | Predicting application performance on hardware accelerators | |
Sahu et al. | Predicting software bugs of newly and large datasets through a unified neuro-fuzzy approach: Reliability perspective | |
Fazanaro et al. | Numerical characterization of nonlinear dynamical systems using parallel computing: The role of GPUs approach | |
Fan et al. | Sketch-based fast and accurate querying of time series using parameter-sharing LSTM networks | |
Zou et al. | Correcting model misspecification in physics-informed neural networks (PINNs) | |
Mochurad | Optimization of Regression Analysis by Conducting Parallel Calculations. | |
Prats et al. | Automatic generation of workload profiles using unsupervised learning pipelines | |
Martinez-Gil et al. | Sustainable semantic similarity assessment | |
Wu et al. | Broad fuzzy cognitive map systems for time series classification | |
Heese et al. | Explaining quantum circuits with shapley values: Towards explainable quantum machine learning | |
CN110427351A (en) | Active data modeling | |
Wezeman et al. | Distance-based classifier on the Quantum Inspire | |
CN115917562A (en) | Inference method and device of deep learning model, computer equipment and storage medium | |
CN114898815A (en) | Homogeneous interaction prediction method and device based on spatial structure in field of drug discovery | |
Sha et al. | Estimating minimum operation steps via memory-based recurrent calculation network | |
Wang et al. | Deep learning-based state prediction of the Lorenz system with control parameters | |
Delianidi et al. | KT-Bi-GRU: Student Performance Prediction with a Bi-Directional Recurrent Knowledge Tracing Neural Network. | |
CN114358011A (en) | Named entity extraction method and device and electronic equipment | |
CN110415006B (en) | Advertisement click rate estimation method and device | |
Zuluaga et al. | Predicting best design trade-offs: A case study in processor customization | |
Serban | Learning from large-scale neural simulations | |
Wu et al. | Explainable Network Pruning for Model Acceleration Based on Filter Similarity and Importance | |
Karimov et al. | About the speed of work of the human brain | |
Shealy et al. | Intelligent Resource Provisioning for Scientific Workflows and HPC | |
CN111989662A (en) | Autonomous hybrid analysis modeling platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||