CN110427351A - Active data modeling - Google Patents

Publication number
CN110427351A
Authority
CN
China
Prior art keywords
model
data
variable
target model
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810395943.7A
Other languages
Chinese (zh)
Inventor
邵斌
夏欢欢
刘铁岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201810395943.7A (CN110427351A)
Priority to PCT/US2019/027570 (WO2019209571A1)
Publication of CN110427351A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present disclosure provide a method, a device, and a computer program product for active data modeling of a data set. For a given data set, a first subset is actively selected to generate a first model having at least a first variable as an independent variable, and a second subset is actively selected to generate a second model having at least a second variable as an independent variable. The first model and the second model are then merged to generate a target model that indicates the data constraint in the data set and that is used for prediction based on the data set. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the models are merged to generate the final target model. Embodiments of the present disclosure can therefore reduce the number of independent variables handled at each step of the modeling process, effectively improving the efficiency of modeling the data set.

Description

Active data modeling
Background
Data modeling refers to generating a model from data: the data objects in a given data set are analyzed to determine the relationships or constraints among them, and the model that best fits the given data set is then generated. Data modeling methods include regression analysis, statistical analysis, machine learning, deep learning, grey prediction, principal component analysis, neural networks, time series analysis, and so on.
Regression analysis, one of the most common modeling methods, is used to find the relationship between dependent variables and independent variables. According to the number of independent variables involved, regression analysis can be divided into univariate and multivariate regression analysis; according to the number of dependent variables, into simple and multiple regression analysis; and according to the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis. Symbolic regression is a type of regression analysis that finds the model (e.g., a function) best fitting a given data set by evolutionary search (e.g., genetic programming). The goal of symbolic regression is to automatically discover patterns, constraints, or rules in the data set.
Summary
Embodiments of the present disclosure provide a method, a device, and a computer program product for active data modeling of a data set. For a given data set, a first subset is actively selected to generate a first model having at least a first variable as an independent variable, and a second subset is actively selected to generate a second model having at least a second variable as an independent variable. The first model and the second model are then merged to generate a target model that indicates the data constraint in the data set and that is used for prediction based on the data set. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the final target model is generated by merging the models. Embodiments of the present disclosure can therefore reduce the number of independent variables handled at each step of the modeling process, effectively improving the efficiency of modeling the data set.
This Summary is provided to introduce, in simplified form, a selection of concepts that are further described in the Detailed description below. This Summary is not intended to identify key features or essential features of the present disclosure, nor is it intended to limit the scope of the present disclosure.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
Fig. 1 shows a block diagram of a computing device/server in which one or more embodiments of the present disclosure may be implemented;
Fig. 2 shows a flowchart of a method for active data modeling according to an embodiment of the present disclosure;
Fig. 3A shows a flowchart of a method for generating the first model according to an embodiment of the present disclosure;
Fig. 3B shows a flowchart of a method for generating the second model according to an embodiment of the present disclosure;
Fig. 3C shows a flowchart of a method for generating the target model by tree matching according to an embodiment of the present disclosure;
Fig. 4A shows a schematic diagram of uniformly accelerated linear motion according to an embodiment of the present disclosure;
Fig. 4B shows a schematic diagram of a data set related to uniformly accelerated linear motion according to an embodiment of the present disclosure;
Fig. 4C shows a schematic diagram of a data subset of the data set shown in Fig. 4B;
Fig. 4D shows a schematic diagram of generating the trees that represent the individual models according to an embodiment of the present disclosure;
Fig. 4E shows a schematic diagram of the target tree generated by matching the trees in Fig. 4D; and
Fig. 5 shows a schematic diagram comparing the experimental results of the active modeling method according to the present disclosure with those of a deep learning method.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the disclosure.
As used herein, the term "comprising" and its variants are open-ended, meaning "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one other embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms are given in the description below.
Traditionally, for a given data set, a single model is generated by evolutionary search so that it best fits the data set. For a single independent variable, the target model is usually easy to obtain by univariate regression analysis. A data set, however, may contain many independent variables, and multivariate linear regression is usually more complex to solve: multiple parameters must be solved using, for example, the least squares method, and an accurate target model that fits the given data set may not be found at all. One improvement over the traditional approach is to perform function fitting with deep learning, learning and training with a complex neural network structure. However, deep learning consumes a great deal of time and computing resources, and the resulting model may still not be accurate enough. Because traditional modeling methods only use the data in the data set passively, their data modeling efficiency is low.
To this end, embodiments of the present disclosure propose an active data modeling method for a data set. In embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models for multiple independent variables, and the final target model is generated by merging the models. Embodiments of the present disclosure can therefore reduce the number of independent variables handled at each step of the modeling process, effectively improving the efficiency of modeling the data set.
The basic principles and several example implementations of the present disclosure are described below with reference to Figs. 1 to 5. Fig. 1 shows a block diagram of a computing device/server 100 in which one or more embodiments of the present disclosure may be implemented. It should be understood that the computing device/server 100 shown in Fig. 1 is merely exemplary and should not constitute any limitation on the functions and scope of the embodiments described herein.
As shown in Fig. 1, the computing device/server 100 takes the form of a general-purpose computing device. Components of the computing device/server 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160. The processing unit 110 may be a real or virtual processor and can perform various processing according to programs stored in the memory 120. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the computing device/server 100.
The computing device/server 100 typically includes multiple computer storage media. Such media may be any available media accessible to the computing device/server 100, including but not limited to volatile and non-volatile media, and removable and non-removable media. The memory 120 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The storage device 130 may be a removable or non-removable medium and may include a machine-readable medium, such as a flash drive, a magnetic disk, or any other medium, that can be used to store information and/or data (e.g., data set 180) and that can be accessed within the computing device/server 100.
The computing device/server 100 may further include other removable/non-removable, volatile/non-volatile storage media. Although not shown in Fig. 1, a magnetic disk drive for reading from or writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disc drive for reading from or writing to a removable, non-volatile optical disc may be provided. In these cases, each drive may be connected to a bus (not shown) by one or more data media interfaces. The memory 120 may include a modeling engine 125 having one or more sets of program modules configured to perform the methods or functions of the various embodiments described herein.
The communication unit 140 enables communication with other computing devices over a communication medium. Additionally, the functions of the components of the computing device/server 100 may be implemented in a single computing cluster or in multiple computing machines that can communicate over communication connections. The computing device/server 100 can therefore operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another network node.
The input device 150 may be one or more of various input devices, such as a mouse, a keyboard, or a trackball. The output device 160 may be one or more output devices, such as a display, a speaker, or a printer. As needed, the computing device/server 100 may also communicate through the communication unit 140 with one or more external devices (not shown), such as storage devices and display devices; with one or more devices that enable a user to interact with the computing device/server 100; or with any device (e.g., a network card or a modem) that enables the computing device/server 100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
As shown in Fig. 1, the storage device 130 stores a data set 180 that includes a large amount of data involving multiple variables. According to embodiments of the subject matter described herein, the modeling engine 125 actively selects multiple subsets of the data set 180 to generate multiple models and merges the models to build a model that fits the data set 180. Example embodiments in which the modeling engine 125 generates a model based on the data set 180 are described in detail below with reference to Figs. 2-5.
Fig. 2 shows a flowchart of a method 200 for active data modeling according to an embodiment of the present disclosure. It should be understood that the method 200 may be performed by the computing device/server 100 described with reference to Fig. 1.
At 202, a first model having at least a first variable as an independent variable is generated based on a first subset of the data set. A data set is a collection of data and may include various types of data, such as traffic data, medical data, financial data, and industrial data. As an example, in a scenario where an object undergoes uniformly accelerated linear motion, the data set may include data related to the object itself or to its motion and may involve multiple variables, including the mass m of the object, the external force F applied to the object, the initial velocity v_0 of the object, and the motion time t of the object. An example of the data set 180 is described below with reference to Fig. 4B.
In some embodiments, the first subset may include the portion of the data set that satisfies a predetermined rule. For example, the first subset may include multiple groups of data, where within each group only one variable (e.g., the first variable) or only some of the variables change in value while the values of the other variables are fixed. Continuing with the uniformly accelerated linear motion scenario, each group of data in the first subset may be a group in which only the time t varies (the other variables, such as the initial velocity v_0, remain unchanged within the group). An example of the data subset 410 is described below with reference to Fig. 4B.
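The selection rule described above can be illustrated with a minimal sketch, assuming the data set is available as rows of (v0, F, m, t, d). The rows, the column layout, and the function name `groups_varying_only` are hypothetical and are not taken from data set 180; the sketch only shows one way to collect groups in which a single independent variable changes.

```python
from collections import defaultdict

# Hypothetical records (v0, F, m, t, d); not the actual contents of data set 180.
COLUMNS = ("v0", "F", "m", "t", "d")
ROWS = [
    (0.5, 1.5, 2.0, 1.0, 0.875),
    (0.5, 1.5, 2.0, 2.0, 2.5),
    (1.0, 1.5, 2.0, 1.0, 1.375),
    (1.0, 1.5, 2.0, 2.0, 3.5),
    (0.1, 0.5, 1.0, 1.0, 0.35),
]


def groups_varying_only(rows, columns, variable, target="d"):
    """Return groups of rows in which every independent variable except `variable` is fixed."""
    var_idx = columns.index(variable)
    fixed_idx = [i for i, name in enumerate(columns) if name not in (variable, target)]
    buckets = defaultdict(list)
    for row in rows:
        # Rows sharing the same values for all other variables fall into one group.
        buckets[tuple(row[i] for i in fixed_idx)].append(row)
    # Keep only groups where `variable` takes at least two distinct values.
    return {
        key: grp
        for key, grp in buckets.items()
        if len({row[var_idx] for row in grp}) >= 2
    }


first_subset = groups_varying_only(ROWS, COLUMNS, "t")
for fixed_values, group in sorted(first_subset.items()):
    print(fixed_values, [(row[3], row[4]) for row in group])
```

Calling the same function with `"v0"` instead of `"t"` would yield groups suitable for a model of the second variable.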
According to embodiments of the present disclosure, the first model represents a first constraint satisfied by the data in the first subset. For example, in the uniformly accelerated linear motion scenario, the first model generated from the first subset, whose groups of data vary only in the time t, may be the constraint (e.g., a functional relationship) satisfied between the displacement d of the object and the time t. The first model can therefore represent the functional relationship between the dependent variable, displacement d, and the independent variable, time t, when the other independent variables (e.g., the initial velocity v_0) remain unchanged.
At 204, a second model having at least a second variable as an independent variable is generated based on a second subset of the data set. For example, the second subset may also include multiple groups of data, where within each group only one variable (e.g., the second variable) or only some of the variables change in value while the values of the other variables are fixed. The second model represents a second constraint satisfied by the data in the second subset.
For example, in the uniformly accelerated linear motion scenario, the second model generated from the second subset, whose groups of data vary only in the initial velocity v_0, may be the constraint (e.g., a functional relationship) satisfied between the displacement d and the initial velocity v_0. The second model can therefore represent the functional relationship between the dependent variable, displacement d, and the independent variable, initial velocity v_0, when the other independent variables (e.g., the time t) remain unchanged. Those skilled in the art will appreciate that blocks 202 and 204 may be performed sequentially or in parallel.
At 206, a target model indicating a third constraint satisfied by the data in the data set is generated by merging at least the first model and the second model, where the target model is used for prediction based on the data set. Since the first subset and the second subset are parts of the data set, the constraint within each subset is consistent with the constraint of the entire data set. In some embodiments, a corresponding model is generated for each variable or each group of variables, and the generated models are merged by pattern matching or tree alignment, yielding a target model for the entire data set; the target model may be expressed as a function, a formula, or the like. The generated target model can be used for prediction; for example, a target model generated from collected traffic data can predict the traffic conditions at some future time.
For example, in the uniformly accelerated linear motion scenario, combining the first model (e.g., a first functional relationship) satisfied between the displacement d and the time t with the second model (e.g., a second functional relationship) satisfied between the displacement d and the initial velocity v_0, and so on, yields the target model (e.g., a target functional relationship) satisfied among the displacement d, the time t, the initial velocity v_0, and so on. An example implementation of merging multiple models 455 to generate the target model 495 is described below with reference to Figs. 4D and 4E.
According to the method 200 of embodiments of the present disclosure, multiple data subsets are actively selected to generate multiple models, and the models are merged to generate a target model for the data set. The method 200 thus replaces traditional passive model fitting with active data modeling, actively seeking out the data from which to generate the model.
Example implementations of the method 200 in Fig. 2 are described below in conjunction with Figs. 3A-3C.
Fig. 3A shows a flowchart of a method 300 for generating the first model according to an embodiment of the present disclosure. The method 300 may be performed by the computing device/server 100 described with reference to Fig. 1 and is an example implementation of block 202 in Fig. 2. For convenience, some example embodiments are described below with reference to Figs. 4A-4E and Fig. 5, taking a data set related to uniformly accelerated linear motion as an example.
At 302, a first submodel having at least the first variable as an independent variable is generated based on a first group of data in the first subset of the data set. At 304, a second submodel having at least the first variable as an independent variable is generated based on a second group of data in the first subset. The second variable takes a first value in the first group of data and a second value in the second group of data, and the first value differs from the second value.
In some embodiments, only the value of the first variable changes within the first group of data, so the generated first submodel has only the first variable as an independent variable, which ensures that each submodel can be generated quickly and accurately. It should be understood that the first group of data may be a group selected from the data set in which only the first variable changes, or it may be actively collected while the values of the variables other than the first variable are held fixed, which likewise guarantees a group of data in which only the first variable changes. Alternatively, the values of several variables (but not all variables in the data set) may change within the first group of data, so that the generated first submodel has only those variables (e.g., the first variable and the third variable) as independent variables. It should be understood that even when the first submodel involves several independent variables, the number of variables in the first group of data is reduced, so the efficiency of data modeling is still improved.
For example, the following description refers to the schematic diagram 400 of uniformly accelerated linear motion shown in Fig. 4A. It should be understood that Figs. 4A-4E are merely examples to aid understanding of embodiments of the present disclosure and are not to be construed as limiting the scope of the disclosure. As shown, an object 405 travels along a road 402 in a straight line with uniform acceleration, and Fig. 4A shows the displaced position of the object 405 at different times. For example, the object 405 has been displaced 2 meters at the 1-second mark, 8 meters at the 2-second mark, and so on. In uniformly accelerated linear motion, the displacement of the object 405 is associated with multiple factors.
Fig. 4B shows the data set 180 formed by collecting data on the multiple factors of uniformly accelerated linear motion. As shown in Fig. 4B, the data set 180 includes the values, under different states, of the initial velocity v_0 of the object, the external force F applied to the object, the mass m of the object, the motion time t, and the displacement d, where the time t may be referred to as the first variable, the initial velocity v_0 as the second variable, the external force F as the third variable, and the mass m as the fourth variable. For example, the first record in the data set 180 of Fig. 4B has the following physical meaning: when the initial velocity v_0 is 0.1 meters per second, the external force F is 0.5 newtons, the mass m is 1.0 kilograms, and the time t is 1.0 seconds, the corresponding displacement d of the object is 0.35 meters.
Unlike approaches that passively apply regression analysis or deep learning to the entire data set 180, embodiments of the present disclosure use an active modeling method: data subsets (i.e., portions of the data) are actively selected to build a model for each subset, and the models are combined to generate a model that fits the data set 180.
For example, Fig. 4C shows the first subset 410 of the data set 180 in Fig. 4B, which includes a first group of data 411 and a second group of data 412. In the first group of data 411, only the first variable t changes among the independent variables while the values of the other variables remain unchanged, with the second variable v_0 equal to 0.5. In the second group of data 412, likewise only the first variable t changes among the independent variables while the values of the other variables remain unchanged, with the second variable v_0 equal to 1.0.
Since the first group of data 411 contains only one independent variable t and one dependent variable d, while the values of the other variables v_0, F, and m remain unchanged (v_0 = 0.5, F = 1.5, m = 2.0), simple univariate symbolic regression determines that the first submodel satisfied by the first group of data 411 is the following formula (1):
f_t(t) = 0.5t + 0.375t^2    (1)
In general, when a submodel is generated by univariate symbolic regression, the model with the best goodness of fit is selected; if several candidate submodels fit equally well, the shortest formula is selected as the submodel. Likewise, since the second group of data 412 also contains only one independent variable t and one dependent variable d, while the values of the other variables v_0, F, and m remain unchanged (v_0 = 1.0, F = 1.5, m = 2.0), univariate symbolic regression determines that the second submodel satisfied by the second group of data 412 is the following formula (2):
f_t(t) = 1.0t + 0.375t^2    (2)
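The univariate fits behind formulas (1) and (2) can be illustrated with a minimal sketch. It assumes the form d = a*t + b*t^2 is already known and uses ordinary least squares as a stand-in for symbolic regression, which would also search over the form itself. The function name `fit_two_terms` and the sample points are hypothetical; the points were generated from formulas (1) and (2).

```python
def fit_two_terms(xs, ys, g1, g2):
    """Least-squares fit of y = a*g1(x) + b*g2(x) by solving the 2x2 normal equations."""
    s11 = sum(g1(x) * g1(x) for x in xs)
    s12 = sum(g1(x) * g2(x) for x in xs)
    s22 = sum(g2(x) * g2(x) for x in xs)
    r1 = sum(g1(x) * y for x, y in zip(xs, ys))
    r2 = sum(g2(x) * y for x, y in zip(xs, ys))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("basis terms are linearly dependent on this data")
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


# Hypothetical samples: only t varies inside each group (v0, F, m are fixed).
ts = [1.0, 2.0, 3.0, 4.0]
d_group_411 = [0.875, 2.5, 4.875, 8.0]   # v0 = 0.5, F = 1.5, m = 2.0
d_group_412 = [1.375, 3.5, 6.375, 10.0]  # v0 = 1.0, F = 1.5, m = 2.0

linear = lambda t: t
quadratic = lambda t: t * t

a1, b1 = fit_two_terms(ts, d_group_411, linear, quadratic)
a2, b2 = fit_two_terms(ts, d_group_412, linear, quadratic)
print(round(a1, 6), round(b1, 6))
print(round(a2, 6), round(b2, 6))
```

Only the coefficient of t differs between the two fits, which reflects the different fixed value of v_0 in each group.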
Fig. 3 A is returned, 306, the first submodel and the second submodel is based on, generates the first model.For example, showing in Fig. 4 C In data subset 410 out, based on the model (such as formula (1)) generated according to first group of data 411 and according to second group of data 412 models (such as formula (2)) generated, can determine that the data in data subset 410 meet following formula (3) namely the first mould Type.That is, the first mould for the first variable can be obtained by selecting the more wheels of multi-group data operation in the first subset Type (such as function).
ft(t)=X0t+X1t2 (3)
Wherein, ft(t) it indicates using time t as the displacement function of independent variable, X0And X1For unknown parameter.
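The step from the two submodels to the template in formula (3) can be sketched as follows, under the assumption that each submodel is stored as a mapping from a term to its fitted coefficient. The representation and the names `to_template`, `varying_terms`, and `render` are hypothetical; the source does not specify how the submodels are generalized.

```python
def to_template(submodels, first_index=0):
    """Replace the numeric coefficients of structurally identical submodels with named unknowns."""
    terms = list(submodels[0])
    for sub in submodels[1:]:
        if set(sub) != set(terms):
            raise ValueError("submodels do not share the same terms")
    return {term: f"X{first_index + i}" for i, term in enumerate(terms)}


def varying_terms(submodels):
    """List the terms whose coefficient differs between submodels."""
    return [t for t in submodels[0] if len({sub[t] for sub in submodels}) > 1]


def render(template):
    return " + ".join(f"{param}*{term}" for term, param in template.items())


submodel_1 = {"t": 0.5, "t^2": 0.375}  # formula (1)
submodel_2 = {"t": 1.0, "t^2": 0.375}  # formula (2)

template = to_template([submodel_1, submodel_2])
print(render(template))
print(varying_terms([submodel_1, submodel_2]))
```

In this sketch every coefficient becomes an unknown, as in formula (3), because a coefficient that is constant across these two groups may still depend on variables (such as F and m) that were fixed in both.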
Fig. 3 B shows according to an embodiment of the present disclosure for generating the flow chart of the method 350 of the second model.It should Understand, method 350 can be executed by the calculating device/server 100 with reference to described in Fig. 1, it is the frame 204 in Fig. 2 Example implementation.
308, based on the third group data in the second subset in data set, generate at least using the second variable as independent variable Third submodel.310, based on the 4th group of data in second subset, at least using the second variable as independent variable is generated Four submodels.312, it is based on third submodel and the 4th submodel, generates the second model.First is generated with movement 302-306 Model is similar, can initiatively select the second subset of only the second variable change, and generates second subset is met second Model.For example, generating in the data set 180 described in Fig. 4 B with initial velocity v0It is all as follows for the second model of independent variable Formula (4):
f_{v_0}(v_0) = X_2 + X_3 v_0    (4)
where f_{v_0}(v_0) denotes the displacement function with the initial velocity v_0 as the independent variable, and X_2 and X_3 are unknown parameters.
Optionally, in some embodiments, in addition to the first model and the second model, a third model having at least a third variable as an independent variable may be generated based on a third subset of the data set. For example, from the data set 180 described in Fig. 4B, a third model having the external force F as the independent variable may be generated, such as the following formula (5):
f_F(F) = X_4 + X_5 F    (5)
where f_F(F) denotes the displacement function with the external force F as the independent variable, and X_4 and X_5 are unknown parameters.
In some embodiments, a fourth model having at least a fourth variable as an independent variable may also be generated based on a fourth subset of the data set. For example, from the data set 180 described in Fig. 4B, a fourth model having the mass m as the independent variable may be generated, such as the following formula (6):
f_m(m) = X_6 + X_7/m    (6)
where f_m(m) denotes the displacement function with the mass m as the independent variable, and X_6 and X_7 are unknown parameters.
Fig. 3 C shows the process according to an embodiment of the present disclosure for matching by tree and generating the method 390 of object module Figure.It should be appreciated that method 390 can be executed by the calculating device/server 100 with reference to described in Fig. 1, it is in Fig. 2 The example implementation of frame 206.
314, generates the first tree for indicating the first model and indicate the second tree of the second model.For example, Fig. 4 D is shown It is according to an embodiment of the present disclosure for generating the schematic diagram 450 of the tree that indicates each model, in each multiple models of generation After 455 (such as formula (3)-(6)), corresponding tree construction can be generated respectively for each model, such as indicate the first submodule The tree 460 of type, the tree 470 for indicating the second submodel, the tree 480 for indicating third submodel and the tree for indicating the 4th submodel 490.It is matched by using tree construction, can quickly and efficiently merge multiple models.
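A tree representation of a model can be sketched as follows, using formula (3) as the example. The nested-tuple node layout `(operator, left, right)` and the helper names `evaluate` and `leaves` are assumptions made for illustration; the actual tree layout of Fig. 4D is not reproduced in the text, and the matching step itself is not shown here.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "^": operator.pow}

# Formula (3), X0*t + X1*t^2, as a nested-tuple tree: (operator, left, right).
TREE_FORMULA_3 = ("+", ("*", "X0", "t"), ("*", "X1", ("^", "t", 2)))


def evaluate(node, env):
    """Evaluate a tree given values for its variables and parameters."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]  # variable or unknown parameter
    return node  # numeric constant


def leaves(node):
    """Collect the names (variables and parameters) at the leaves, left to right."""
    if isinstance(node, tuple):
        return leaves(node[1]) + leaves(node[2])
    return [node] if isinstance(node, str) else []


# Plugging in the coefficients of formula (1) and t = 2 seconds.
value = evaluate(TREE_FORMULA_3, {"X0": 0.5, "X1": 0.375, "t": 2.0})
print(value)
print(leaves(TREE_FORMULA_3))
```

Storing each model as a tree makes its structure (operators and the positions of unknown parameters) explicit, which is what a matching step would compare across models.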
At 316, the target model is generated by matching at least the first tree and the second tree. For example, a model template including unknown parameters can be generated based on at least the first model and the second model. Fig. 4E shows a target tree 499 generated by matching the trees in Fig. 4D; it is obtained by performing subgraph matching on the trees of the individual models (e.g., trees 460, 470, 480, and 490). Next, a target template 495 including unknown parameters can be generated from the target tree 499, as shown in the following formula (7):
Wherein, f (t, v0, F, m) and it indicates with time t, initial velocity v0, displacement letter that external force F and quality m are independent variable Number, Y0And Y1For unknown parameter.
After the model template including the unknown parameters Y_0 and Y_1 (for example, formula (7)) is obtained, the data in the data set 180 can be used to solve for Y_0 and Y_1. In this example, Y_0 is determined to be 1 and Y_1 is determined to be 1/2. Once the values of Y_0 and Y_1 are determined, the final target model satisfied by the data set 180 can be determined to be formula (8):
f(t, v_0, F, m) = v_0 t + (F/(2m)) t^2    (8)
Formula (8) is in fact the displacement formula for uniformly accelerated rectilinear motion. The active modeling method applied to the data set 180 can therefore quickly and efficiently discover the regularity in the data and generate the best-fitting target model.
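The parameter-solving step can be sketched in Python as an ordinary least-squares fit. This is a minimal sketch under stated assumptions: the template is taken to have the form s = Y0*(v0*t) + Y1*(F*t^2/m), the rows are synthetic toy values rather than data set 180, and the names `fit_template` and `displacement` are hypothetical.

```python
def displacement(t, v0, force, mass):
    """Synthetic ground truth, used only to produce toy rows for this sketch."""
    return v0 * t + 0.5 * force / mass * t * t


def fit_template(rows):
    """Least-squares fit of Y0, Y1 in s = Y0*(v0*t) + Y1*(F*t**2/m).

    rows holds tuples (t, v0, force, mass, s). The 2x2 normal equations
    are solved with Cramer's rule, so no external library is needed.
    """
    saa = sab = sbb = sas = sbs = 0.0
    for t, v0, force, mass, s in rows:
        a = v0 * t                  # feature multiplied by Y0
        b = force * t * t / mass    # feature multiplied by Y1
        saa += a * a
        sab += a * b
        sbb += b * b
        sas += a * s
        sbs += b * s
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("features are collinear; parameters are not identifiable")
    y0 = (sas * sbb - sab * sbs) / det
    y1 = (saa * sbs - sab * sas) / det
    return y0, y1


# Toy inputs (t, v0, F, m); the observed displacement is appended to each row.
inputs = [(1.0, 2.0, 3.0, 1.0), (2.0, 1.0, 4.0, 2.0),
          (3.0, 0.5, 1.0, 4.0), (1.5, 3.0, 2.0, 0.5)]
rows = [(t, v0, f, m, displacement(t, v0, f, m)) for t, v0, f, m in inputs]

y0, y1 = fit_template(rows)
```

Because the template is linear in Y0 and Y1, a closed-form solve is enough here; a template that is nonlinear in its parameters would need an iterative optimizer instead.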
While the models are being merged, multiple candidate models may be generated, each taking at least the first variable and the second variable as independent variables. In some embodiments, the data set can be used to evaluate each candidate model to determine how well it fits the data set, and the final target model is selected based on that evaluation. In this way, when the models can be merged into multiple target models, the one with the highest accuracy on the data set can be selected as the target model.
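Candidate selection can be sketched as scoring each candidate on the data and keeping the one with the lowest error. In the Python sketch below, the two candidate functions, the mean-squared-error metric, and the toy rows are all assumptions for illustration; the disclosure does not name a specific accuracy measure.

```python
def candidate_a(t, v0, force, mass):
    """Hypothetical merge result: force divided by mass."""
    return v0 * t + 0.5 * force * t * t / mass


def candidate_b(t, v0, force, mass):
    """Hypothetical alternative merge result: force multiplied by mass."""
    return v0 * t + 0.5 * force * mass * t * t


def mean_squared_error(model, rows):
    """Average squared gap between model predictions and observed values."""
    total = 0.0
    for t, v0, force, mass, observed in rows:
        total += (model(t, v0, force, mass) - observed) ** 2
    return total / len(rows)


# Toy rows (t, v0, F, m, observed displacement); not taken from data set 180.
rows = [
    (1.0, 2.0, 3.0, 2.0, 2.75),
    (2.0, 1.0, 4.0, 2.0, 6.0),
    (3.0, 0.5, 1.0, 4.0, 2.625),
]

candidates = {"candidate_a": candidate_a, "candidate_b": candidate_b}
scores = {name: mean_squared_error(fn, rows) for name, fn in candidates.items()}
best = min(scores, key=scores.get)  # name of the lowest-error candidate
```

On real data it would usually be safer to score candidates on rows that were not used to fit their parameters, so that an overly flexible candidate is not favored.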
Fig. 5 shows a schematic diagram 500 comparing experimental results of the active modeling method according to the present disclosure with those of a deep learning method. The experimental result 510 of the active modeling method shows accuracy improving in jumps, with the error reaching 0 at point 511 at the 21st second; that is, the most accurate target model has been found. By contrast, even when the machine learning algorithm is optimized using a GPU with more cores than a CPU, the experimental result 520 of the deep learning method shows accuracy improving smoothly, with an error of 0.18 at point 521 at the 100th second and an error of 0.09 at point 522 at the 181st second. Compared with the deep learning method, the active modeling method of embodiments of the present disclosure is thus not only faster at modeling but also more accurate.
The methods and functions described herein can be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. In certain circumstances, multitasking and parallel processing may be advantageous. Similarly, although the discussion above contains several specific implementation details, these should not be construed as limitations on the scope of the present disclosure. Certain features described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable sub-combination.
Some example implementations of the present disclosure are listed below.
In one aspect, a computer-implemented method is provided. The method comprises: generating, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset; generating, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generating, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data; generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable varies in the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model that takes only the first variable as an independent variable.
In some embodiments, the method further comprises: collecting the first group of data by fixing the values of the variables in the data set other than the first variable.
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating a first candidate model and a second candidate model for the data set by merging the first model and the second model, each of the first candidate model and the second candidate model taking at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the method further comprises: generating, based on a third subset of the data set, a third model that takes at least a third variable as an independent variable, the third model indicating a fourth constraint condition satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
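The sub-model embodiments above (hold every variable but one fixed, fit a sub-model, then repeat with a different fixed value) can be sketched in Python as follows. The `measure` function is a hypothetical stand-in for whatever source supplies the observations, and the sub-model form s = x0*t + x1*t^2 is an assumption chosen for this illustration.

```python
def measure(t, v0, force, mass):
    """Hypothetical data source standing in for real observations."""
    return v0 * t + 0.5 * force / mass * t * t


def collect_group(times, v0, force, mass):
    """Collect one group of data: only t varies, the other variables are fixed."""
    return [(t, measure(t, v0, force, mass)) for t in times]


def fit_sub_model(group):
    """Least-squares fit of s = x0*t + x1*t**2 from (t, s) pairs."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t, s in group:
        a, b = t, t * t
        s11 += a * a
        s12 += a * b
        s22 += b * b
        r1 += a * s
        r2 += b * s
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("not enough distinct time values to fit the sub-model")
    x0 = (r1 * s22 - s12 * r2) / det
    x1 = (s11 * r2 - s12 * r1) / det
    return x0, x1


times = [0.5, 1.0, 1.5, 2.0, 2.5]

# First group: the second variable (here v0) is held at a first value.
first_sub_model = fit_sub_model(collect_group(times, v0=1.0, force=2.0, mass=1.0))
# Second group: the same variable is held at a different, second value.
second_sub_model = fit_sub_model(collect_group(times, v0=3.0, force=2.0, mass=1.0))
```

Comparing the two fits shows which coefficient moves when the fixed variable changes (here x0 tracks v0 while x1 stays put), which is the sort of evidence that could guide how the sub-models are combined into the first model.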
In another aspect, an electronic device is provided. The device comprises: a processing unit; and a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following acts: generating, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset; generating, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generating, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data; generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable varies in the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model that takes only the first variable as an independent variable.
In some embodiments, the acts further comprise: collecting the first group of data by fixing the values of the variables in the data set other than the first variable.
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating a first candidate model and a second candidate model for the data set by merging the first model and the second model, each of the first candidate model and the second candidate model taking at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the acts further comprise: generating, based on a third subset of the data set, a third model that takes at least a third variable as an independent variable, the third model indicating a fourth constraint condition satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
In yet another aspect, a computer program product is provided. The computer program product is stored in a non-transitory computer storage medium and comprises machine-executable instructions that, when run in a device, cause the device to: generate, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset; generate, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and generate, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
In some embodiments, generating the first model comprises: generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data; generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and generating the first model based on the first sub-model and the second sub-model.
In some embodiments, only the value of the first variable varies in the first group of data, and generating the first sub-model comprises: generating, based on the first group of data, the first sub-model that takes only the first variable as an independent variable.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: collect the first group of data by fixing the values of the variables in the data set other than the first variable.
In some embodiments, generating the target model comprises: generating a first tree representing the first model and a second tree representing the second model; and generating the target model by matching the first tree and the second tree.
In some embodiments, generating the target model comprises: generating, based on the first model and the second model, a model template including an unknown parameter; solving for the unknown parameter in the model template using the data set; and determining the target model based on the model template and the unknown parameter.
In some embodiments, generating the target model comprises: generating a first candidate model and a second candidate model for the data set by merging the first model and the second model, each of the first candidate model and the second candidate model taking at least the first variable and the second variable as independent variables; evaluating the first candidate model and the second candidate model using the data set; and determining the target model based on the evaluation of the first candidate model and the second candidate model.
In some embodiments, the machine-executable instructions, when run in the device, further cause the device to: generate, based on a third subset of the data set, a third model that takes at least a third variable as an independent variable, the third model indicating a fourth constraint condition satisfied by the data in the third subset; and generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (20)

1. A computer-implemented method, comprising:
generating, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset;
generating, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generating, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
2. The method according to claim 1, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data;
generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
3. The method according to claim 2, wherein only the value of the first variable varies in the first group of data, and generating the first sub-model comprises:
generating, based on the first group of data, the first sub-model that takes only the first variable as an independent variable.
4. The method according to claim 3, further comprising:
collecting the first group of data by fixing the values of the variables in the data set other than the first variable.
5. The method according to claim 1, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
6. The method according to claim 1, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
7. The method according to claim 1, wherein generating the target model comprises:
generating a first candidate model and a second candidate model for the data set by merging the first model and the second model, each of the first candidate model and the second candidate model taking at least the first variable and the second variable as independent variables;
evaluating the first candidate model and the second candidate model using the data set; and
determining the target model based on the evaluation of the first candidate model and the second candidate model.
8. The method according to claim 1, further comprising:
generating, based on a third subset of the data set, a third model that takes at least a third variable as an independent variable, the third model indicating a fourth constraint condition satisfied by the data in the third subset,
wherein generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
9. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and storing instructions that, when executed by the processing unit, perform the following acts:
generating, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset;
generating, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generating, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
10. The device according to claim 9, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data;
generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
11. The device according to claim 10, wherein only the value of the first variable varies in the first group of data, and generating the first sub-model comprises:
generating, based on the first group of data, the first sub-model that takes only the first variable as an independent variable.
12. The device according to claim 11, the acts further comprising:
collecting the first group of data by fixing the values of the variables in the data set other than the first variable.
13. The device according to claim 9, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
14. The device according to claim 9, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
15. The device according to claim 9, wherein generating the target model comprises:
generating a first candidate model and a second candidate model for the data set by merging the first model and the second model, each of the first candidate model and the second candidate model taking at least the first variable and the second variable as independent variables;
evaluating the first candidate model and the second candidate model using the data set; and
determining the target model based on the evaluation of the first candidate model and the second candidate model.
16. The device according to claim 9, the acts further comprising:
generating, based on a third subset of the data set, a third model that takes at least a third variable as an independent variable, the third model indicating a fourth constraint condition satisfied by the data in the third subset,
wherein generating the target model comprises: generating the target model by merging the first model, the second model, and the third model.
17. A computer program product stored in a non-transitory computer storage medium and comprising machine-executable instructions that, when run in a device, cause the device to:
generate, based on a first subset of a data set, a first model that takes at least a first variable as an independent variable, the first model indicating a first constraint condition satisfied by the data in the first subset;
generate, based on a second subset of the data set, a second model that takes at least a second variable as an independent variable, the second model indicating a second constraint condition satisfied by the data in the second subset, the first variable and the second variable being variables in the data set; and
generate, by merging at least the first model and the second model, a target model indicating a third constraint condition satisfied by the data in the data set, the target model being used for making predictions based on the data set.
18. The computer program product according to claim 17, wherein generating the first model comprises:
generating, based on a first group of data in the first subset, a first sub-model that takes at least the first variable as an independent variable, the second variable having a first value in the first group of data;
generating, based on a second group of data in the first subset, a second sub-model that takes at least the first variable as an independent variable, the second variable having a second value in the second group of data, the first value being different from the second value; and
generating the first model based on the first sub-model and the second sub-model.
19. The computer program product according to claim 17, wherein generating the target model comprises:
generating a first tree representing the first model and a second tree representing the second model; and
generating the target model by matching the first tree and the second tree.
20. The computer program product according to claim 17, wherein generating the target model comprises:
generating, based on the first model and the second model, a model template including an unknown parameter;
solving for the unknown parameter in the model template using the data set; and
determining the target model based on the model template and the unknown parameter.
CN201810395943.7A 2018-04-27 2018-04-27 Active data modeling Pending CN110427351A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810395943.7A CN110427351A (en) 2018-04-27 2018-04-27 Active data modeling
PCT/US2019/027570 WO2019209571A1 (en) 2018-04-27 2019-04-16 Proactive data modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810395943.7A CN110427351A (en) 2018-04-27 2018-04-27 Active data modeling

Publications (1)

Publication Number Publication Date
CN110427351A true CN110427351A (en) 2019-11-08

Family

ID=66323996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810395943.7A Pending CN110427351A (en) 2018-04-27 2018-04-27 Active data modeling

Country Status (2)

Country Link
CN (1) CN110427351A (en)
WO (1) WO2019209571A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220027775A1 (en) * 2020-07-21 2022-01-27 International Business Machines Corporation Symbolic model discovery based on a combination of numerical learning methods and reasoning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090018982A1 (en) * 2007-07-13 2009-01-15 Is Technologies, Llc Segmented modeling of large data sets

Also Published As

Publication number Publication date
WO2019209571A1 (en) 2019-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination