WO2017124713A1 - Method and apparatus for determining a data model - Google Patents

Method and apparatus for determining a data model

Info

Publication number
WO2017124713A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
algorithm
feature
model
analyzed
Prior art date
Application number
PCT/CN2016/090343
Other languages
English (en)
Chinese (zh)
Inventor
刘权
涂丹丹
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2017124713A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Definitions

  • The present invention relates to the field of computer technologies, and in particular to a method and an apparatus for determining a data model.
  • Data mining is the mining of knowledge from data, and data mining algorithms are a set of heuristics and calculation processes for creating data mining models based on data.
  • Automatic model selection refers to techniques that choose a data mining algorithm automatically, without human intervention. Take SPSS (Statistical Product and Service Solutions) as an example.
  • Its automatic classifier can use all algorithms in the algorithm library, or only the algorithms specified by the user.
  • The classifier traverses these algorithms and trains one data model per algorithm. Each model is then evaluated on a test set, and its performance is verified on a verification set, to obtain the preferred model for the data set.
  • Because the automatic classifier trains a data model for every algorithm in the library (or every user-configured algorithm) and verifies each model to find the most effective one, a large data set or a large number of algorithms may require it to create hundreds or thousands of models. This can take hours or longer and degrades the performance of model selection.
  • Embodiments of the present invention provide a method and apparatus for determining a data model, which can reduce the number of models created, thereby shortening the time spent in creating a model and improving data model selection performance.
  • An embodiment of the present invention provides a method for determining a data model, including: extracting, according to a received data model determination request, a data feature vector of the data to be analyzed, where the data feature vector reflects the data features of the data to be analyzed; analyzing the data feature vector using multiple algorithm feature models in an algorithm feature model library to determine the T algorithms in an algorithm library that have the highest correlation with the data to be analyzed, T ≥ 1, where the algorithm feature models are obtained by analyzing multiple algorithms in the algorithm library according to a preset reference data set, and the algorithm library includes at least one algorithm for processing the data to be analyzed; processing the data to be analyzed with each of the T algorithms to obtain T data models; and outputting, in response to the data model determination request, the data model among the T data models that best matches the data to be analyzed.
  • Because the T algorithms most correlated with the data to be analyzed are determined first, algorithms in the algorithm library that are not highly correlated with the data can be filtered out. Model training, evaluation, and verification therefore need not be performed for every algorithm in the library, which reduces the number of models created, shortens the time spent creating models, and improves the performance of data model selection.
  • The method further includes: establishing an algorithm feature matrix, which comprises the data feature vector of each set of reference data in the reference data set and the identifier of the algorithm in the algorithm library that corresponds to each set of reference data; and analyzing the algorithm feature matrix with the MART (Multiple Additive Regression Tree) algorithm to obtain at least one algorithm feature model, the at least one algorithm feature model constituting the algorithm feature model library.
  • Establishing the algorithm feature matrix includes: extracting the data feature vector of each set of reference data in the reference data set, where the data feature vector represents the linear correlation information, attribute information, instance information, and sparsity information of the reference data; and labeling the data feature vector of each set of reference data with the identifier of the corresponding algorithm to obtain the algorithm feature matrix.
  • Analyzing the data feature vector using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library that have the highest correlation with the data to be analyzed includes: loading the data feature vector into the algorithm feature models in the algorithm feature model library to calculate the correlation between each algorithm feature model and the data to be analyzed; sorting the algorithm feature models by correlation from high to low to obtain the top T algorithm feature models; and determining the T algorithms corresponding to those T algorithm feature models, which are the T algorithms most correlated with the data to be analyzed.
  • The method further includes: saving correction parameters produced while analyzing the algorithm feature matrix with the MART algorithm, where the correction parameters include each node weight, the leaf sample residuals, and the label relationship in the MART algorithm, and the label relationship is the correspondence between the reference data in the reference data set and the identifiers of the algorithms in the algorithm library; calculating an accuracy rate by a preset calculation rule from the data model of the data to be analyzed and the algorithm feature models of the T algorithms, where the accuracy rate indicates how accurate the ranking of the T algorithm feature models is; and, if the accuracy rate is less than a threshold, correcting the algorithm feature models in the algorithm feature model library using the correction parameters.
  • The correction parameters can thus be used to improve the accuracy of the T algorithms obtained through the algorithm feature model library. When the ranking is not accurate enough, the algorithm feature models in the library are corrected according to the saved correction parameters, and the corrected algorithm feature model library helps ensure the accuracy of the data models determined for subsequent data to be analyzed.
  • the data feature vector of the data to be analyzed is used to represent linear correlation information, attribute information, instance information, and sparsity information of the data to be analyzed.
  • An embodiment of the present invention provides a data model determining apparatus, including: a feature extracting unit, configured to extract, according to a received data model determination request, a data feature vector of the data to be analyzed, where the data feature vector reflects the data features of the data to be analyzed; an algorithm screening unit, configured to analyze the data feature vector using multiple algorithm feature models in an algorithm feature model library to determine the T algorithms in an algorithm library that have the highest correlation with the data to be analyzed, T ≥ 1, where the algorithm feature models are obtained by analyzing multiple algorithms in the algorithm library according to a preset reference data set, and the algorithm library includes at least one algorithm for processing the data to be analyzed; a processing unit, configured to process the data to be analyzed with each of the T algorithms to obtain T data models; and a model output unit, configured to output, in response to the data model determination request, the data model among the T data models that best matches the data to be analyzed.
  • The apparatus further includes a model library establishing unit, configured to: establish an algorithm feature matrix, which includes the data feature vector of each set of reference data in the reference data set and the identifier of the algorithm in the algorithm library that corresponds to each set of reference data; and analyze the algorithm feature matrix with the MART algorithm to obtain at least one algorithm feature model, the at least one algorithm feature model constituting the algorithm feature model library.
  • The model library establishing unit is specifically configured to: extract the data feature vector of each set of reference data in the reference data set, where the data feature vector represents the linear correlation information, attribute information, instance information, and sparsity information of the reference data; and label the data feature vector of each set of reference data with the identifier of the corresponding algorithm to obtain the algorithm feature matrix.
  • The algorithm screening unit is specifically configured to: load the data feature vector into the algorithm feature models in the algorithm feature model library to calculate the correlation between each algorithm feature model and the data to be analyzed; sort the algorithm feature models by correlation from high to low to obtain the top T algorithm feature models; and determine the T algorithms corresponding to those T algorithm feature models, which are the T algorithms most correlated with the data to be analyzed.
  • The apparatus further includes: a saving unit, configured to save correction parameters produced while analyzing the algorithm feature matrix with the MART algorithm, where the correction parameters include each node weight, the leaf sample residuals, and the label relationship in the MART algorithm, and the label relationship is the correspondence between the reference data in the reference data set and the identifiers of the algorithms in the algorithm library; a calculation unit, configured to calculate an accuracy rate by a preset calculation rule from the data model of the data to be analyzed and the algorithm feature models of the T algorithms, where the accuracy rate indicates how accurate the ranking of the T algorithm feature models is; and a correcting unit, configured to correct the algorithm feature models in the algorithm feature model library using the correction parameters if the accuracy rate is less than the threshold.
  • An embodiment of the present invention provides a data model determining apparatus, including: a processor, a memory, a bus, and a communication interface. The memory stores computer-executable instructions, and the processor is connected to the memory through the bus. When the determining device of the data model is running, the processor executes the computer-executable instructions stored in the memory, so that the determining device of the data model performs the method of the first aspect.
  • An embodiment of the present invention provides a computer storage medium for storing the computer software instructions used by the determining device of the data model, including the program that the determining device is designed to execute.
  • The name of the determining device of the data model does not limit the device itself; in actual implementations, these devices may appear under other names. As long as the functions of the respective devices are similar to those of the present invention, they fall within the scope of the claims and their equivalents.
  • FIG. 1 is a schematic diagram of an application scenario of a method for determining a data model according to an embodiment of the present invention;
  • FIG. 2 is a first flowchart of a method for determining a data model according to an embodiment of the present invention;
  • FIG. 3 is a second flowchart of a method for determining a data model according to an embodiment of the present invention;
  • FIG. 4 is a diagram comparing the time consumed by a method for determining a data model according to an embodiment of the present invention with that of a prior-art method for determining a data model;
  • FIG. 5 is a third flowchart of a method for determining a data model according to an embodiment of the present invention;
  • FIG. 6 is a fourth flowchart of a method for determining a data model according to an embodiment of the present invention;
  • FIG. 7 is a first schematic structural diagram of a device for determining a data model according to an embodiment of the present invention;
  • FIG. 8 is a second schematic structural diagram of a device for determining a data model according to an embodiment of the present invention;
  • FIG. 9 is a first schematic diagram of the hardware structure of a device for determining a data model according to an embodiment of the present invention;
  • FIG. 10 is a second schematic diagram of the hardware structure of a device for determining a data model according to an embodiment of the present invention.
  • The embodiment of the present invention provides a method for determining a data model, which can be applied at the application layer of the Open Systems Interconnection (OSI) reference model and can be implemented in a general-purpose operating system.
  • FIG. 1 is a schematic diagram of an application scenario of the method for determining the data model. The upper layer is an application layer, which includes various applications, network simulations, network plans, and the like. The lower layer is a device layer, which is mainly responsible for computing, forwarding, and storing the data to be analyzed and the reference data set. The application layer can establish communication with the device layer.
  • For example, the application layer can deliver a user's instruction through an API (Application Programming Interface) to a physical device in the device layer (for example, the determining node of the data model); conversely, a physical device in the device layer can present the results it generates to the user through the GUI (Graphical User Interface) of the application layer.
  • The determining device of the data model that carries out the method may be a physical device, shown in FIG. 1 as the determining node 01 of the data model, or may be a logical functional module or software unit within a physical device.
  • As a physical device, the determining node 01 of the data model can communicate with at least one computing node 02 to perform computing scheduling, and with at least one data storage node 03 to perform data transfer. Alternatively, the determining device of the data model may be integrated as a software unit in a computing node 02 or a data storage node 03; the node hosting it may then communicate with at least one other computing node to perform computing scheduling, and with at least one data storage node to perform data transmission. The embodiment of the present invention does not limit this. Note that FIG. 1 only illustrates, by way of example, the networking form in which the determining node of the data model communicates with one computing node 02 and one data storage node 03; it does not limit the specific implementation of the hardware layer in the embodiment of the present invention.
  • The present invention places no restriction on the hardware used, including but not limited to the computing node 02 and the data storage node 03 shown in FIG. 1; any hardware product that satisfies the computing power requirements is applicable.
  • Hbase: a distributed, column-oriented open source database
  • MySQL: a relational database management system
  • Sybase: a relational database system
  • Oracle: a relational database management system
  • HDFS: Hadoop Distributed File System
  • The present invention also has no mandatory requirements for the computing platform used in the computing node 02. The user can adopt a Hadoop or Spark platform, or any other computing platform that satisfies the actual computing requirements.
  • An embodiment of the present invention provides a method for determining a data model, as shown in FIG. 2, including:
  • 101. The determining device of the data model extracts, according to a received data model determination request, a data feature vector from the data to be analyzed, where the data feature vector reflects the data features of the data to be analyzed.
  • 102. The determining device of the data model analyzes the data feature vector using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library that have the highest correlation with the data to be analyzed, T ≥ 1, where the algorithm feature models in the algorithm feature model library are obtained by analyzing multiple algorithms in the algorithm library according to a preset reference data set.
  • 103. The determining device of the data model processes the data to be analyzed with each of the T algorithms to obtain T data models.
  • 104. The determining device of the data model outputs, in response to the data model determination request, the data model among the T data models that best matches the data to be analyzed.
  • In step 101, the determining device of the data model may obtain a data model determination request that is triggered by a user or sent by another device. The request instructs the device to determine a data model for the data to be analyzed.
  • After receiving the request, the determining device of the data model may first extract the data feature vector from the data to be analyzed; the data feature vector reflects the data features of that data.
  • The data feature vector may include at least the linear correlation information, attribute information, instance information, and sparsity information of the data to be analyzed.
  • The linear correlation information may be characterized by a linear correlation degree, which reflects how close the linear relationship in the data to be analyzed is; the attribute information reflects the dimensional attributes of the data to be analyzed; the instance information may be characterized by the number of samples of the data to be analyzed; and the sparsity information may be characterized by sparsity, which indicates the relative percentage of cells in the multidimensional structure that contain no data.
  • The data feature vector may further include other information that reflects the data characteristics of the data to be analyzed, such as dispersion information, skewness and kurtosis information, or central tendency information. Those skilled in the art may set the contents of the data feature vector based on experience or by an algorithm, which is not limited by the embodiment of the present invention.
  • Specifically, the determining device of the data model may profile the data to be analyzed to extract the data feature vector.
  • For example, a multidimensional regression association algorithm may be used to calculate the linear correlation of the data to be analyzed. The steps and formulas for calculating the linear correlation are as follows:
  • Let N be the number of samples of the data to be analyzed, J the attribute dimension of the data to be analyzed, X the attribute matrix of the data to be analyzed, y the prediction label vector, and b the proportional coefficient, which can be expressed as follows:
  • The multidimensional regression linear correlation R of the data to be analyzed can be calculated by the following formula:
  • The multidimensional regression linear correlation R calculated in the above step may then be corrected for the number of samples N by the following formula to obtain the corrected multidimensional regression linear correlation.
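  • The symbols defined above (N, J, X, y, b, R) match ordinary multiple linear regression. A standard formulation consistent with them is sketched here; it is an assumption, not necessarily the exact formulas of the filing. X is assumed to carry a leading column of ones for the intercept, J counts the attributes without that column, and \hat{y} and \bar{y} denote the fitted values and the mean label.

```latex
\begin{aligned}
b &= (X^{\top} X)^{-1} X^{\top} y, \qquad \hat{y} = X b, \\
R^{2} &= 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{N} (y_i - \bar{y})^{2}}, \qquad R = \sqrt{R^{2}}, \\
R_{\mathrm{adj}}^{2} &= 1 - \left(1 - R^{2}\right) \frac{N - 1}{N - J - 1}.
\end{aligned}
```

  • Under this reading, the correction step is the usual adjusted coefficient of determination, which penalizes a high R obtained from few samples relative to the number of attributes.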
  • By profiling the data to be analyzed in this way, the determining device of the data model obtains not only the linear correlation information but also the attribute information, instance information, sparsity information, and so on, and finally obtains a data feature vector that characterizes the data to be analyzed: [f1, f2, ..., fn].
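  • A minimal Python sketch of assembling such a feature vector is given below. The function names, the use of None for empty cells, and the ordering of the four features are illustrative assumptions. The linear correlation here is a simplified stand-in (the strongest absolute Pearson correlation between a single attribute and the label), not the multidimensional regression correlation R described in the text.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def extract_feature_vector(rows, labels):
    """Return [linear_correlation, n_attributes, n_instances, sparsity].

    rows   : list of equal-length lists; None marks an empty cell (assumption)
    labels : list of numeric labels, one per row
    """
    n_instances = len(rows)
    n_attributes = len(rows[0])
    empty_cells = sum(1 for row in rows for v in row if v is None)
    sparsity = empty_cells / (n_instances * n_attributes)

    # Simplified stand-in for the regression-based correlation R:
    # strongest absolute correlation between one attribute and the label.
    best = 0.0
    for j in range(n_attributes):
        pairs = [(row[j], y) for row, y in zip(rows, labels) if row[j] is not None]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            best = max(best, abs(pearson(xs, ys)))
    return [best, n_attributes, n_instances, sparsity]


rows = [[1.0, 5.0], [2.0, None], [3.0, 1.0], [4.0, 7.0]]
labels = [2.0, 4.0, 6.0, 8.0]
features = extract_feature_vector(rows, labels)
print(features)
```

  • In this toy input, one of eight cells is empty, and the first attribute is an exact linear function of the label, so the correlation feature is at its maximum.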
  • In step 102, the determining device of the data model analyzes the data feature vector using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library that have the highest correlation with the data to be analyzed, T ≥ 1.
  • The algorithm library is the set of algorithms used to respond to various data model determination requests. The algorithm feature model library contains an algorithm feature model for each algorithm in the pre-stored algorithm library. The algorithm feature models are obtained by analyzing multiple algorithms in the algorithm library according to a preset reference data set, and the reference data set may be obtained by sampling or similar means. For example, face data of 100 individuals may be collected in advance as a reference data set. Each algorithm in the algorithm library is then used to train a data model on this reference data set, finally yielding the algorithm feature model applicable to each algorithm in the algorithm library.
  • Specifically, the determining device of the data model may load the data feature vector obtained in step 101 into the algorithm feature models in the algorithm feature model library; this may be called a model prediction process.
  • The prediction result is the correlation between each algorithm feature model in the algorithm feature model library and the data to be analyzed. The algorithm feature models can then be sorted by correlation from high to low to obtain the top T algorithm feature models, namely alg1, alg2, ..., algT.
  • Because the algorithm feature model library contains the algorithm feature model for each algorithm in the pre-stored algorithm library, it also records the correspondence between each algorithm and its algorithm feature model. The determining device of the data model can therefore determine the T algorithms corresponding to the T algorithm feature models; these are the T algorithms with the highest correlation with the data to be analyzed.
  • Note that in step 102 the T algorithm feature models most correlated with the data to be analyzed are determined through the algorithm feature model library, but the algorithm feature models in that library were trained on the reference data set. They are therefore not models trained on the data to be analyzed. For example, suppose the data to be analyzed is the face data of face 1, while the algorithm feature models in the library were obtained by training the algorithms in the algorithm library on face data sampled from 100 faces. The T algorithm feature models obtained by model prediction in step 102 are then not data models trained on the face data of face 1. However, the T algorithms they correspond to can be taken as the T algorithms with the highest correlation with the data to be analyzed. To determine the actual data model of the data to be analyzed, step 103 is therefore performed next.
  • In step 103, as in the existing automatic model selection process, the determining device of the data model creates T data models for the data to be analyzed using the T algorithms determined in step 102.
  • In step 104, as in the existing automatic model selection process, the data model that best matches the data is selected from the T data models obtained in step 103 and output in response to the data model determination request of step 101.
  • That is, the T data models are evaluated and the one with the best evaluation result is output. In other words, automatic model selection is performed on the data to be analyzed using only the T algorithms, to obtain the data model of the data to be analyzed.
  • In the prior art, model training and evaluation must be performed for all algorithms in the algorithm library, or for all user-configured algorithm types, in order to select the most effective algorithm model. In the present application, steps 101 and 102 first obtain from the algorithm library the T algorithms with the highest correlation with the data to be analyzed, that is, the T algorithms that best match the data, and filter out the algorithms in the library that match the data poorly.
  • It is therefore unnecessary to traverse all the algorithms in the algorithm library to create models. This reduces the number of models created, shortens the time spent on data model selection, and improves data model selection performance.
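  • Steps 103 and 104 can be illustrated with the following Python sketch, which trains one model per shortlisted algorithm and outputs the best one. The two toy algorithms, the held-out split, and the use of mean squared error as the measure of matching degree are assumptions made for this example; the text does not prescribe a particular evaluation metric.

```python
def fit_mean(xs, ys):
    """Toy algorithm: always predict the mean label."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy algorithm: one-dimensional least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_and_select(candidates, train, test):
    """Step 103: build one model per candidate. Step 104: return the best on held-out data."""
    models = {name: fit(*train) for name, fit in candidates.items()}
    errors = {name: mse(model, *test) for name, model in models.items()}
    best = min(errors, key=errors.get)
    return best, models[best], errors


algorithm_library = {"mean": fit_mean, "line": fit_line}
shortlist = ["line", "mean"]  # the T algorithms assumed to come out of step 102
candidates = {name: algorithm_library[name] for name in shortlist}

train = ([0, 1, 2, 3], [1, 3, 5, 7])
test = ([4, 5], [9, 11])
best, model, errors = train_and_select(candidates, train, test)
print(best, errors)
```

  • Only the shortlisted algorithms are trained, so the cost of this stage grows with T rather than with the size of the whole algorithm library.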
  • In summary, an embodiment of the present invention provides a method for determining a data model. Before automatic model selection is performed on the data to be analyzed, a data feature vector is extracted from the data to be analyzed, where the data feature vector reflects the data characteristics of that data. The data feature vector is then analyzed using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library that have the highest correlation with the data to be analyzed, T ≥ 1, where the algorithm feature models are obtained by analyzing multiple algorithms in the algorithm library according to the preset reference data set. Automatic model selection then needs to be performed on the data to be analyzed only with these T algorithms to obtain the data model of the data to be analyzed.
  • Because the data feature vector is predicted using the algorithm feature models in the algorithm feature model library, the T algorithms with the highest correlation with the data to be analyzed can be determined, and the algorithms in the algorithm library that are not highly correlated with the data can be filtered out. Model training, evaluation, and verification therefore need not be performed for all the algorithms in the algorithm library, which reduces the number of models created, shortens the time spent creating models, and improves the performance of data model selection.
  • Further, the determining device of the data model may analyze all the algorithms in the algorithm library using a preset reference data set to establish the above algorithm feature model library. As shown in FIG. 5, the method for establishing the algorithm feature model library specifically includes:
  • 201. The determining device of the data model extracts the data feature vector of each set of reference data in the reference data set, where the data feature vector reflects the data characteristics of that set of reference data.
  • 202. The determining device of the data model labels the data feature vector of each set of reference data with the identifier of the corresponding algorithm to establish an algorithm feature matrix.
  • 203. The determining device of the data model analyzes the algorithm feature matrix using the MART algorithm to obtain the algorithm feature model library.
  • 204. The determining device of the data model saves the correction parameters produced when analyzing the algorithm feature matrix with the MART algorithm, where the correction parameters include each node weight, the leaf sample residuals, and the label relationship in the MART algorithm.
  • step 204 is an optional step, which can be used to subsequently modify the algorithm feature model in the algorithm feature model library established in step 203.
  • steps 201-202 provided in the embodiment of the present invention (that is, the method for establishing the algorithm feature matrix) and the method for establishing the algorithm feature model library in step 203 are only illustrated as one possible implementation. It should be clarified that the embodiment of the present invention does not limit the method for establishing the algorithm feature matrix or the method for establishing the algorithm feature model library.
  • the data model determining device may further process each set of reference data in the reference data set (the benchmark data set). The reference data may be obtained by experimentally collecting data images. The device obtains the data feature vector of each set of reference data, and finally obtains a data feature matrix composed of the data feature vectors of the plurality of sets of reference data, one row per set of reference data.
  • the determining device of the data model labels the data feature vector of each set of reference data with the identifier of the corresponding preferred algorithm. That is, it establishes the correspondence between the data feature vector of each set of reference data and the preferred algorithm for that set of reference data, to obtain the algorithm feature matrix.
  • for example, if the data feature vector of the first set of reference data is $[x_{11}, x_{12}, \ldots, x_{1n}]$ and the preferred algorithm corresponding to this data feature vector in the algorithm library is $Q_1$, then the determining device of the data model labels the data feature vector of the first set of reference data with the identifier of the corresponding preferred algorithm, and the labeled data feature vector is $[x_{11}, x_{12}, \ldots, x_{1n}, Q_1]$.
  • in this way, the algorithm feature matrix can be established, with one labeled data feature vector per row.
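  • The labeling step described above can be sketched as follows. This is a minimal illustration, assuming the data feature vectors are plain lists of numbers and the algorithm identifiers are strings; the function name `build_algorithm_feature_matrix` and the sample values are hypothetical.

```python
from typing import List, Sequence, Union

Row = List[Union[float, str]]


def build_algorithm_feature_matrix(
        feature_vectors: Sequence[Sequence[float]],
        preferred_algorithms: Sequence[str]) -> List[Row]:
    """Append the preferred algorithm's identifier to each feature vector."""
    if len(feature_vectors) != len(preferred_algorithms):
        raise ValueError("one algorithm identifier is needed per feature vector")
    # Each row becomes [x_i1, ..., x_in, Q_i].
    return [list(vector) + [algorithm]
            for vector, algorithm in zip(feature_vectors, preferred_algorithms)]


# Two sets of reference data with made-up feature values.
matrix = build_algorithm_feature_matrix([[0.2, 0.7], [0.9, 0.1]], ["Q1", "Q2"])
```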
  • in step 203, the determining device of the data model analyzes the algorithm feature matrix obtained in step 202 by using a MART (Multiple Additive Regression Tree) algorithm to obtain the algorithm feature model library.
  • the MART algorithm may also be referred to as a GBDT (Gradient Boosting Decision Tree) algorithm. It consists of multiple decision trees, and the conclusions of all the trees are added together to give the final answer. Here, the final answer is an algorithm feature model in the algorithm feature model library.
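  • The additive idea behind MART/GBDT can be illustrated with a small generic sketch. It is not the procedure of step 203: it assumes one-dimensional inputs, squared-error loss, one-split trees (stumps), and a fixed learning rate, all chosen only for illustration. Every function name here is hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)
LEARNING_RATE = 0.5  # assumption: shrinkage applied to every tree


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a one-split regression tree that minimizes squared error."""
    best_error, best_stump = None, None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else 0.0
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best_error is None or error < best_error:
            best_error, best_stump = error, (threshold, left_mean, right_mean)
    return best_stump


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_trees(xs: Sequence[float], ys: Sequence[float],
                      n_trees: int) -> List[Stump]:
    """Each new tree is fitted to the residuals left by the previous trees."""
    trees: List[Stump] = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        predictions = [p + LEARNING_RATE * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return trees


def predict(trees: Sequence[Stump], x: float) -> float:
    # The final answer is the sum of the scaled outputs of all trees.
    return sum(LEARNING_RATE * predict_stump(tree, x) for tree in trees)


trees = fit_boosted_trees([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 3.0], n_trees=20)
```

  • With this toy data, every tree splits at the same threshold and the summed prediction approaches the targets as more trees are added.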
  • the MART algorithm is used to analyze the algorithm feature matrix; the procedure and formulas for obtaining the algorithm feature model library are organized as steps a to h of step 203.
  • the above process of analyzing the algorithm feature matrix with the MART algorithm can be called the model training process. After model training, the algorithm feature model library is obtained, and the algorithm feature model library includes an algorithm feature model for each algorithm pre-stored in the algorithm library.
  • in step 204, the determining device of the data model may further save the correction parameters produced when performing model training on the algorithm feature matrix in step 203, where the correction parameters include each node weight, the leaf sample residuals, and the label relationship in the MART algorithm. The correction parameters can be used to improve the accuracy of the T algorithms obtained using the algorithm feature model library. That is, a step i is added after step h of step 203 above.
  • the node weight represents the importance of each node. For example, its value range may be [0, 1]; the larger the node weight, the more important the corresponding node. When all node weights are 1, all nodes are equally important.
  • the leaf sample residual characterizes the degree of difference between the algorithm feature model obtained from the above decision tree and the data model of the data to be analyzed obtained in step 103, and can be used to characterize the accuracy of the model prediction. A leaf sample residual of 0 indicates that the model prediction is the most accurate.
  • the label relationship refers to the correspondence between each set of reference data in the reference data set and the identifier of the corresponding algorithm in the algorithm library.
  • the node weight and the leaf sample residual are calculated when the decision tree is created in steps e and f, so their values are saved in step i. When a new sample point is entered, the above label relationship can be updated by using the saved node weights and leaf sample residuals, thereby incrementally updating the generated decision tree. That is, the data model determining device can correct the algorithm feature models in the algorithm feature model library according to the correction parameters saved in step i.
  • the determining device of the data model may calculate an accuracy rate by using a preset calculation rule, according to the data model of the data to be analyzed obtained in step 103 and the algorithm feature models of the T algorithms obtained in step 102. The accuracy rate indicates how accurate the ranking of the T algorithm feature models obtained in step 102 is.
  • the preset calculation rule may be: if the data model obtained in step 103 is one of the algorithm feature models of the T algorithms, the accuracy rate is considered to be 100%. Alternatively, the preset calculation rule may be: if the data model obtained in step 103 is, among the algorithm feature models of the T algorithms, the one having the highest correlation with the data to be analyzed, the accuracy rate is considered to be 100%; if it is the one having the second highest correlation with the data to be analyzed, the accuracy rate is considered to be 80%; and so on. Those skilled in the art can set the preset calculation rule according to actual experience, and the embodiment of the present invention does not limit this.
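  • The two example calculation rules can be sketched as follows. The 100% and 80% values come from the description above; continuing in steps of 20 percentage points for lower ranks, and returning 0 when the chosen algorithm is not among the T candidates, are assumptions made only for this illustration. The function names are hypothetical.

```python
from typing import Sequence


def accuracy_by_membership(chosen: str, ranked_top_t: Sequence[str]) -> int:
    """First rule: 100% if the chosen algorithm is among the T candidates."""
    return 100 if chosen in ranked_top_t else 0


def accuracy_by_rank(chosen: str, ranked_top_t: Sequence[str],
                     step: int = 20) -> int:
    """Second rule: 100% for rank 1, 80% for rank 2, and so on.

    `ranked_top_t` is ordered from highest to lowest correlation.
    The step size for ranks beyond the second is an assumption.
    """
    if chosen not in ranked_top_t:
        return 0
    rank = ranked_top_t.index(chosen)  # 0 for the most correlated algorithm
    return max(0, 100 - step * rank)


ranking = ["Q3", "Q1", "Q7"]
first = accuracy_by_rank("Q3", ranking)
second = accuracy_by_rank("Q1", ranking)
```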
  • the determining device of the data model corrects the algorithm feature models in the algorithm feature model library established in step 203 according to the correction parameters saved in step 204. In this way, when the determining device of the data model subsequently performs steps 101-103 to acquire the data model of the data to be analyzed, the corrected algorithm feature model library can be used, which helps ensure the accuracy of the data model of the data to be analyzed.
  • the MART algorithm defined in steps a to i can be referred to as an IMART (Incremental Multiple Additive Regression Tree) algorithm, which can be used as an incremental version of the MART algorithm.
  • the established model is incrementally updated according to the saved correction parameters, thereby improving the accuracy of model prediction by the algorithm feature model library.
  • an embodiment of the present invention provides a method for determining a data model.
  • the data feature vector of the data to be analyzed may be extracted, where the data feature vector reflects the data features of the data to be analyzed; further, the data feature vector is analyzed by using multiple algorithm feature models in the algorithm feature model library to determine T algorithms in the algorithm library.
  • by running the algorithm feature models in the algorithm feature model library on the data feature vector, the T algorithms having the highest correlation with the data to be analyzed can be determined.
  • algorithms in the algorithm library that are not highly correlated with the data to be analyzed can thus be filtered out, so that model training, evaluation and verification on the data to be analyzed need not be performed for every algorithm in the algorithm library. This reduces the number of models created, shortens the time spent creating models, and improves the performance of data model selection.
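  • Put together, the method summarized above might look like the following end-to-end sketch. Everything in it is an assumption for illustration: the callables passed in (`extract_features`, the scorers, the toy algorithms, and `match_score`) are hypothetical stand-ins, not components defined by the embodiment.

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def determine_data_model(
        data: Sequence[float],
        extract_features: Callable[[Sequence[float]], Sequence[float]],
        feature_models: Dict[str, Callable[[Sequence[float]], float]],
        algorithms: Dict[str, Callable[[Sequence[float]], Any]],
        match_score: Callable[[Any, Sequence[float]], float],
        t: int) -> Tuple[str, Any]:
    """Extract features, keep the top-T algorithms, train them, return the best."""
    vector = extract_features(data)
    # Rank algorithms by the score their feature model gives this vector.
    ranked = sorted(feature_models,
                    key=lambda name: feature_models[name](vector),
                    reverse=True)
    candidates = ranked[:t]
    # Only the T candidates are used to build data models.
    data_models = {name: algorithms[name](data) for name in candidates}
    best = max(data_models, key=lambda name: match_score(data_models[name], data))
    return best, data_models[best]


# Toy setup: each "data model" is a single constant fitted to the data.
data = [1.0, 2.0, 3.0, 4.0]
result = determine_data_model(
    data,
    extract_features=lambda d: [sum(d) / len(d), max(d) - min(d)],
    feature_models={"mean": lambda v: 0.9, "median": lambda v: 0.8,
                    "zero": lambda v: 0.1},
    algorithms={"mean": lambda d: sum(d) / len(d),
                "median": lambda d: sorted(d)[len(d) // 2],
                "zero": lambda d: 0.0},
    # Higher is better: negative sum of squared errors.
    match_score=lambda model, d: -sum((x - model) ** 2 for x in d),
    t=2,
)
```

  • In this toy run the "zero" algorithm is screened out before any data model is built, which is the saving the method aims for.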
  • the data model determining device 01 provided by the embodiment of the present invention can be exemplarily divided into the feature extraction unit 11, the algorithm screening unit 12, and the model selection unit 13.
  • the feature extraction unit 11 is configured to determine a data feature vector of the data to be analyzed according to the received data model determination request, where the data feature vector is used to reflect the data feature of the data to be analyzed.
  • the data feature vector may be used to indicate linear correlation information, attribute information, instance information, and sparsity information of the data to be analyzed;
  • the algorithm screening unit 12 is configured to analyze the data feature vector by using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library having the highest correlation with the data to be analyzed, T ≥ 1,
  • the plurality of algorithm feature models in the algorithm feature model library are obtained by analyzing a plurality of algorithms in the algorithm library according to a preset reference data set, where the algorithm library includes a collection of algorithms used to serve each data model determination request;
  • the processing unit 13 is configured to process the data to be analyzed by using the T algorithms in the algorithm library respectively, to obtain T data models;
  • the model output unit 14 is configured to output the data model among the T data models having the highest degree of matching with the data, in response to the data model determination request.
  • the data model determining device 01 may further include a model library establishing unit 15. The model library establishing unit 15 is configured to: establish an algorithm feature matrix, where the algorithm feature matrix includes the data feature vector of each set of reference data in the reference data set and the identifier of the algorithm in the algorithm library corresponding to each set of reference data; and analyze the algorithm feature matrix by using the iterative decision tree MART algorithm to obtain the multiple algorithm feature models in the algorithm feature model library.
  • the model library establishing unit 15 is specifically configured to: extract a data feature vector of each set of reference data in the reference data set, where the data feature vector is used to represent linear correlation information, attribute information, instance information, and sparsity information of the reference data; and label the data feature vector of each set of reference data with the identifier of the corresponding algorithm to obtain the algorithm feature matrix.
  • the algorithm screening unit 12 is specifically configured to: load the data feature vector by using the algorithm feature models in the algorithm feature model library, to calculate the correlation between each algorithm feature model in the algorithm feature model library and the data to be analyzed; sort the algorithm feature models from highest to lowest correlation to obtain the top T algorithm feature models; and determine that the T algorithms corresponding to these T algorithm feature models are the T algorithms having the highest correlation with the data to be analyzed.
  • the apparatus further includes a saving unit 16, a calculating unit 17, and a correcting unit 18.
  • the saving unit 16 is configured to save the correction parameters produced in the process of analyzing the algorithm feature matrix by using the MART algorithm, where the correction parameters include each node weight, the leaf sample residuals, and the label relationship in the MART algorithm.
  • the label relationship refers to the correspondence between the reference data in the reference data set and the identifiers of the algorithms in the algorithm library;
  • the calculating unit 17 is configured to calculate an accuracy rate by using a preset calculation rule, according to the data model of the data to be analyzed and the algorithm feature models of the T algorithms, where the accuracy rate indicates how accurate the ranking of the T algorithm feature models is;
  • the correcting unit 18 is configured to: if the accuracy rate is less than a threshold, correct the algorithm feature models in the algorithm feature model library by using the correction parameters.
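  • As an illustration of what the feature extraction unit might compute, the sketch below derives a four-element data feature vector from a small numeric table. The concrete definitions are assumptions and are not given in the description: instance information is taken as the row count, attribute information as the column count, sparsity as the fraction of zero entries, and linear correlation as the mean absolute Pearson correlation over all column pairs.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def extract_feature_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Return [instances, attributes, sparsity, mean |correlation|]."""
    n_instances = len(rows)
    n_attributes = len(rows[0])
    columns = list(zip(*rows))
    zeros = sum(1 for row in rows for value in row if value == 0)
    sparsity = zeros / (n_instances * n_attributes)
    pairs = list(combinations(columns, 2))
    linear = (sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
              if pairs else 0.0)
    return [float(n_instances), float(n_attributes), sparsity, linear]


# Made-up table: the second column is twice the first, the third is all zeros.
vector = extract_feature_vector([[1.0, 2.0, 0.0],
                                 [2.0, 4.0, 0.0],
                                 [3.0, 6.0, 0.0]])
```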
  • FIG. 9 is a schematic diagram of a hardware structure of a data model determining apparatus 01 according to an embodiment of the present invention.
  • the data model determining device 01 provided in the embodiment of the present invention may be used to implement the methods of the foregoing embodiments. For convenience of description, only the parts related to the embodiments of the present invention are shown.
  • the determining device 01 of the data model may be a multi-node cluster, a single-node server device, a mobile device, etc., and the present invention does not impose any limitation on this, and all hardware products that can meet the computing capability requirements are applicable.
  • the determining means 01 of the data model includes a processor 21, a communication interface 22, and a memory 23, and the processor 21, the communication interface 22, and the memory 23 communicate via the bus 24.
  • the above-described feature extraction unit 11, algorithm screening unit 12, model selection unit 13, model library establishing unit 15, saving unit 16, calculating unit 17, and correcting unit 18 can all be implemented by the processor 21 calling instructions in the memory 23.
  • the algorithm feature model library established by the model library establishing unit 15 may be stored in the memory 23, and the correction parameters saved by the saving unit 16 may also be stored in the memory 23.
  • the memory 23 is configured to store computer-executable instructions. The processor 21 is connected to the memory 23 via the bus 24. When the determining device 01 of the data model is running, the processor 21 executes the computer-executable instructions stored in the memory 23, so that the determining device 01 of the data model performs the method for determining a data model shown in FIG. 2 or FIG. 5.
  • the processor 21 may extract a data feature vector of the data to be analyzed, where the data feature vector reflects the data features of the data to be analyzed. Further, the processor 21 analyzes the data feature vector by using multiple algorithm feature models in the algorithm feature model library to determine the T algorithms in the algorithm library having the highest correlation with the data to be analyzed, T ≥ 1. The algorithm feature model library can be stored in the memory 23, and the multiple algorithm feature models in it are obtained by analyzing a plurality of algorithms in the algorithm library according to the preset reference data set. Further, the processor 21 processes the data to be analyzed by using the T algorithms to obtain T data models, and outputs, through the communication interface 22, the data model among the T data models having the highest degree of matching with the data, in response to the data model determination request.
  • the processor 21 can be a central processing unit (CPU).
  • the processor 21 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the processor 21 is the control center of the determining device 01 of the data model. By processing the data received by the communication interface 22 and calling the software or programs in the memory 23, the processor 21 performs the functions of the data model determining device 01.
  • the communication interface 22 can specifically be an interface circuit for receiving and transmitting signals while information or requests are being sent and received. After receiving information sent by a terminal, the communication interface 22 passes the information to the processor 21 for processing. In addition, the communication interface 22 can communicate with the network and other devices through wireless communication.
  • the memory 23 may include a volatile memory, such as a random-access memory (RAM); the memory 23 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 23 may also include a combination of the above types of memories.
  • the processor 21 can execute the various functional applications and data processing of the determining device 01 of the data model by running a software program stored in the memory 23.
  • the bus 24 can include a data bus, a power bus, a control bus, and a signal status bus. For clarity, in the present embodiment the various buses are all illustrated as the bus 24.
  • a method for determining a data model provided in the embodiment of the present invention may also be performed by a physical host where one or more virtual machines (VMs) are located.
  • Host is a combination of the VMM and a privileged virtual machine running on the VMM. This implementation is often used in cloud computing scenarios.
  • the above-described feature extraction unit 11, algorithm screening unit 12, model selection unit 13, model library establishing unit 15, saving unit 16, calculating unit 17, and correcting unit 18 may be disposed on one or more virtual machines.
  • for example, the feature extraction unit 11 may be implemented by one virtual machine and the other units may likewise each be implemented by a virtual machine, or several units may be implemented by the same virtual machine, which is not limited by the embodiment of the present invention.
  • the above-described feature extraction unit 11, algorithm screening unit 12, model selection unit 13, model library establishing unit 15, saving unit 16, calculating unit 17, and correcting unit 18 may be set on the physical host 100 where the virtual machine is located, and the physical host 100 performs the method for determining a data model in the above embodiment.
  • the physical host 100 includes a hardware layer, a Host 1001 running on the hardware layer, and at least one virtual machine (VM) 1002 running on the Host 1001.
  • the hardware layer includes a network card 1003.
  • the Host 1001 may include a VMM on the physical host 100 and a privileged virtual machine running on the VMM.
  • the virtual machine 1002 is a virtual machine on the physical host 100 other than the privileged virtual machine.
  • Virtual machine 1002: virtual machine software can simulate one or more virtual computers on a physical host. These virtual machines work like real computers: an operating system and applications can be installed on a virtual machine, and the virtual machine can also access network resources. To an application running in it, the virtual machine is like a real computer.
  • Hardware layer: the hardware platform on which the virtualized environment runs.
  • the hardware layer may include various hardware.
  • for example, the hardware layer of a physical host may include a processor 1004 (e.g., a CPU) and a memory 1005, and may also include a network card 1003 (e.g., an RDMA network card), a storage device, high-speed/low-speed input/output (I/O) devices, and other devices with specific processing capabilities.
  • Host 1001: serves as a management layer that manages and allocates hardware resources, presents a virtual hardware platform to virtual machines, and implements scheduling and isolation of virtual machines.
  • the Host may be a virtual machine monitor (VMM); in addition, sometimes the VMM works together with a privileged virtual machine, and the two combine to form the Host.
  • the virtual hardware platform provides various hardware resources for each virtual machine running on it, such as providing a virtual processor (such as a VCPU), virtual memory, a virtual disk, a virtual network card, and the like.
  • the virtual disk may correspond to a file of the Host or a logical block device.
  • the virtual machine runs on the virtual hardware platform that Host prepares for it, and one or more virtual machines run on the Host.
  • Privileged virtual machine: a special virtual machine, also known as a driver domain. For example, this special virtual machine is called Dom0 on the Xen Hypervisor platform. Drivers for real physical devices such as network cards and SCSI disks are installed in this virtual machine, so it can detect and directly access these real physical devices.
  • other virtual machines access the real physical devices through the privileged virtual machine, using the corresponding mechanisms provided by the hypervisor.
  • the embodiment of the present invention may be applied to the Xen virtual machine platform, and may also be applied to any virtualization platform on which virtual machine memory needs to be mapped when a virtual machine is migrated; the embodiment of the present invention does not limit this.
  • the method for determining the data model can be referred to the related description in the foregoing embodiment shown in any one of FIG. 2 to FIG. 6, and details are not described herein again.
  • an embodiment of the present invention provides a data model determining apparatus.
  • based on a data model determination request, a data feature vector of the data to be analyzed may be extracted, where the data feature vector reflects the data features of the data to be analyzed.
  • the T algorithms having the highest correlation with the data to be analyzed can then be determined, so that algorithms in the algorithm library that are not highly correlated with the data to be analyzed can be filtered out. Model training, evaluation and verification on the data to be analyzed therefore need not be performed for every algorithm in the algorithm library, which reduces the number of models created, shortens the time spent creating models, and improves the performance of data model selection.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division.
  • there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • the software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a method and an apparatus for determining a data model, relating to the technical field of computers, which can reduce the number of models created, thereby shortening the time spent creating models and improving the efficiency of data model selection. The method comprises: extracting a data feature vector of the data to be analyzed according to a received data model determination request; using a plurality of algorithm feature models in an algorithm feature model library to analyze the data feature vector so as to determine the T algorithms in the algorithm library having the highest correlation with the data to be analyzed, the plurality of algorithm feature models in the algorithm feature model library being obtained by analyzing a plurality of algorithms in the algorithm library according to a preset reference data set, and the algorithm library comprising at least one algorithm for processing the data to be analyzed; using the T algorithms in the algorithm library respectively to process the data to be analyzed so as to obtain T data models; and outputting, from among the T data models, the data model having the highest degree of matching with the data, so as to respond to the data model determination request.
PCT/CN2016/090343 2016-01-18 2016-07-18 Procédé et appareil de détermination de modèle de données WO2017124713A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610031557.0 2016-01-18
CN201610031557.0A CN106980623B (zh) 2016-01-18 2016-01-18 一种数据模型的确定方法及装置

Publications (1)

Publication Number Publication Date
WO2017124713A1 true WO2017124713A1 (fr) 2017-07-27

Family

ID=59341080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/090343 WO2017124713A1 (fr) 2016-01-18 2016-07-18 Procédé et appareil de détermination de modèle de données

Country Status (2)

Country Link
CN (1) CN106980623B (fr)
WO (1) WO2017124713A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288468A (zh) * 2019-04-19 2019-09-27 平安科技(深圳)有限公司 数据特征挖掘方法、装置、电子设备及存储介质
CN111401671A (zh) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 一种精准营销中衍生特征计算方法、装置和可读存储介质
CN112100557A (zh) * 2020-09-01 2020-12-18 上海交通大学 基于内容发布订阅的组合匹配系统与方法
CN114358649A (zh) * 2022-01-17 2022-04-15 安徽君鲲科技有限公司 一种海事现场监管方法及系统

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451266A (zh) * 2017-07-31 2017-12-08 北京京东尚科信息技术有限公司 用于处理数据方法及其设备
CN107807956A (zh) * 2017-09-30 2018-03-16 平安科技(深圳)有限公司 电子装置、数据处理方法及计算机可读存储介质
CN107870810B (zh) * 2017-10-31 2020-05-12 Oppo广东移动通信有限公司 应用清理方法、装置、存储介质及电子设备
US11257002B2 (en) * 2017-11-22 2022-02-22 Amazon Technologies, Inc. Dynamic accuracy-based deployment and monitoring of machine learning models in provider networks
CN108121780B (zh) * 2017-12-15 2021-10-08 中盈优创资讯科技有限公司 数据分析模型确定方法及装置
CN113159145B (zh) * 2018-04-28 2024-10-22 华为技术有限公司 一种特征工程编排方法及装置
US10965611B2 (en) 2019-01-10 2021-03-30 International Business Machines Corporation Scheduler utilizing normalized leaves of a weighted tree
CN110210558B (zh) * 2019-05-31 2021-10-26 北京市商汤科技开发有限公司 评估神经网络性能的方法及装置
CN111159268B (zh) * 2019-12-19 2022-01-04 武汉达梦数据库股份有限公司 一种ETL流程在Spark集群中运行的方法和装置
CN111444159B (zh) * 2020-03-03 2024-05-03 中国平安人寿保险股份有限公司 精算数据处理方法、装置、电子设备及存储介质
CN111708818B (zh) * 2020-05-28 2023-06-16 北京赛博云睿智能科技有限公司 一种智能计算方法
TWI768554B (zh) * 2020-11-23 2022-06-21 宏碁股份有限公司 計算系統及其效能調整方法
CN113064904B (zh) * 2021-04-29 2022-04-08 济南慧天云海信息技术有限公司 一种基于数据自学习的画像构建方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488656A (zh) * 2012-06-14 2014-01-01 深圳市世纪光速信息技术有限公司 一种数据处理方法及装置
CN104391860A (zh) * 2014-10-22 2015-03-04 安一恒通(北京)科技有限公司 内容类别检测方法及装置
CN104751463A (zh) * 2015-03-31 2015-07-01 梁爽 一种基于草图轮廓特征的三维模型最佳视角选取方法

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8180891B1 (en) * 2008-11-26 2012-05-15 Free Stream Media Corp. Discovery, access control, and communication with networked services from within a security sandbox
US8756362B1 (en) * 2010-02-16 2014-06-17 Marvell Israel (M.I.S.L.) Methods and systems for determining a cache address
CN103942604B (zh) * 2013-01-18 2017-07-07 上海安迪泰信息技术有限公司 基于森林区分度模型的预测方法及系统
CN104598741B (zh) * 2015-01-26 2017-10-17 上海交通大学 一种车道饱和度预测模型的构建方法

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401671A (zh) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 一种精准营销中衍生特征计算方法、装置和可读存储介质
CN111401671B (zh) * 2019-01-02 2023-11-21 中国移动通信有限公司研究院 一种精准营销中衍生特征计算方法、装置和可读存储介质
CN110288468A (zh) * 2019-04-19 2019-09-27 平安科技(深圳)有限公司 数据特征挖掘方法、装置、电子设备及存储介质
CN110288468B (zh) * 2019-04-19 2023-06-06 平安科技(深圳)有限公司 数据特征挖掘方法、装置、电子设备及存储介质
CN112100557A (zh) * 2020-09-01 2020-12-18 上海交通大学 基于内容发布订阅的组合匹配系统与方法
CN112100557B (zh) * 2020-09-01 2022-11-29 上海交通大学 基于内容发布订阅的组合匹配系统与方法
CN114358649A (zh) * 2022-01-17 2022-04-15 安徽君鲲科技有限公司 一种海事现场监管方法及系统
CN114358649B (zh) * 2022-01-17 2022-09-13 安徽君鲲科技有限公司 一种海事现场监管方法及系统

Also Published As

Publication number Publication date
CN106980623B (zh) 2020-02-21
CN106980623A (zh) 2017-07-25

Similar Documents

Publication Publication Date Title
WO2017124713A1 (fr) Data model determining method and apparatus
US20230126005A1 (en) Consistent filtering of machine learning data
US11182691B1 (en) Category-based sampling of machine learning data
TWI620075B (zh) Server for a cloud big data computing architecture and cloud computing resource optimization method thereof
US10366053B1 (en) Consistent randomized record-level splitting of machine learning data
US11100420B2 (en) Input processing for machine learning
US9811527B1 (en) Methods and apparatus for database migration
US10983873B1 (en) Prioritizing electronic backup
JP7450741B2 (ja) Lithium battery SOC estimation method, device, and computer-readable storage medium
WO2019161645A1 (fr) Shell-based data extraction method, terminal, device, and storage medium
AU2017327824B2 (en) Data integration job conversion
US11086726B2 (en) User-based recovery point objectives for disaster recovery
US12079472B2 (en) Data reduction method, apparatus, computing device, and storage medium for forming index information based on fingerprints
WO2020151320A1 (fr) Data storage method, apparatus, computer device, and storage medium
WO2023131121A1 (fr) Integrated circuit automation parallel simulation method and simulation device
AU2021244852B2 (en) Offloading statistics collection
US11237740B2 (en) Automatically determining sizing configurations for storage components using machine learning techniques
CN114968612A (zh) Data processing method, system, and related device
US20210181945A1 (en) User-based recovery point objectives for disaster recovery
US8667008B2 (en) Search request control apparatus and search request control method
US20100070458A1 (en) Rule creation method and rule creating apparatus
US20170046344A1 (en) Method for Performing In-Database Distributed Advanced Predictive Analytics Modeling via Common Queries
US20240020320A1 (en) Automated database operation classification using artificial intelligence techniques
US11216413B1 (en) Processing platform configured for data set management utilizing metadata-based data set operational signatures
US20230315530A1 (en) Preserving and modifying virtual machine settings in replication-based virtual machine migration

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 16885973
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 16885973
Country of ref document: EP
Kind code of ref document: A1