WO2019061667A1 - Electronic device, data processing method, system, and computer readable storage medium - Google Patents

Info

Publication number
WO2019061667A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
processing
model
machine algorithm
preset
Prior art date
Application number
PCT/CN2017/108799
Other languages
English (en)
French (fr)
Inventor
吴振宇
刘睿恺
王建明
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019061667A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Definitions

  • the present application relates to the field of communications technologies, and in particular, to an electronic device, a data processing method, a system, and a computer readable storage medium.
  • ETL (Extract-Transform-Load) is an important part of building a data warehouse: users extract the required data from the data source, clean it, and finally load it into the data warehouse according to the defined data warehouse model.
  • At present, when sorting and modeling data, technicians need to put a lot of effort into performing the data ETL operations step by step and then carrying out modeling analysis step by step on the sorted data, including selecting parameters, building models, and adjusting the specific model structure. This way of working is time-consuming and laborious, and data processing efficiency is low.
  • the purpose of the present application is to provide an electronic device, a data processing method, a system, and a computer readable storage medium, which are intended to simplify user operations in data sorting analysis and modeling processes, and improve data processing efficiency.
  • To achieve the above purpose, the present application provides an electronic device including a memory and a processor connected to the memory, the memory storing a data processing system operable on the processor; when executed by the processor, the data processing system implements the following steps:
  • the present application further provides a data processing method, where the data processing method includes:
  • the present application further provides a data processing system, where the data processing system includes:
  • a processing module, configured to perform type conversion processing on the acquired data based on preset data types after acquiring data from the data source terminal, and to perform exception processing and null value processing on the converted data;
  • a first storage module, configured to store the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline after data processing in all processing stages is complete;
  • a modeling module, configured to acquire a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and to select, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.
  • The present application further provides a computer readable storage medium having a data processing system stored thereon; when executed by a processor, the data processing system implements the following steps:
  • The beneficial effects of the present application are as follows: based on the user's presets, the present application performs type conversion, exception processing, and null value processing on the data, finally obtains the data to be modeled from the delivery path ETL Pipeline, and selects, based on grid search, a machine algorithm model and the model parameters corresponding to that model to complete the modeling. Because of the user's presets, the entire process of data sorting, analysis, and modeling can be completed with one click, which simplifies user operations and improves data processing efficiency.
  • FIG. 1 is a schematic diagram of an optional application environment of each embodiment of the present application.
  • FIG. 2 is a schematic flowchart diagram of an embodiment of a data processing method according to the present application.
  • Descriptions involving “first”, “second”, and the like in the present application are for the purpose of description only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • Thus, a feature defined with “first” or “second” may explicitly or implicitly include at least one such feature.
  • In addition, the technical solutions of the various embodiments may be combined with each other, provided that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and not to fall within the scope of protection claimed by this application.
  • Referring to FIG. 1, it is a schematic diagram of the application environment of a preferred embodiment of the data processing method of the present application.
  • the application environment diagram includes an electronic device 1 and a data source terminal 2.
  • the electronic device 1 performs data interaction with the data source terminal 2, and there may be one or more data source terminals 2.
  • the electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance.
  • the electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 may include, but is not limited to, a memory 11 communicably connected to each other through a system bus, a processor 12, and a network interface 13, and the memory 11 stores a data processing system operable on the processor 12.
  • FIG. 1 only shows the electronic device 1 having the components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • the storage device 11 includes internal memory and at least one type of readable storage medium.
  • the internal memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like.
  • in some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the storage device 11 is generally used to store an operating system installed in the electronic device 1 and various types of application software, such as program codes of the data processing system in an embodiment of the present application. Further, the storage device 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is generally used to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with the data source terminal 2.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running a data processing system or the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the electronic device 1 with one or more data source terminals 2, and establish a data transmission channel and a communication connection between the electronic device 1 and one or more data source terminals 2.
  • the data processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of the various embodiments of the present application; and the at least one computer readable instruction can be divided into different logic modules according to the functions implemented by its various parts.
  • this embodiment includes a processing module, a first storage module, and a modeling module.
  • Step S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
  • in this embodiment, data may be acquired from one or more data source terminals based on instructions issued by the user, and the data sources may be different networks, different operating platforms, different databases and data formats, different applications, and the like.
  • type conversion processing is then performed on the acquired data.
  • the preset data types include integer type, floating point type, and string type.
  • the user can preset the data types to which the acquired data is to be converted. For example, if one part of the acquired data needs to be converted into an integer type and another part into a floating point type, the user sets this in advance, so that after the data is acquired from the data source terminal, type conversion is performed directly according to the user's settings; the converted data is easier to process uniformly in subsequent steps.
  • the exception processing of the converted data includes: processing noise points or garbled characters in the converted data.
  • in one embodiment, the noise data or garbled characters can be cleared automatically by analyzing the distribution of the data. For massive data, the data after exception processing is data with the noise cleared: it is more concise, its quality is improved, and subsequent processing is easier.
  • performing null value processing on the exception-processed data includes: capturing null value fields; to ensure the robustness of the data after final processing, the null value fields are preferably filled with the average value, the median, the most frequent value, a value set by the user, or the like.
  • data that has undergone this kind of null value processing is not only complete, but its quality is also guaranteed.
  • Step S2: after data processing in all processing stages is complete, storing the data processed in the final processing stage, as data to be modeled, in the preset delivery path ETL Pipeline;
  • in this embodiment, once type conversion has been performed on the acquired data based on the preset data types, the user does not need to convert data types as needed in each processing step; after exception processing of the converted data, massive data becomes more concise and of higher quality; after null value processing, the quality of the data is further improved while its integrity is ensured. After type conversion, exception processing, and null value processing, the data may further undergo format normalization, splitting, correctness verification, data replacement, and the like; when data processing is complete, the data of the final processing stage is obtained.
  • the data processed in the final processing stage is stored, as data to be modeled, in the delivery path ETL Pipeline preset by the user, and the delivery path ETL Pipeline serves as the storage location of the data processed in the final processing stage.
  • during modeling, the data to be modeled can be acquired quickly through this channel, so that the data ETL process and the data modeling process are combined seamlessly.
  • Step S3: acquiring a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.
  • the preset plurality of machine algorithm models include a logistic regression model, a decision tree model, and a random forest model, and each machine algorithm model has a corresponding model parameter range.
  • the user can preset machine algorithm models and the model parameter ranges corresponding to them for selection and use. For example, the user can add a machine algorithm model together with its corresponding model parameter range.
  • this embodiment uses grid search to select the machine algorithm model and its corresponding model parameters, so that the optimal machine algorithm model and corresponding model parameters for modeling can be determined quickly. Specifically, training is performed for each machine algorithm model and each model parameter in the model parameter range corresponding to that model, and the optimal machine algorithm model and corresponding model parameters are selected according to the training results.
  • compared with the prior art, this embodiment, based on the user's presets, performs type conversion, exception processing, and null value processing on the data, finally obtains the data to be modeled from the delivery path ETL Pipeline, and selects, based on grid search, a machine algorithm model and its corresponding model parameters to complete the modeling. Because of the user's presets, the entire process of data sorting, analysis, and modeling can be completed with one click, without step-by-step processing, which simplifies user operations and improves data processing efficiency.
  • in a preferred embodiment, before step S2, the data processing system, when executed by the processor, further implements the following step: before the final processing stage, and after the data processing of each processing stage is complete, storing the data processed in each processing stage in a preset corresponding delivery path ETL Pipeline, or, based on the user's settings, storing the data processed in selected processing stages in a preset corresponding delivery path ETL Pipeline.
  • in this embodiment, before the final processing stage, the data obtained in the different processing stages may also be stored in the corresponding delivery path ETL Pipeline preset by the user, or, according to the user's presets, the data obtained in only some of the processing stages may be selectively stored in the preset corresponding delivery path ETL Pipeline; for example, the type-converted data is stored in its corresponding delivery path ETL Pipeline.
  • with the data stored in the corresponding delivery path ETL Pipeline according to the user's presets, subsequent processing stages can conveniently acquire the data and the internal data flow is connected automatically, so that the data ETL process is completed efficiently.
  • in a preferred embodiment, step S3 includes:
  • training the machine algorithm model constructed from each machine algorithm model and each model parameter in the model parameter range corresponding to that model;
  • verifying the accuracy of the trained machine algorithm models;
  • selecting the machine algorithm model with the highest accuracy and the corresponding model parameters, so as to model the data to be modeled.
  • in this embodiment, the machine algorithm model constructed from each machine algorithm model and each model parameter in its corresponding model parameter range is trained, and the accuracy of the trained model is then verified. After all such models have been trained and verified, the accuracies are compared and the machine algorithm model with the highest accuracy and the corresponding model parameters are selected. For example, if the accuracies are 0.98, 0.95, 0.94, and 0.99, the machine algorithm model with an accuracy of 0.99 and the corresponding model parameters are selected, so that the data to be modeled can be modeled.
  • in other embodiments, a machine algorithm model whose accuracy is greater than or equal to a predetermined accuracy threshold, together with its corresponding model parameters, may also be selected. For example, if the predetermined accuracy threshold is 0.98, the machine algorithm models with accuracies of 0.98 and 0.99 and their corresponding model parameters can all be used for subsequent modeling operations.
  • FIG. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application, where the data processing method includes the following steps:
  • Step S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
  • in this embodiment, data may be acquired from one or more data source terminals, and the data sources may be different networks, different operating platforms, different databases and data formats, different applications, and the like.
  • type conversion processing is then performed on the acquired data.
  • the preset data types include integer type, floating point type, and string type.
  • the user can preset the data types to which the acquired data is to be converted. For example, if one part of the acquired data needs to be converted into an integer type and another part into a floating point type, the user sets this in advance, so that after the data is acquired from the data source terminal, type conversion is performed directly according to the user's settings; the converted data is easier to process uniformly in subsequent steps.
  • the exception processing of the converted data includes: processing noise points or garbled characters in the converted data.
  • in one embodiment, the noise data or garbled characters can be cleared automatically by analyzing the distribution of the data. For massive data, the data after exception processing is data with the noise cleared: it is more concise, its quality is improved, and subsequent processing is easier.
  • performing null value processing on the exception-processed data includes: capturing null value fields; to ensure the robustness of the data after final processing, the null value fields are preferably filled with the average value, the median, the most frequent value, a value set by the user, or the like.
  • data that has undergone this kind of null value processing is not only complete, but its quality is also guaranteed.
  • in this embodiment, once type conversion has been performed on the acquired data based on the preset data types, the user does not need to convert data types as needed in each processing step; after exception processing of the converted data, massive data becomes more concise and of higher quality; after null value processing, the quality of the data is further improved while its integrity is ensured.
  • after type conversion, exception processing, and null value processing, the data may further undergo format normalization, splitting, correctness verification, data replacement, and the like; when data processing is complete, the data of the final processing stage is obtained.
  • the data processed in the final processing stage is stored, as data to be modeled, in the delivery path ETL Pipeline preset by the user, and the delivery path ETL Pipeline serves as the storage location of the data processed in the final processing stage.
  • during modeling, the data to be modeled can be acquired quickly through this channel, so that the data ETL process and the data modeling process are combined seamlessly.
  • the preset plurality of machine algorithm models include a logistic regression model, a decision tree model, and a random forest model, and each machine algorithm model has a corresponding model parameter range.
  • the user can preset machine algorithm models and the model parameter ranges corresponding to them for selection and use. For example, the user can add a machine algorithm model together with its corresponding model parameter range.
  • this embodiment uses grid search to select the machine algorithm model and its corresponding model parameters, so that the optimal machine algorithm model and corresponding model parameters for modeling can be determined quickly. Specifically, training is performed for each machine algorithm model and each model parameter in the model parameter range corresponding to that model, and the optimal machine algorithm model and corresponding model parameters are selected according to the training results.
  • compared with the prior art, this embodiment, based on the user's presets, performs type conversion, exception processing, and null value processing on the data, finally obtains the data to be modeled from the delivery path ETL Pipeline, and selects, based on grid search, a machine algorithm model and its corresponding model parameters to complete the modeling. Because of the user's presets, the entire process of data sorting, analysis, and modeling can be completed with one click, which simplifies user operations and improves data processing efficiency.
  • in a preferred embodiment, before step S2, the method further includes the following step: before the final processing stage, and after the data processing of each processing stage is complete, storing the data processed in each processing stage in a preset corresponding delivery path ETL Pipeline, or, based on the user's settings, storing the data processed in selected processing stages in a preset corresponding delivery path ETL Pipeline.
  • in this embodiment, before the final processing stage, the data obtained in the different processing stages may also be stored in the corresponding delivery path ETL Pipeline preset by the user, or, according to the user's presets, the data obtained in only some of the processing stages may be selectively stored in the preset corresponding delivery path ETL Pipeline; for example, the type-converted data is stored in its corresponding delivery path ETL Pipeline.
  • subsequent processing stages can then conveniently acquire the data and the internal data flow is connected automatically, so that the data ETL process is completed efficiently.
  • in a preferred embodiment, step S3 includes:
  • training the machine algorithm model constructed from each machine algorithm model and each model parameter in the model parameter range corresponding to that model;
  • verifying the accuracy of the trained machine algorithm models;
  • selecting the machine algorithm model with the highest accuracy and the corresponding model parameters, so as to model the data to be modeled.
  • in this embodiment, the machine algorithm model constructed from each machine algorithm model and each model parameter in its corresponding model parameter range is trained, and the accuracy of the trained model is then verified. After all such models have been trained and verified, the accuracies are compared and the machine algorithm model with the highest accuracy and the corresponding model parameters are selected. For example, if the accuracies are 0.98, 0.95, 0.94, and 0.99, the machine algorithm model with an accuracy of 0.99 and the corresponding model parameters are selected, so that the data to be modeled can be modeled.
  • in other embodiments, a machine algorithm model whose accuracy is greater than or equal to a predetermined accuracy threshold, together with its corresponding model parameters, may also be selected. For example, if the predetermined accuracy threshold is 0.98, the machine algorithm models with accuracies of 0.98 and 0.99 and their corresponding model parameters can all be used for subsequent modeling operations.
  • the present application also provides a computer readable storage medium having stored thereon a data processing system that, when executed by a processor, implements the steps of the data processing method described above.
  • the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and includes a plurality of instructions for causing a data source (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science
  • Databases & Information Systems
  • Theoretical Computer Science
  • Data Mining & Analysis
  • Physics & Mathematics
  • General Engineering & Computer Science
  • General Physics & Mathematics
  • Management, Administration, Business Operations System, And Electronic Commerce

Abstract

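An electronic device, a data processing method, a system, and a storage medium. The electronic device includes a memory and a processor, the memory storing a data processing system which, when executed by the processor, implements: after acquiring data from a data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data (S1); after data processing in all processing stages is complete, storing the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline (S2); and acquiring a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled (S3). The method simplifies user operations during data sorting, analysis, and modeling, and improves data processing efficiency.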

Description

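Electronic device, data processing method, system, and computer readable storage medium
Priority claim
Under the Paris Convention, this application claims priority to Chinese patent application No. CN201710914863.3, filed on September 30, 2017 and entitled "Electronic device, data processing method, and computer readable storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The present application relates to the field of communications technologies, and in particular to an electronic device, a data processing method, a system, and a computer readable storage medium.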
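Background
ETL (Extract-Transform-Load) is an important part of building a data warehouse: users extract the required data from the data source, clean it, and finally load it into the data warehouse according to the defined data warehouse model. At present, when sorting and modeling data, technicians need to put a lot of effort into performing the data ETL operations step by step and then carrying out modeling analysis step by step on the sorted data, including selecting parameters, building models, and adjusting the specific model structure. This way of working is time-consuming and laborious, and data processing efficiency is low.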
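Summary
The purpose of the present application is to provide an electronic device, a data processing method, a system, and a computer readable storage medium, which are intended to simplify user operations in data sorting, analysis, and modeling and to improve data processing efficiency.
To achieve the above purpose, the present application provides an electronic device including a memory and a processor connected to the memory, the memory storing a data processing system operable on the processor; when executed by the processor, the data processing system implements the following steps:
S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
S2: after data processing in all processing stages is complete, storing the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline;
S3: acquiring a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.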
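To achieve the above purpose, the present application further provides a data processing method, the data processing method including:
S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
S2: after data processing in all processing stages is complete, storing the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline;
S3: acquiring a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.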
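To achieve the above purpose, the present application further provides a data processing system, the data processing system including:
a processing module, configured to perform type conversion processing on the acquired data based on preset data types after acquiring data from the data source terminal, and to perform exception processing and null value processing on the converted data;
a first storage module, configured to store the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline after data processing in all processing stages is complete;
a modeling module, configured to acquire a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and to select, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.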
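The present application further provides a computer readable storage medium having a data processing system stored thereon; when executed by a processor, the data processing system implements the following steps:
S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
S2: after data processing in all processing stages is complete, storing the data processed in the final processing stage, as data to be modeled, in a preset delivery path ETL Pipeline;
S3: acquiring a plurality of preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting, based on grid search, a machine algorithm model and the model parameters corresponding to that model, so as to model the data to be modeled.
The beneficial effects of the present application are as follows: based on the user's presets, the present application performs type conversion, exception processing, and null value processing on the data, finally obtains the data to be modeled from the delivery path ETL Pipeline, and selects, based on grid search, a machine algorithm model and the model parameters corresponding to that model to complete the modeling. Because of the user's presets, the entire process of data sorting, analysis, and modeling can be completed with one click, which simplifies user operations and improves data processing efficiency.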
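Brief description of the drawings
FIG. 1 is a schematic diagram of an optional application environment of each embodiment of the present application;
FIG. 2 is a schematic flowchart of an embodiment of the data processing method of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for the purpose of description only, and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and not to fall within the scope of protection claimed by this application.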
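Referring to FIG. 1, it is a schematic diagram of the application environment of a preferred embodiment of the data processing method of the present application. The application environment diagram includes an electronic device 1 and a data source terminal 2. The electronic device 1 performs data interaction with the data source terminal 2, and there may be one or more data source terminals 2.
The electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicably connected to each other through a system bus, and the memory 11 stores a data processing system operable on the processor 12. It should be noted that FIG. 1 only shows the electronic device 1 having the components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.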
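The storage device 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read only memory (ROM), an electrically erasable programmable read only memory (EEPROM), a programmable read only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. In this embodiment, the readable storage medium of the storage device 11 is generally used to store the operating system and the various types of application software installed in the electronic device 1, such as the program code of the data processing system in an embodiment of the present application. In addition, the storage device 11 can also be used to temporarily store various types of data that have been output or are to be output.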
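In some embodiments, the processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is generally used to control the overall operation of the electronic device 1, such as performing control and processing related to data interaction or communication with the data source terminal 2. In this embodiment, the processor 12 is configured to run the program code or process the data stored in the memory 11, for example to run the data processing system.
The network interface 13 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the electronic device 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the electronic device 1 with one or more data source terminals 2, and to establish a data transmission channel and a communication connection between the electronic device 1 and the one or more data source terminals 2.
The data processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of the various embodiments of the present application; and the at least one computer readable instruction can be divided into different logic modules according to the functions implemented by its various parts. This embodiment includes a processing module, a first storage module, and a modeling module.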
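In one embodiment, the above data processing system, when executed by the processor 12, implements the following steps:
Step S1: after acquiring data from the data source terminal, performing type conversion processing on the acquired data based on preset data types, and performing exception processing and null value processing on the converted data;
In this embodiment, data may be acquired from one or more data source terminals based on instructions issued by the user, and the data sources may be different networks, different operating platforms, different databases and data formats, different applications, and the like. Type conversion processing is then performed on the acquired data. The preset data types include integer type, floating point type, and string type. The user can preset the data types to which the acquired data is to be converted. For example, if one part of the acquired data needs to be converted into an integer type and another part into a floating point type, the user sets this in advance, so that after the data is acquired from the data source terminal, type conversion is performed directly according to the user's settings; the converted data is easier to process uniformly in subsequent steps.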
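As a rough illustration of the preset-driven type conversion in step S1, the sketch below applies a user-defined mapping from column names to target types. The column names, the `PRESET_TYPES` mapping, and the choice to turn unconvertible values into `None` (so that a later null value stage can handle them) are assumptions made for this example and are not taken from the application.

```python
from typing import Any, Callable, Dict, List

# Hypothetical user preset: column name -> target type (integer, float or string).
PRESET_TYPES: Dict[str, Callable[[Any], Any]] = {
    "age": int,
    "income": float,
    "city": str,
}

def convert_types(rows: List[Dict[str, Any]],
                  preset: Dict[str, Callable[[Any], Any]] = PRESET_TYPES) -> List[Dict[str, Any]]:
    """Convert each preset column to its target type.

    Values that cannot be converted become None (an assumption of this sketch).
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # work on a copy so the raw rows stay untouched
        for column, target in preset.items():
            if column not in new_row:
                continue
            try:
                new_row[column] = target(new_row[column])
            except (TypeError, ValueError):
                new_row[column] = None
        converted.append(new_row)
    return converted

raw_rows = [
    {"age": "42", "income": "1234.5", "city": 7},
    {"age": "n/a", "income": "10", "city": "Shenzhen"},
]
typed_rows = convert_types(raw_rows)
```

Because the mapping is declared once up front, no later processing step has to repeat the conversion, which is the convenience the paragraph above describes.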
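The exception processing of the converted data includes: processing noise points or garbled characters in the converted data. In one embodiment, the noise data or garbled characters can be cleared automatically by analyzing the distribution of the data. For massive data, the data after exception processing is data with the noise cleared: it is more concise, its quality is improved, and subsequent processing is easier.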
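The application does not say which distribution analysis is used to clear noise points. One common option, shown here purely as an assumption, is the interquartile-range (IQR) rule: values far outside the middle half of the data are treated as noise. The function name and the multiplier `k` are hypothetical.

```python
from statistics import quantiles

def remove_noise_points(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and the default k=1.5 are assumptions of this sketch,
    not something specified by the application.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the observed distribution
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

readings = [10, 11, 12, 11, 10, 13, 12, 500]
cleaned = remove_noise_points(readings)
```

In this sample the value 500 lies far above the upper bound and is dropped, while the seven readings between 10 and 13 are kept.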
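Performing null value processing on the exception-processed data includes: capturing null value fields; to ensure the robustness of the data after final processing, the null value fields are preferably filled with the average value, the median, the most frequent value, a value set by the user, or the like. Data that has undergone this kind of null value processing is not only complete, but its quality is also guaranteed.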
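The four fill options named above (average, median, most frequent value, user-set value) can be sketched with the Python standard library. The function name, the strategy labels, and the use of `None` to represent a null field are assumptions of this example.

```python
from statistics import mean, median, mode

def fill_nulls(values, strategy="mean", user_value=None):
    """Replace None entries using one of the fill options described in the text."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mode":          # most frequent value
        fill = mode(present)
    elif strategy == "constant":      # value set by the user
        fill = user_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

filled_mean = fill_nulls([1, None, 3], "mean")
filled_mode = fill_nulls(["a", None, "a", "b"], "mode")
filled_user = fill_nulls([None, 5], "constant", user_value=0)
```

Note that the mean and median strategies only make sense for numeric columns, and a column with no non-null values at all would need separate handling.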
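Step S2: after data processing in all processing stages is complete, store the data produced by the final processing stage in a preset transfer path, the ETL Pipeline, as the data to be modeled;
In this embodiment, once type conversion has been performed based on the preset data types, the user does not need to convert data types as needed at every processing step; after anomaly handling, the data is more concise and of higher quality, which matters for massive data sets; and after null-value handling, data quality is further improved while completeness is preserved. After type conversion, anomaly handling, and null-value handling, the data may further undergo format normalization, splitting, correctness validation, data replacement, and so on. When data processing is complete, the data of the final processing stage is obtained.
In this embodiment, the data produced by the final processing stage is stored, as the data to be modeled, in the ETL Pipeline transfer path preset by the user. The ETL Pipeline serves as the storage location for that data; during modeling, the data to be modeled can be retrieved quickly through this channel, so that the data ETL process and the data modeling process are seamlessly combined.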
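The application does not describe how the ETL Pipeline is realized. Purely as a sketch, it can be pictured as a named store that the ETL side writes to and the modeling side reads from; the class `EtlPipeline`, its `put`/`get` methods, and the slot name `"to_model"` are hypothetical.

```python
import copy


class EtlPipeline:
    """Hypothetical in-memory stand-in for the preset transfer path."""

    def __init__(self):
        self._slots = {}

    def put(self, name, data):
        # Store a snapshot so later changes by the producer do not leak in.
        self._slots[name] = copy.deepcopy(data)

    def get(self, name):
        # Return a copy so the consumer cannot alter the stored data.
        return copy.deepcopy(self._slots[name])


pipeline = EtlPipeline()

# ETL side: hand over the output of the final processing stage.
final_rows = [{"age": 42, "label": 1}, {"age": 37, "label": 0}]
pipeline.put("to_model", final_rows)
final_rows.append({"age": 0, "label": 0})  # not part of the stored snapshot

# Modeling side: fetch the data to be modeled.
modeling_input = pipeline.get("to_model")
```

A real deployment would more likely back this with a database table, file, or message queue; the point of the sketch is only the hand-off between the two phases.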
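Step S3: obtain multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and select a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
In this embodiment, the preset machine algorithm models include a logistic regression model, a decision tree model, a random forest model, and so on, and each machine algorithm model has a corresponding model parameter range. The user can configure in advance the machine algorithm models and their model parameter ranges that are available for selection and use; for example, the user can add a machine algorithm model together with its model parameter range.
Because there are multiple machine algorithm models, each with its own model parameter range, the model parameters of each machine algorithm model must be determined within its range before the machine algorithm model used for modeling can finally be determined.
This embodiment uses grid search to select the machine algorithm model and its corresponding model parameters, which makes it possible to quickly determine the optimal machine algorithm model and model parameters for modeling. Specifically, each machine algorithm model is trained with each model parameter in its model parameter range, and the optimal machine algorithm model and corresponding model parameters are selected according to the training results.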
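The grid search described above can be sketched in pure Python. To keep the example self-contained, two toy one-dimensional classifiers (`fit_threshold` and `fit_knn`) stand in for the logistic regression, decision tree, and random forest models named in the text; these stand-ins, the data, and all names are assumptions, not the applicant's implementation.

```python
from itertools import product


def fit_threshold(train, shift):
    """Toy model: cutoff at the midpoint of the two class means, plus `shift`."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    cutoff = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2 + shift
    return lambda x: int(x >= cutoff)


def fit_knn(train, k):
    """Toy model: majority vote among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return int(2 * sum(y for _, y in nearest) > len(nearest))
    return predict


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def grid_search(models, train, valid):
    """Train every (model, parameter combination) and score it on `valid`."""
    results = []
    for name, (fit, grid) in models.items():
        keys = sorted(grid)
        for combo in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, combo))
            model = fit(train, **params)
            results.append((accuracy(model, valid), name, params))
    best = max(results, key=lambda r: r[0])  # first entry with top accuracy
    return best, results


# Preset models, each with its own parameter range.
models = {
    "threshold": (fit_threshold, {"shift": [-1, 0, 1]}),
    "knn": (fit_knn, {"k": [1, 3]}),
}
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
valid = [(2.5, 0), (4, 0), (5, 1), (7.5, 1)]
best, results = grid_search(models, train, valid)
```

Accuracy is measured on a held-out validation set rather than on the training data, which is one reasonable reading of "validating the accuracy" in the text. With a library such as scikit-learn, `GridSearchCV` performs a comparable search for a single estimator, and the outer loop over estimators would stay the same.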
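Compared with the prior art, this embodiment performs type conversion, anomaly handling, and null-value handling on the data according to the user's preset configuration, then obtains the data to be modeled from the ETL Pipeline transfer path and selects a machine algorithm model and its model parameters by grid search to complete the modeling. Because of the user's preset configuration, the entire process of data cleaning, analysis, and modeling can be completed in a single operation rather than step by step, which simplifies the user's work and improves data processing efficiency.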
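In a preferred embodiment, based on the embodiment of FIG. 1 above, before performing step S2 the data processing system, when executed by the processor, further implements the following step: before the final processing stage, and after the data processing of each processing stage is complete, store the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, store the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
In this embodiment, before the final processing stage, the data produced by the different processing stages can also be stored in the corresponding ETL Pipeline transfer paths preset by the user, or, according to the user's preset configuration, only the data produced by certain processing stages is stored, for example the data after type conversion. With the data stored in the corresponding ETL Pipeline according to the user's presets, subsequent processing stages can retrieve it conveniently and the internal data flow is connected automatically, so the data ETL process completes efficiently.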
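A minimal sketch of per-stage storage, assuming the stages are plain functions applied in order and that the user's setting is a set of stage names to keep; `run_stages`, `persist`, and the stage names are hypothetical.

```python
def run_stages(data, stages, persist=None):
    """Run stages in order; keep the output of stages named in `persist`.

    `persist=None` keeps every stage's output (the first alternative in the
    text); a set of names keeps only the selected stages (the second).
    """
    stored = {}
    for name, func in stages:
        data = func(data)
        if persist is None or name in persist:
            stored[name] = list(data)  # snapshot of this stage's output
    return data, stored


stages = [
    ("convert", lambda xs: [None if x == "" else float(x) for x in xs]),
    ("drop_nulls", lambda xs: [x for x in xs if x is not None]),
    ("scale", lambda xs: [x / 10 for x in xs]),
]

# Keep only the data after type conversion, as in the example in the text.
final, stored = run_stages(["10", "", "30"], stages, persist={"convert"})
```

Because each stage receives the previous stage's output directly, the hand-off between stages needs no manual step, which is the automatic linking of the internal data flow that the paragraph describes.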
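In a preferred embodiment, based on the embodiment of FIG. 1 above, step S3 includes:
training the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range;
validating the accuracy of each trained machine algorithm model;
selecting the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.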
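In this embodiment, the machine algorithm model constructed from each machine algorithm model and each model parameter in its parameter range is trained, and the accuracy of the trained model is then validated. Once every such model has been trained and validated, the accuracies are compared and the machine algorithm model with the highest accuracy is selected together with its model parameters. For example, if the accuracies are 0.98, 0.95, 0.94, and 0.99, the machine algorithm model with accuracy 0.99 and its model parameters are selected, and the data to be modeled can then be modeled.
In other embodiments, machine algorithm models whose accuracy is greater than or equal to a predetermined accuracy threshold may be selected together with their model parameters. For example, if the predetermined threshold is 0.98, the machine algorithm models with accuracies 0.98 and 0.99 and their model parameters can all be used for subsequent modeling.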
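The two selection rules can be shown on the accuracy values from the example. The model labels `model_a` to `model_d` are placeholders invented for this sketch.

```python
# (label, validated accuracy) pairs; the labels are placeholders.
results = [
    ("model_a", 0.98),
    ("model_b", 0.95),
    ("model_c", 0.94),
    ("model_d", 0.99),
]

# Rule 1: keep only the single most accurate candidate.
best = max(results, key=lambda item: item[1])


# Rule 2: keep every candidate at or above a predetermined threshold.
def select_above(candidates, threshold):
    return [item for item in candidates if item[1] >= threshold]


qualified = select_above(results, 0.98)
```

The threshold rule can return several candidates, so a caller still has to decide which of them to use for modeling, or whether to combine them.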
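As shown in FIG. 2, which is a schematic flowchart of an embodiment of the data processing method of this application, the data processing method includes the following steps:
Step S1: after data is acquired from the data source end, perform type conversion on the acquired data based on preset data types, and perform anomaly handling and null-value handling on the converted data;
In this embodiment, data may be acquired from one or more data source ends. The data sources may be different networks, different operating platforms, different databases and data formats, different applications, and so on. Type conversion is then performed on the acquired data. The preset data types include integer, floating-point, and string types. The user can configure in advance which data types the acquired data should be converted to, for example that one part of the data is to be converted to integers and another part to floating-point numbers; once the data is acquired from the data source end, it is converted directly according to these settings. Converted data is easier to process uniformly in later steps.
Anomaly handling of the converted data includes handling noise points or garbled characters in the data. In one embodiment, noisy data or garbled characters can be removed automatically by analyzing the distribution of the data. For massive data sets, the data after anomaly handling is free of noise and more concise, which improves data quality and simplifies subsequent processing.
Null-value handling of the data after anomaly handling includes capturing null fields. To ensure the robustness of the data after final processing, the captured null fields are preferably filled with the mean, the median, the most frequent value, or a user-specified value. Data that has undergone this null-value handling is both complete and of assured quality.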
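Step S2: after data processing in all processing stages is complete, store the data produced by the final processing stage in a preset transfer path, the ETL Pipeline, as the data to be modeled;
In this embodiment, once type conversion has been performed based on the preset data types, the user does not need to convert data types as needed at every processing step; after anomaly handling, the data is more concise and of higher quality, which matters for massive data sets; and after null-value handling, data quality is further improved while completeness is preserved. After type conversion, anomaly handling, and null-value handling, the data may further undergo format normalization, splitting, correctness validation, data replacement, and so on. When data processing is complete, the data of the final processing stage is obtained.
In this embodiment, the data produced by the final processing stage is stored, as the data to be modeled, in the ETL Pipeline transfer path preset by the user. The ETL Pipeline serves as the storage location for that data; during modeling, the data to be modeled can be retrieved quickly through this channel, so that the data ETL process and the data modeling process are seamlessly combined.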
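Step S3: obtain multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and select a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
In this embodiment, the preset machine algorithm models include a logistic regression model, a decision tree model, a random forest model, and so on, and each machine algorithm model has a corresponding model parameter range. The user can configure in advance the machine algorithm models and their model parameter ranges that are available for selection and use; for example, the user can add a machine algorithm model together with its model parameter range.
Because there are multiple machine algorithm models, each with its own model parameter range, the model parameters of each machine algorithm model must be determined within its range before the machine algorithm model used for modeling can finally be determined.
This embodiment uses grid search to select the machine algorithm model and its corresponding model parameters, which makes it possible to quickly determine the optimal machine algorithm model and model parameters for modeling. Specifically, each machine algorithm model is trained with each model parameter in its model parameter range, and the optimal machine algorithm model and corresponding model parameters are selected according to the training results.
Compared with the prior art, this embodiment performs type conversion, anomaly handling, and null-value handling on the data according to the user's preset configuration, then obtains the data to be modeled from the ETL Pipeline transfer path and selects a machine algorithm model and its model parameters by grid search to complete the modeling. Because of the user's preset configuration, the entire process of data cleaning, analysis, and modeling can be completed in a single operation, which simplifies the user's work and improves data processing efficiency.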
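In a preferred embodiment, based on the embodiment of FIG. 2 above, the method further includes the following step before step S2: before the final processing stage, and after the data processing of each processing stage is complete, store the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, store the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
In this embodiment, before the final processing stage, the data produced by the different processing stages can also be stored in the corresponding ETL Pipeline transfer paths preset by the user, or, according to the user's preset configuration, only the data produced by certain processing stages is stored, for example the data after type conversion. With the data stored in the corresponding ETL Pipeline according to the user's presets, subsequent processing stages can retrieve it conveniently and the internal data flow is connected automatically, so the data ETL process completes efficiently.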
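In a preferred embodiment, based on the embodiment of FIG. 2 above, step S3 includes:
training the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range;
validating the accuracy of each trained machine algorithm model;
selecting the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.
In this embodiment, the machine algorithm model constructed from each machine algorithm model and each model parameter in its parameter range is trained, and the accuracy of the trained model is then validated. Once every such model has been trained and validated, the accuracies are compared and the machine algorithm model with the highest accuracy is selected together with its model parameters. For example, if the accuracies are 0.98, 0.95, 0.94, and 0.99, the machine algorithm model with accuracy 0.99 and its model parameters are selected, and the data to be modeled can then be modeled.
In other embodiments, machine algorithm models whose accuracy is greater than or equal to a predetermined accuracy threshold may be selected together with their model parameters. For example, if the predetermined threshold is 0.98, the machine algorithm models with accuracies 0.98 and 0.99 and their model parameters can all be used for subsequent modeling.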
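This application also provides a computer-readable storage medium storing a data processing system which, when executed by a processor, implements the steps of the data processing method described above.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a data source end (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.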

Claims (20)

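  1. An electronic device, characterized in that the electronic device comprises a memory and a processor connected to the memory, the memory storing a data processing system executable on the processor, and the data processing system, when executed by the processor, implementing the following steps:
    S1: after acquiring data from a data source end, performing type conversion on the acquired data based on preset data types, and performing anomaly handling and null-value handling on the converted data;
    S2: after data processing in all processing stages is complete, storing the data produced by the final processing stage in a preset ETL Pipeline transfer path as data to be modeled;
    S3: obtaining multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
  2. The electronic device according to claim 1, characterized in that the data processing system, when executed by the processor, further implements the following step:
    before the final processing stage, and after the data processing of each processing stage is complete, storing the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, storing the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
  3. The electronic device according to claim 1, characterized in that step S3 comprises:
    training the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range;
    validating the accuracy of each trained machine algorithm model;
    selecting the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.
  4. The electronic device according to any one of claims 1 to 3, characterized in that the anomaly handling comprises handling noise points or garbled characters in the data; and the null-value handling comprises capturing null fields in the data and filling the captured null fields with the mean, the median, the most frequent value, or a user-specified value.
  5. The electronic device according to any one of claims 1 to 3, characterized in that the data types comprise an integer type, a floating-point type, and a string type.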
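  6. A data processing method, characterized in that the data processing method comprises:
    S1: after acquiring data from a data source end, performing type conversion on the acquired data based on preset data types, and performing anomaly handling and null-value handling on the converted data;
    S2: after data processing in all processing stages is complete, storing the data produced by the final processing stage in a preset ETL Pipeline transfer path as data to be modeled;
    S3: obtaining multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
  7. The data processing method according to claim 6, characterized in that, before step S2, the method further comprises:
    before the final processing stage, and after the data processing of each processing stage is complete, storing the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, storing the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
  8. The data processing method according to claim 6, characterized in that step S3 comprises:
    training the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range;
    validating the accuracy of each trained machine algorithm model;
    selecting the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.
  9. The data processing method according to any one of claims 6 to 8, characterized in that the anomaly handling comprises handling noise points or garbled characters in the data; and the null-value handling comprises capturing null fields in the data and filling the captured null fields with the mean, the median, the most frequent value, or a user-specified value.
  10. The data processing method according to any one of claims 6 to 8, characterized in that the data types comprise an integer type, a floating-point type, and a string type.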
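  11. A data processing system, characterized in that the data processing system comprises:
    a processing module, configured to, after data is acquired from a data source end, perform type conversion on the acquired data based on preset data types, and perform anomaly handling and null-value handling on the converted data;
    a first storage module, configured to, after data processing in all processing stages is complete, store the data produced by the final processing stage in a preset ETL Pipeline transfer path as data to be modeled;
    a modeling module, configured to obtain multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and to select a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
  12. The data processing system according to claim 11, characterized in that the data processing system further comprises:
    a second storage module, configured to, before the final processing stage and after the data processing of each processing stage is complete, store the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, store the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
  13. The data processing system according to claim 11, characterized in that the modeling module is specifically configured to: train the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range; validate the accuracy of each trained machine algorithm model; and select the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.
  14. The data processing system according to any one of claims 11 to 13, characterized in that the anomaly handling comprises handling noise points or garbled characters in the data; and the null-value handling comprises capturing null fields in the data and filling the captured null fields with the mean, the median, the most frequent value, or a user-specified value.
  15. The data processing system according to any one of claims 11 to 13, characterized in that the data types comprise an integer type, a floating-point type, and a string type.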
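  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a data processing system which, when executed by a processor, implements the following steps:
    S1: after acquiring data from a data source end, performing type conversion on the acquired data based on preset data types, and performing anomaly handling and null-value handling on the converted data;
    S2: after data processing in all processing stages is complete, storing the data produced by the final processing stage in a preset ETL Pipeline transfer path as data to be modeled;
    S3: obtaining multiple preset machine algorithm models and the preset model parameter range corresponding to each machine algorithm model, and selecting a machine algorithm model and its corresponding model parameters based on grid search, so as to model the data to be modeled.
  17. The computer-readable storage medium according to claim 16, characterized in that the data processing system, when executed by the processor, further implements the following step:
    before the final processing stage, and after the data processing of each processing stage is complete, storing the data produced by each processing stage in a corresponding preset ETL Pipeline transfer path, or, based on the user's settings, storing the data produced by selected processing stages in the corresponding preset ETL Pipeline transfer path.
  18. The computer-readable storage medium according to claim 16, characterized in that step S3 comprises:
    training the machine algorithm model constructed from each machine algorithm model and each model parameter in that model's parameter range;
    validating the accuracy of each trained machine algorithm model;
    selecting the machine algorithm model with the highest accuracy and its corresponding model parameters, so as to model the data to be modeled.
  19. The computer-readable storage medium according to any one of claims 16 to 18, characterized in that the anomaly handling comprises handling noise points or garbled characters in the data; and the null-value handling comprises capturing null fields in the data and filling the captured null fields with the mean, the median, the most frequent value, or a user-specified value.
  20. The computer-readable storage medium according to any one of claims 16 to 18, characterized in that the data types comprise an integer type, a floating-point type, and a string type.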
PCT/CN2017/108799 2017-09-30 2017-10-31 Electronic device, data processing method, system, and computer-readable storage medium WO2019061667A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710914863.3A CN107807956A (zh) 2017-09-30 2017-09-30 Electronic device, data processing method, and computer-readable storage medium
CN2017109148633 2017-09-30

Publications (1)

Publication Number Publication Date
WO2019061667A1 true WO2019061667A1 (zh) 2019-04-04

Family

ID=61584715

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108799 WO2019061667A1 (zh) 2017-09-30 2017-10-31 Electronic device, data processing method, system, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN107807956A (zh)
WO (1) WO2019061667A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
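CN108549981B (zh) * 2018-03-30 2022-06-03 安徽大学 Method for improving the service quality of large-batch parallel business processes
CN109639910B (zh) * 2018-10-19 2021-12-24 平安科技(深圳)有限公司 Data interaction method, device, storage medium, and apparatus
CN110263229B (zh) * 2019-06-27 2020-06-02 北京中油瑞飞信息技术有限责任公司 Data-lake-based data governance method and apparatus
CN113032374A (zh) * 2019-12-24 2021-06-25 北京数聚鑫云信息技术有限公司 Data processing method, apparatus, medium, and device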

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077192A (zh) * 2012-12-24 2013-05-01 中标软件有限公司 Data processing method and system
CN104933160A (zh) * 2015-06-26 2015-09-23 河海大学 ETL framework design method for safety monitoring business analysis
US20170132525A1 (en) * 2015-11-09 2017-05-11 Xerox Corporation Method and system using machine learning techniques for checking data integrity in a data warehouse feed
CN106779087A (zh) * 2016-11-30 2017-05-31 福建亿榕信息技术有限公司 General-purpose machine learning data analysis platform
CN106980623A (zh) * 2016-01-18 2017-07-25 华为技术有限公司 Method and apparatus for determining a data model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
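US9699205B2 (en) * 2015-08-31 2017-07-04 Splunk Inc. Network security system
CN105956015A (zh) * 2016-04-22 2016-09-21 四川中软科技有限公司 Big-data-based service platform integration method
CN106682118A (zh) * 2016-12-08 2017-05-17 华中科技大学 Method for detecting fake followers on social networking sites based on web crawlers and machine learning
CN106815338A (zh) * 2016-12-25 2017-06-09 北京中海投资管理有限公司 Real-time big data storage, processing, and query system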

Also Published As

Publication number Publication date
CN107807956A (zh) 2018-03-16


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13/10/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17927496

Country of ref document: EP

Kind code of ref document: A1