CN110908994B - Data model processing method, system, electronic equipment and readable medium - Google Patents


Info

Publication number
CN110908994B
Authority
CN
China
Prior art keywords
model
data
server
format
scheduling task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811076282.8A
Other languages
Chinese (zh)
Other versions
CN110908994A (en)
Inventor
周长江
吴荣彬
高达申
龚君泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN201811076282.8A
Publication of CN110908994A
Application granted
Publication of CN110908994B
Legal status: Active
Anticipated expiration


Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application provides a data model processing method, system, electronic device, and readable medium. The system comprises: a data server for storing training data in a first format; an application server for configuring model parameters of a model to be processed according to configuration data; a data conversion server for generating scheduling task information according to the model parameters; and a model server for converting the training data in the first format into training data in a second format and executing the model to be processed according to the scheduling task information and the training data in the second format, so as to generate a processing result. The method, system, electronic device, and readable medium enable efficient data transmission between the model server and the big data cluster, and provide functions such as automatic deployment, model monitoring, alarming, and statistics.

Description

Data model processing method, system, electronic equipment and readable medium
Technical Field
The present application relates to the field of internet, and in particular, to a data model processing method, system, electronic device, and computer readable medium.
Background
Hbase (a distributed database) and Hive (a distributed data warehouse), both built on Hadoop (a distributed computing framework), are common technologies on the big data platforms of Internet companies and offer great advantages in processing massive data. In the field of data analysis and modeling, traditional statistical analysis, the currently popular machine learning, deep learning, and the like are among the most important applications of big data. However, the technical foundation of a big data cluster (a large distributed server cluster based on Hadoop) differs greatly from that of commonly used algorithm models (written in Python and R), so data interaction and communication between a model server and a big data cluster are very difficult, and the advantages of the two cannot be fully exploited. Meanwhile, in actual work, the testing and deployment of a model usually have to be performed manually by the model developer, so working efficiency is low. Furthermore, the related configurations of the prior art fail to enable automated deployment and visual monitoring of algorithm models.
Although existing public cloud service providers offer model services based on virtual machines, the reliability of such services is low; moreover, constrained by the actual production and operation conditions of Internet companies, they cannot achieve seamless connection between a big data cluster and a model server.
To sum up, the processing of data models in the prior art has the following drawbacks:
(1) The technical foundations and architectures of a big data cluster and a model server differ, making data transmission and communication between them difficult (a big data cluster usually stores column-oriented data, while a model server usually processes CSV file data).
(2) Public cloud big data services can hardly handle the massive data of a large Internet company; even if the big data servers of an Internet company are connected to a public cloud model server, data transmission remains slow and unstable, which can hardly satisfy daily demands.
Disclosure of Invention
In view of this, the present application provides a data model processing method, system, electronic device, and computer readable medium, which achieve efficient data transmission between a model server and a big data cluster and provide functions such as automatic deployment, model monitoring, alarming, and statistics.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to a first aspect of an embodiment of the present application, there is provided a data model processing system, the system including: a data server for storing training data in a first format; an application server for configuring model parameters of a model to be processed according to configuration data; a data conversion server for generating scheduling task information according to the model parameters and converting the training data in the first format into training data in a second format; and a model server for executing the model to be processed according to the scheduling task information and the training data in the second format, so as to generate a processing result.
In an exemplary embodiment of the present application, the system further includes at least one of the following servers: an intermediate data server for storing intermediate data of the model server, the intermediate data server being an Hbase cluster server; a result data server for storing the processing result, the result data server being an Hbase cluster server; and a version control server for generating the configuration data according to version control information.
In an exemplary embodiment of the application, the data server is an Hbase cluster server and/or a Hive cluster server.
In an exemplary embodiment of the application, the application server is further configured to generate shell scripts for the model server to run.
In an exemplary embodiment of the present application, the application server is further configured to monitor and manage a training process of the model, and provide model early warning information.
In an exemplary embodiment of the present application, the data conversion server includes: a data conversion application module for generating the scheduling task information online according to the model parameters and monitoring the scheduling task; and a data conversion scheduling module for periodically monitoring the scheduling task and calling the model server according to the scheduling task to execute it.
In an exemplary embodiment of the application, the model server is further configured to provide a Python environment package and/or an R environment package for running the model to be processed.
In an exemplary embodiment of the application, the model server is further configured to assign a sub-model server to the model to be processed for processing the model to be processed according to the running environment of the model to be processed.
According to a second aspect of an embodiment of the present application, a data model processing method is provided, including: configuring model parameters of a model to be processed according to the configuration data; generating scheduling task information according to the model parameters; converting the training data in the first format into training data in a second format; and executing the model to be processed according to the scheduling task information and the training data in the second format so as to generate a processing result.
In one exemplary embodiment of the present application, converting the training data in the first format into training data in the second format includes converting training data in HFile format into training data in CSV format.
According to a third aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; and a storage means for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data model processing method of any of the above.
According to a fourth aspect of an embodiment of the present application, a computer-readable medium is presented, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a data model processing method as described in any of the above.
According to the data model processing method, the system, the electronic equipment and the computer readable medium, the data between the model server and the big data cluster can be efficiently transmitted, and the functions of automatic deployment, model monitoring, alarm, statistics and the like are realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. The drawings described below are only some embodiments of the present application and other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is a block diagram of a data model processing system, according to an example embodiment.
FIG. 2 is an architecture diagram of a data model processing system, according to another example embodiment.
FIG. 3 is a flow chart illustrating a data model processing system according to another exemplary embodiment.
FIG. 4 is a flow chart illustrating a data model processing system according to another exemplary embodiment.
FIG. 5 is a flowchart illustrating a method of data model processing, according to an example embodiment.
FIG. 6 is a block diagram of an electronic device for data model processing, according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, systems, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The drawings are merely schematic illustrations of the present invention, in which like reference numerals denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and not necessarily all of the elements or steps are included or performed in the order described. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
The following describes example embodiments of the invention in detail with reference to the accompanying drawings.
FIG. 1 is a block diagram of a data model processing system, according to an example embodiment. Referring to fig. 1, a data model processing system may include: a data server 110, an application server 120, a data transformation server 130, and a model server 140.
In the data model processing system, the data server 110 is used to store training data in a first format. The data server may be a large distributed server cluster based on Hadoop technology. A user may develop distributed programs on such a cluster without knowing the details of the underlying distributed system, making full use of the power of the cluster for high-speed computation and storage. Hadoop is a distributed system infrastructure developed by the Apache Foundation; it can process large amounts of data in a distributed manner and is characterized by high reliability, high scalability, high efficiency, high fault tolerance, and low cost. The aforementioned first format may be a data storage format commonly used in big data cluster servers, such as column-oriented data.
According to an example embodiment, the data server 110 may be an Hbase cluster server and/or a Hive cluster server, both of which are based on Hadoop. The Hbase cluster server is a non-relational database system running on top of the Hadoop distributed file system (HDFS); it supports random reads and writes and is a column-oriented database. HBase stores data in tables consisting of rows and columns, with the columns grouped into several column families. The Hive cluster server is a data warehouse (not a database) built on top of Hadoop; it does not itself store or compute data and can be regarded as a user programming interface. Hive relies on HDFS and MapReduce (a programming model for parallel operations on big data).
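As a rough illustration of the difference between the two formats, the sketch below flattens a handful of column-family-addressed cells (the way an HBase-style table stores them) into the flat CSV rows a model expects. The sample data and helper name are hypothetical; a real conversion would scan the cluster rather than an in-memory dict.

```python
import csv
import io

# Hypothetical sample: rows as an HBase-like table stores them, where each
# cell is addressed by "family:qualifier" (column-oriented, the "first format").
hbase_rows = {
    b"user001": {b"f:age": b"34", b"f:score": b"0.82"},
    b"user002": {b"f:age": b"27", b"f:score": b"0.91"},
}

def to_csv(rows, columns):
    """Flatten column-family cells into a CSV string (the "second format")."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # Header: row key plus the qualifier part of each requested column.
    writer.writerow(["row_key"] + [c.decode().split(":", 1)[1] for c in columns])
    for key in sorted(rows):
        cells = rows[key]
        writer.writerow([key.decode()] + [cells.get(c, b"").decode() for c in columns])
    return buf.getvalue()

print(to_csv(hbase_rows, [b"f:age", b"f:score"]))
```

Missing cells simply become empty CSV fields here; a production conversion would also have to decide on type coercion and encoding for each column family.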
The application server 120 is configured to configure model parameters of the model to be processed according to the configuration data. The application server 120 may interact with a user: when the user inputs a related command, the application server 120 may obtain configuration data according to the command and build the target model, that is, the model parameters of the model to be processed, according to the configuration data. The model parameters may also include information such as the data and the run time required by the model, which are used to extract that data and to start the model at the relevant time.
According to an example embodiment, the application server 120 may also be used to generate shell scripts for the model server 140 to run. The shell is a command interpreter that starts, suspends, and stops programs, or otherwise controls the computer, by accepting shell commands input by the user. A shell script is a file composed of shell commands; since those commands are the names of executable programs, the script can be run without compilation.
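A minimal sketch of such script generation, assuming a simple parameter record; the template fields, paths, and model names below are illustrative, not the application server's actual format.

```python
# Hypothetical sketch: render a runnable launcher script from configured
# model parameters. The interpreter is chosen from the entry file's extension.
SCRIPT_TEMPLATE = """#!/bin/sh
# Auto-generated launcher for model '{name}'
cd {workdir}
{interpreter} {entry} --train-data {data_path} --output {out_path}
"""

def render_launch_script(params):
    interpreter = "python3" if params["entry"].endswith(".py") else "Rscript"
    return SCRIPT_TEMPLATE.format(interpreter=interpreter, **params)

script = render_launch_script({
    "name": "churn_model",
    "workdir": "/opt/models/churn",
    "entry": "train.py",
    "data_path": "/data/train.csv",
    "out_path": "/data/result.csv",
})
print(script)
```

Because the output is an ordinary shell script, the model server can execute it directly without any compilation step, which is exactly the property the paragraph above relies on.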
According to an example embodiment, the application server 120 may also be used to monitor and manage the training process of the model, and to provide model early-warning information. For example, through the application server 120 a user may manage matters within his or her authority, as well as model testing, model monitoring, model early warning, and the like.
The data conversion server 130 is configured to generate scheduling task information according to the model parameters, and to convert the training data in the first format into training data in a second format. For example, scheduling task information may be generated from the run time, required data, and other items in the model parameters, to provide a scheduling function for the operation of the model. For another example, the model parameters may be taken as an online task and corresponding scheduling task information generated, so as to provide monitoring information, log information, and the like for the task and its parent task when the task is executed. The second format may be, for example, CSV file data (the comma-separated values file format), or another data format that the model server 140 can process, which is not particularly limited in the present invention. The comma-separated values file format is also referred to as a character-separated values file format, because the separator character need not be a comma.
According to an example embodiment, the data conversion server 130 may include: a data conversion application module for generating the scheduling task information online according to the model parameters and monitoring the scheduling task; and a data conversion scheduling module for periodically monitoring the scheduled task and calling the model server 140 according to the scheduled task to execute it. For example, after formal deployment the data conversion application module may take the algorithm model as an online task, generate corresponding scheduling task information, and provide monitoring information and log information for the task and its parent task while the task is executed. For another example, the data conversion scheduling module may periodically refresh the scheduling task information generated by the data conversion application module and send it to the model server 140 when the task starts.
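The division of labor between the two modules can be sketched as follows; this is a simplified in-memory model (simulated clock, hypothetical task names), not the server's actual scheduler.

```python
import heapq

# Simplified sketch: the application module registers scheduling-task entries;
# the scheduling module polls them and "calls" the model server (here, any
# dispatch callable) once a task's due time has arrived.
class TaskScheduler:
    def __init__(self):
        self._queue = []  # min-heap of (due_time, task_name)

    def register(self, due_time, task_name):
        heapq.heappush(self._queue, (due_time, task_name))

    def poll(self, now, dispatch):
        """Dispatch every task whose due time is at or before `now`."""
        while self._queue and self._queue[0][0] <= now:
            _, name = heapq.heappop(self._queue)
            dispatch(name)

executed = []
sched = TaskScheduler()
sched.register(10, "daily_feature_etl")
sched.register(20, "model_training")
sched.poll(now=15, dispatch=executed.append)
print(executed)  # → ['daily_feature_etl']  (only the task due by t=15)
```

A real scheduling module would poll on a timer and hand the task to the model server over the network, but the due-time check and dispatch loop are the same idea.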
The model server 140 is configured to execute the to-be-processed model according to the scheduling task information and training data in the second format, so as to generate a processing result. The model server 140 may also execute the pending model, for example, based on the shell script, the scheduled task information, and training data in a second format.
According to an example embodiment, the model server 140 is further configured to provide a Python environment package and/or an R environment package for running the model to be processed, and to assign a sub-model server to the model to be processed according to its running environment. For example, the model server 140 may select an environment package based on the type of model file uploaded by the developer (Python or R), e.g., assigning a different model server for the computation.
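The environment selection described above amounts to routing by file type; a minimal sketch, with hypothetical pool names:

```python
# Route an uploaded model file to a sub-server pool by its language,
# inferred from the file extension. Pool names are illustrative only.
POOLS = {".py": "python-workers", ".r": "r-workers"}

def assign_pool(model_file):
    ext = model_file[model_file.rfind("."):].lower()
    try:
        return POOLS[ext]
    except KeyError:
        raise ValueError(f"unsupported model file type: {model_file}")

print(assign_pool("credit_score.py"))  # → python-workers
print(assign_pool("forecast.R"))       # → r-workers
```

Rejecting unknown extensions early, as here, keeps a bad upload from reaching a worker that lacks the right runtime.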
According to an example embodiment, the data model processing system may further include an intermediate data server, a result data server, and a version control server. The intermediate data server is used for storing the intermediate data of the model server 140 and is an Hbase cluster server. The result data server is used for storing the processing result and is likewise an Hbase cluster server. The version control server is used for generating the configuration data according to version control information. The intermediate data server and the result data server may serve as storage containers holding the intermediate data and the result data in the second format; the data conversion server 130 may convert the intermediate data and the result data into the first format, according to the scheduling information or periodically, and transmit them to the data server 110. For example, the intermediate data server may be an Hbase cluster server storing intermediate data produced while the model runs.
According to the above data model processing system, the model is deployed through the data conversion server, data is converted between the first format and the second format, and configuration, monitoring, and early warning of the model are realized through the application server. The data model processing system can thus achieve efficient data transmission between the model server and the big data cluster, and provide functions such as automatic deployment, model monitoring, alarming, and statistics.
FIG. 2 is an architecture diagram of a data model processing system, according to another example embodiment. Referring to fig. 2, the data model processing system may include a data server 110, an application server 120, a data conversion application server 132, a data conversion scheduling server 134, a model server 140, an intermediate data server 150, a result data server 160, and a version control server 170.
In the data model processing system, the data server 110 is used to store vast amounts of data, including the training data required for model operation. The storage format of the data in the data server 110 is a first format, such as a column-oriented data format. Common choices for the data server 110 are, for example, Hbase cluster servers and Hive cluster servers.
The application server 120 may implement operations related to model management and monitoring of the user, for example, may provide functions of rights management, project management, model query, model upload, ETL (Extract-Transform-Load) parameter configuration, model test, model deployment, model monitoring, model early warning, model modification, model offline, model deletion, log query, parent task monitoring, version control, etc. for the user; various configuration information of the user and the model is stored. The ETL is used to describe the process of extracting (extracting), converting (transforming), and loading (load) data from the source end to the destination end.
The data conversion application server 132 may take the algorithm model as an online task and generate corresponding scheduling information after formal deployment, and provide monitoring information and log information of the task and its parent task during task execution.
When a model task is executed in the online environment, the data conversion scheduling server 134 may send scheduling task information to the model server 140 at the start of the task, based on the scheduling task information periodically refreshed from the data conversion application server 132.
Model server 140 may provide both Python and R environments together with commonly used packages; it allocates different model servers for computation according to the type of model file (Python or R) uploaded by the developer, extracts data from the data server 110 when the model runs, generates CSV files to feed the model, and reports the computed results to the intermediate data server 150 and the result data server 160. The model server 140 is characterized by larger CPU and memory resources and greater computing power.
The intermediate data server 150 may be an Hbase cluster server and may be used to store the various intermediate data generated while the model runs. The result data server 160 may be used to store the result data of model runs. The storage format of the data in the intermediate data server 150 and the result data server 160 may be a second format, such as the CSV format. The data in the intermediate data server 150 and the result data server 160 may be converted into data of the first format by the data conversion server 130 and then stored in the data server 110. The intermediate data server 150 and the result data server 160 may also directly store data already converted into the first format by the data conversion server 130 and send it to the data server 110 for storage.
Version control server 170 may implement version management of the algorithm model files and other related files; it may provide files and configuration information to the application server 120 and the model server 140.
FIG. 3 is a flow chart illustrating a data model processing system according to another exemplary embodiment. Referring to FIG. 3, during the data model test phase, the data model processing system may complete the steps of:
In step S310, the application server 120 generates a deployment file from the model file, the drawing script, the data reflow script, and other files uploaded by the user, and transmits the deployment file to the model server 140. The deployment file may further include shell script files, scheduling task information, and the like.
In step S320, the model server 140 extracts data from the data server 110 according to the deployment file. The extraction of the data may be performed by the data conversion server 130 to convert the data in the first format into the data in the second format in the data server 110.
In step S330, the model server 140 acquires the deployment model file and the data and then runs the model, and stores the results to the intermediate data server 150 and the result data server 160, respectively. The model server 140 may also read intermediate data according to the deployment file, so as to implement continued operation of the model.
In step S340, the application server 120 reads intermediate data and result data of the model operation to analyze the model. Wherein the intermediate data and the result data are stored in the intermediate data server 150 and the result data server 160, respectively.
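The four steps above can be sketched end to end with in-memory stand-ins for the servers; the toy model, data, and store names are all illustrative, not the patent's actual components.

```python
# End-to-end sketch of the test phase (S310–S340). Every component here is
# an in-memory stand-in: dicts play the roles of the data, intermediate, and
# result servers, and the "model" is a one-parameter least-squares fit.
data_server = {"train.hfile": [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]}
intermediate_store, result_store = {}, {}

def deploy_and_run(model_name, data_key):
    rows = data_server[data_key]  # S320: extract data (already format-converted)
    # S330: run a toy model — slope of y = a*x through the origin
    slope = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    intermediate_store[model_name] = {"n_rows": len(rows)}   # intermediate data
    result_store[model_name] = {"slope": round(slope, 2)}    # result data

deploy_and_run("toy_regression", "train.hfile")  # S310: deploy and launch
# S340: the application server reads both stores to analyze the model
print(intermediate_store["toy_regression"], result_store["toy_regression"])
```

The point of the split between the two stores is visible even in this sketch: analysis can inspect run-time intermediates separately from the final result.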
FIG. 4 is a flow chart illustrating a data model processing system according to another exemplary embodiment. Referring to FIG. 4, when the data model is in the formal (production) running state, the data model processing system may complete the following steps:
In step S410, the data conversion application server 132 transmits model task information to the data conversion scheduling server 134. The model task information may be scheduling task information generated by the application server 120 according to the model parameters; the data conversion application server 132 may be used to store the task configuration information of the model and to provide scheduling information to the data conversion scheduling server 134. The data conversion server 130 may be an ETL (Extract-Transform-Load) system and may execute the scheduling task as one ETL task.
In step S420, the data conversion scheduling server 134 schedules the model tasks in the model server 140. The model task may be shell script information generated by the application server 120 according to the model parameters.
In step S430, the model server 140 acquires a data file from the data server 110 to run the model task, and stores the result to the intermediate data server 150 and the result data server 160, respectively. Wherein, the model server 140 may obtain the data file in the second format through the data conversion server 130. The data conversion server 130 may convert data stored in the first format in the data server 110 into a data file in the second format. The data conversion server 130 may, for example, employ an ETL system, and utilize its powerful data processing capability to implement rapid transmission and conversion of data. In the present exemplary embodiment, the data server 110 may be, for example, an Hbase cluster server or a Hive cluster server, on which a data file of a first format is stored. The model server 140 may also read the intermediate data from the intermediate data server 150 according to the deployment file, so as to implement the continuous running of the model.
In step S440, the result data server 160 transmits the result data to the data server 110. The data conversion server 130 may convert the result data from the second format to the data file in the first format, and then send the result data to the data server 110.
According to an example embodiment, the model may also be analyzed by reading, from the application server 120, the monitoring information of each server while the model runs. For example, the application server 120 may provide the user with functions such as rights management, project management, model query, model upload, ETL parameter configuration, model test, model deployment, and version control, as well as model monitoring, model early warning, model modification, model offline, model deletion, log query, parent task monitoring, and the like.
According to the above data model processing system, configuration, monitoring, and early warning of the model are realized through deployment of the model and mutual conversion between the data in the first format and the data in the second format. The system can achieve efficient data transmission between the model server and the big data cluster, and provide functions such as automatic deployment, model monitoring, alarming, and statistics. In conclusion, the data model processing system enables rapid data transmission between the model server and the big data cluster; meanwhile, its platform-based implementation allows a developer to deploy and monitor a model with one click, without manual operation; finally, visualization of the monitoring data effectively supports evaluation of the actual effect of the model.
FIG. 5 is a flowchart illustrating a method of data model processing, according to an example embodiment. Referring to fig. 5, the data model processing method may include:
Step S510, configuring model parameters of the model to be processed according to the configuration data. When a user inputs a related command, the configuration data can be obtained according to version control information provided by the user, and the target model, that is, the model parameters of the model to be processed, can be built according to the configuration data. The model parameters may also include information such as the data and the run time required by the model, which are used to extract that data and to start the model at the relevant time. According to example embodiments, shell scripts may also be generated from the model parameters for the model server to run.
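A sketch of step S510 under the assumption that the configuration data arrives as JSON; every field name here is hypothetical, chosen only to show configuration data being turned into a model-parameter record.

```python
import json

# Hypothetical configuration data, as it might come from the version control
# server. All field names are illustrative.
config_data = json.loads("""
{
  "model": "churn_v3",
  "schedule": "0 2 * * *",
  "data": {"table": "user_features", "format": "HFile"},
  "runtime": "python"
}
""")

def build_model_params(cfg):
    """S510: derive the model parameters the later steps consume."""
    return {
        "name": cfg["model"],
        "cron": cfg["schedule"],            # when the scheduler should launch it
        "source_table": cfg["data"]["table"],
        "source_format": cfg["data"]["format"],
        "runtime": cfg["runtime"],
    }

params = build_model_params(config_data)
print(params["name"], params["cron"])  # → churn_v3 0 2 * * *
```

Steps S520–S540 would then consume `params`: the schedule feeds the scheduling-task information, and the source table/format feed the data conversion.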
Step S520, generating scheduling task information according to the model parameters. For example, scheduling task information may be generated from runtime, required data, etc. in model parameters to provide scheduling functionality for the operation of the model. For another example, the model parameters are taken as an online task and corresponding scheduling task information is generated to provide monitoring information, log information and the like of the task and its parent task when the task is executed.
In step S530, the training data in the first format is converted into training data in the second format. According to an example embodiment, the training data in the first format may be in HFile format and stored in the data server, while the second format may be the CSV format, stored on the model server.
Step S540: execute the model to be processed according to the scheduling task information and the training data in the second format, so as to generate a processing result. For example, the model to be processed may be executed according to the shell script, the scheduling task information, and the training data in the second format.
According to the data model processing method described above, configuration, monitoring, and early warning of the model are achieved by deploying the model and converting between data in the first format and data in the second format. The method enables efficient data transmission between the model server and the big data cluster, together with functions such as automatic deployment, model monitoring, alarms, and statistics.
FIG. 6 is a block diagram of an electronic device for data model processing, according to an example embodiment.
An electronic device 600 according to this embodiment of the application is described below with reference to FIG. 6. The electronic device 600 shown in FIG. 6 is merely an example and should not limit the functionality or scope of use of the embodiments of the present disclosure in any way.
As shown in FIG. 6, the electronic device 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. For example, the central processing unit 601 may perform the steps shown in one or more of FIGS. 2, 3, 4, and 5.
In the RAM 603, various programs and data required for system operation, such as configuration data, training data, and the like, are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a touch screen, a keyboard, and the like; an output portion 607 including a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a flash memory and the like; and a communication section 609 including a wireless network card, a high-speed network card, and the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a semiconductor memory or a magnetic disk, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, embodiments of the present invention may be embodied in a software product, which may be stored on a non-volatile storage medium (for example, a CD-ROM, a USB disk, or a removable hard disk) and which includes instructions for causing a computing device (for example, a personal computer, a server, a mobile terminal, or a smart device) to perform a method according to embodiments of the present invention, such as the steps shown in one or more of FIGS. 2, 3, 4, and 5.
Furthermore, the above-described drawings are only schematic illustrations of the processes included in the method according to the exemplary embodiments of the present invention and are not intended to be limiting. It will be readily appreciated that the processes shown in these figures do not indicate or limit the temporal order of the processes. It is also readily understood that these processes may be performed synchronously or asynchronously, for example across a plurality of modules.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the invention is not limited to the details of construction, the manner of drawing, or the manner of implementation, which has been set forth herein, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (11)

1. A data model processing system, comprising:
The data server is used for storing training data in a first format;
the application server is used for configuring model parameters of the model to be processed according to the configuration data;
The data conversion server is used for generating scheduling task information according to the model parameters; and converting the training data in the first format into training data in a second format;
the model server is used for executing the model to be processed according to the scheduling task information and training data in a second format so as to generate a processing result;
wherein the data conversion server comprises:
the data conversion application module, used for generating the scheduling task information online according to the model parameters and monitoring the scheduling task; and
the data conversion scheduling module, used for monitoring the scheduling task at regular intervals and calling the model server according to the scheduling task to execute the scheduling task.
2. The system of claim 1, further comprising at least one of the following servers:
the intermediate data server, used for storing intermediate data of the model server, wherein the intermediate data server is an Hbase cluster server;
the result data server, used for storing the processing result, wherein the result data server is an Hbase cluster server; and
the version control server, used for generating the configuration data according to version control information.
3. The system according to claim 1, wherein the data server is an Hbase cluster server and/or a Hive cluster server.
4. The system of claim 1, wherein the application server is further configured to generate shell scripts from the model parameters for the model server to run.
5. The system of claim 1, wherein the application server is further configured to monitor and manage a training process of the model and provide model warning information.
6. The system of claim 1, wherein the model server is further configured to provide a Python environment package and/or an R environment package for running the model to be processed.
7. The system of claim 6, wherein the model server is further configured to assign a sub-model server to the model to be processed, according to the operating environment of the model to be processed, for processing the model to be processed.
8. A method of data model processing, comprising:
Configuring model parameters of a model to be processed according to the configuration data;
Generating scheduling task information according to the model parameters;
Converting the training data in the first format into training data in a second format; and
Executing the model to be processed according to the scheduling task information and training data in a second format to generate a processing result;
wherein the generating scheduling task information according to the model parameters includes:
Generating the scheduling task information on line according to the model parameters;
the method further comprising:
monitoring the scheduling task at regular intervals and executing the scheduling task.
9. The method of claim 8, wherein converting training data in a first format to training data in a second format comprises:
converting the training data in HFile format into training data in CSV format.
10. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 8-9.
11. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 8-9.
CN201811076282.8A 2018-09-14 2018-09-14 Data model processing method, system, electronic equipment and readable medium Active CN110908994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811076282.8A CN110908994B (en) 2018-09-14 2018-09-14 Data model processing method, system, electronic equipment and readable medium

Publications (2)

Publication Number Publication Date
CN110908994A CN110908994A (en) 2020-03-24
CN110908994B true CN110908994B (en) 2024-06-14

Family

ID=69812426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811076282.8A Active CN110908994B (en) 2018-09-14 2018-09-14 Data model processing method, system, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN110908994B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459588A (en) * 2020-03-27 2020-07-28 深圳融安网络科技有限公司 Big data model setting method, terminal device and computer readable storage medium
CN113986141A (en) * 2021-11-08 2022-01-28 北京奇艺世纪科技有限公司 Server model updating method, system, electronic device and readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN106095391A (en) * 2016-05-31 2016-11-09 携程计算机技术(上海)有限公司 Based on big data platform and the computational methods of algorithm model and system
CN107885762A (en) * 2017-09-19 2018-04-06 北京百度网讯科技有限公司 Intelligent big data system, the method and apparatus that intelligent big data service is provided

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US10797953B2 (en) * 2010-10-22 2020-10-06 International Business Machines Corporation Server consolidation system
US9413779B2 (en) * 2014-01-06 2016-08-09 Cisco Technology, Inc. Learning model selection in a distributed network
CN104899284B (en) * 2015-06-05 2018-09-04 北京京东尚科信息技术有限公司 A kind of method and device for dispatching system based on metadata driven
CN107220261B (en) * 2016-03-22 2020-10-30 中国移动通信集团山西有限公司 Real-time mining method and device based on distributed data
CN105608512A (en) * 2016-03-24 2016-05-25 东南大学 Short-term load forecasting method
CN106066934A (en) * 2016-05-27 2016-11-02 山东大学苏州研究院 A kind of Alzheimer based on Spark platform assistant diagnosis system in early days
CN108228683A (en) * 2016-12-21 2018-06-29 广东工业大学 A kind of distributed intelligence electric network data analysis platform based on cloud computing
CN107545418A (en) * 2017-09-19 2018-01-05 深圳金融电子结算中心有限公司 Transaction processing system and method based on distributed architecture

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN106095391A (en) * 2016-05-31 2016-11-09 携程计算机技术(上海)有限公司 Based on big data platform and the computational methods of algorithm model and system
CN107885762A (en) * 2017-09-19 2018-04-06 北京百度网讯科技有限公司 Intelligent big data system, the method and apparatus that intelligent big data service is provided


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, 100176

Applicant before: BEIJING JINGDONG FINANCIAL TECHNOLOGY HOLDING Co.,Ltd.

GR01 Patent grant