CN115617834A - Data processing method, device, equipment and storage medium - Google Patents

Data processing method, device, equipment and storage medium

Info

Publication number
CN115617834A
CN115617834A (application CN202211236740.6A)
Authority
CN
China
Prior art keywords
data
input
data processing
model
configuration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211236740.6A
Other languages
Chinese (zh)
Inventor
赵荣生
孙梓涵
蒋文伟
汪磊
王永亮
焦广才
冀文杰
朱一飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd filed Critical Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202211236740.6A priority Critical patent/CN115617834A/en
Publication of CN115617834A publication Critical patent/CN115617834A/en
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2428Query predicate definition using graphical user interfaces, including menus and forms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04817Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482Interaction with lists of selectable items, e.g. menus

Abstract

Embodiments of the present disclosure relate to the field of computer technologies, and in particular to a data processing method, apparatus, device, and storage medium. The disclosed scheme receives user input through user interaction interfaces such as a first page and a second page, unifying the related steps of the data processing process (data model configuration and the corresponding data processing task configuration and execution) into a general data processing mechanism. This provides automated data processing tasks according to user requirements, ensures logically consistent data processing, improves data processing efficiency and data asset efficiency, and reduces task development and maintenance costs.

Description

Data processing method, device, equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a data processing method, an apparatus, a device, and a storage medium.
Background
This section is intended to provide a background or context to the embodiments of the disclosure and the description herein is not an admission that it is prior art, nor is it admitted to be prior art by inclusion in this section.
With the rapid development of information technologies such as big data, cloud computing, the Internet of Things, and artificial intelligence, the scale of data in cyberspace has grown exponentially. The intrinsic value of data and its supporting role for these technologies make data increasingly important.
Data capitalization has likewise become a consensus in today's information age. Data assets are data resources owned by an enterprise or organization, recorded physically or electronically, usually associated with a specific business, and expected to bring business benefits to the owner.
Data capitalization is the process of standardizing, labeling, and valuing massive data. Given the volume of data involved, how to improve data processing efficiency is a question widely considered in the industry.
Disclosure of Invention
In this context, embodiments of the present invention are intended to provide a data processing method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a data processing method including:
receiving a first input from a user, where the first input is the user's configuration input for a data model on a first page;
obtaining model configuration information in response to the first input, the model configuration information including an identification of the data model and model parameters;
receiving a second input from the user, where the second input is a configuration input for a data task on a second page;
obtaining data task configuration information in response to the second input, the data task configuration information including a combination of one or more of data source information, data processing logic, and model result processing configuration information of the data model;
and constructing a data processing task according to the model configuration information and the data task configuration information, and executing the data processing task when an execution condition is reached.
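The claimed flow can be sketched in code. This is a minimal illustration only; all function and field names (get_model_config, build_task, and so on) are hypothetical and do not come from the patent.

```python
# Sketch of the claimed method: two user inputs yield model configuration
# and task configuration, from which a data processing task is built and,
# once the execution condition is reached, executed.

def get_model_config(first_input):
    # First input: the user's data-model configuration on the first page.
    return {"model_id": first_input["model_id"],
            "model_params": first_input["params"]}

def get_task_config(second_input):
    # Second input: the user's data-task configuration on the second page.
    return {"data_source": second_input.get("source"),
            "processing_logic": second_input.get("logic"),
            "result_storage": second_input.get("storage")}

def build_task(model_config, task_config):
    # Combine model and task configuration into one executable task.
    return {**model_config, **task_config, "status": "ready"}

def run_if_ready(task, execution_condition_met):
    # Execute only when the execution condition is reached.
    if execution_condition_met:
        task["status"] = "running"
    return task

model_cfg = get_model_config({"model_id": "relation_model_1",
                              "params": {"operator": "COUNT"}})
task_cfg = get_task_config({"source": "play_log", "logic": "aggregate",
                            "storage": "ClickHouse"})
task = run_if_ready(build_task(model_cfg, task_cfg),
                    execution_condition_met=True)
```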
According to an aspect of the present disclosure, there is provided a data processing apparatus including:
a first receiving module, configured to receive a first input from a user, where the first input is the user's configuration input for a data model on a first page;
a first obtaining module, configured to obtain, in response to the first input, model configuration information including an identification of the data model and model parameters;
a second receiving module, configured to receive a second input from the user, where the second input is a configuration input for a data task on a second page;
a second obtaining module, configured to obtain, in response to the second input, data task configuration information including a combination of one or more of data source information, data processing logic, and model result processing configuration information of the data model;
and an execution module, configured to construct a data processing task according to the model configuration information and the data task configuration information, and to execute the data processing task when an execution condition is reached.
According to an aspect of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method described above.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above data processing method via execution of executable instructions.
According to the data processing method of the embodiments of the present disclosure, user input is received through user interaction interfaces such as the first page and the second page, so that the related steps of the data processing process (data model configuration and the corresponding data processing task configuration and execution) are unified into a general data processing mechanism. This provides automated data processing tasks according to user requirements, ensures logically consistent data processing, improves data processing and data capitalization efficiency, and reduces task development and maintenance costs.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically shows a first architecture diagram of a data processing system according to an embodiment of the present disclosure;
FIG. 2 schematically shows a second architecture diagram of a data processing system according to an embodiment of the present disclosure;
FIG. 3 schematically shows an interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 4 schematically shows a first interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 5 schematically shows a second interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 6 schematically shows a third interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 7 schematically shows a fourth interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 8 schematically shows a fifth interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 9 schematically shows a sixth interface diagram in a data processing system according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a third architecture diagram of a data processing system, in accordance with an embodiment of the present disclosure;
FIG. 11 schematically shows a flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 12 schematically shows a first block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 13 schematically shows a second block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 14 schematically shows a third block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 15 schematically illustrates a fourth block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 16 schematically shows a fifth block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 17 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure; and
FIG. 18 schematically illustrates a block diagram of an electronic device in accordance with the disclosed embodiments.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
The data involved in the present disclosure is data authorized by the user or fully authorized by all parties; its collection, transmission, use, and the like comply with relevant national laws and regulations. The embodiments/examples of the present disclosure may be combined with each other.
According to the embodiment of the disclosure, a data processing method, a data processing device, a data processing apparatus and a storage medium are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of the Invention
A data capitalization process in the related art includes the following steps:
first, a data developer designs a general, standardized data model according to the requirements of users such as business parties and analysts;
then, a data processing system is built; data processing tasks are developed, tested, optimized, and monitored; and unified data specifications are formulated to standardize the data.
In this process, the standardization of the data model, the stability of the data processing tasks, and so on all depend on the technical skill and data perspective of the data developer; that is, the demands on manpower are high.
The exemplary embodiments of the present disclosure provide an automated data processing scheme to process data more efficiently and achieve data capitalization. In this scheme, a unified data processing platform provides a configurable data model and user interaction pages, receives the user's configuration input for the data model and the data processing task through those pages, automatically constructs the data processing task from the model configuration information and the data task configuration information, and executes the task.
With the exemplary embodiments of the present disclosure, the related steps of the data processing process, such as data model configuration and the corresponding data processing task configuration and execution, are unified into a general data processing mechanism, which provides automated data processing tasks according to user requirements, ensures logically consistent data processing, improves data processing and data capitalization efficiency, and reduces task development and maintenance costs.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Exemplary method
A data processing system and method according to exemplary embodiments of the present disclosure will be described with reference to the accompanying drawings.
For ease of understanding, several terms referred to in the embodiments of the present disclosure are explained below.
Data model: a description of the structure, format, and operation mode of data, including its dimensions, indexes (metrics), column types, read/write modes, and the like.
Flink: an open-source stream processing framework whose core is a distributed streaming dataflow engine written in Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime can execute both batch and stream processing programs.
SQL: Structured Query Language, a special-purpose programming language used to manage relational database management systems (RDBMS) or for stream processing in relational stream data management systems (RDSMS).
Spark: an open-source cluster computing framework. Whereas Hadoop MapReduce writes intermediate data to disk after each job, Spark uses in-memory computing and can analyze and operate on data in memory before it is written to disk. Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, and up to 10 times faster even when running from disk.
ClickHouse: an open-source columnar database for online analytical processing (OLAP).
HBase: an open-source non-relational distributed database (NoSQL) modeled on Google's BigTable and written in Java. It runs on top of the HDFS file system and provides BigTable-like capabilities for Hadoop, offering high fault tolerance for sparse data.
ES: Elasticsearch, a search engine based on the Lucene library. It provides a distributed, multi-tenant full-text search engine with an HTTP web interface and schema-free JSON documents.
API: Application Programming Interface, a computing interface that defines the interactions between software intermediaries: the kinds of calls or requests that can be made, how to make them, the data formats to use, the conventions to follow, and so on.
The architecture of a data processing system according to an exemplary embodiment of the present disclosure is described below with reference to FIG. 1. The data processing system shown in FIG. 1 is one system architecture for implementing the data processing method of the exemplary embodiment; the method is not limited to this system.
Referring to fig. 1, the system specifically includes the following modules: a model layer 11, a configuration layer 12, and an execution layer 13.
The model layer 11 includes an index management module 111 and a data model management module 112. The index management module 111 consolidates the logic of users' data requirements, builds and manages unified dimensions and indexes, and publishes them. The data model management module 112 converts the logical data dimensions and indexes of different service scenarios into concrete physical designs, that is, it implements the conversion through corresponding data models.
Here, the index management module 111 is further configured to receive user input against the published unified dimensions and indexes to obtain model configuration information, which includes an identification of the data model and model parameters. The data model management module 112 is further configured to match the corresponding data model according to the model configuration information.
The configuration layer 12 is used to receive user input to obtain data task configuration information.
The execution layer 13 is configured to construct a data processing task according to the model configuration information and the data task configuration information, and execute the data processing task when the execution condition is reached.
In this embodiment, the modules cooperate to provide a one-stop data processing service. First, through data model management, data is standardized and consolidated in the system, overcoming the shortcomings of manual maintenance; unified management of dimensions, indexes, and the like guarantees the availability, usability, and consistency of the data, improves data quality, and reduces data management and usage costs. Second, through data model and data processing task configuration, personalized data processing tasks can be constructed at any time, reducing task development and maintenance costs, improving resource utilization, and unifying data processing logic.
Further, FIG. 2 shows the architecture of a specific data processing system provided by an exemplary embodiment of the present disclosure, in which the index management module 21 is specifically configured for:
Requirement management: process management covering proposing data requirements, data review, data tracking, data development/testing/release, and the like;
Data caliber management: binding with data requirements to ensure that the data calibers (statistical definitions) of all service domains are uniform, for example, that a duration does not exceed a certain value;
Dimension and index management: keeping data requirements and data calibers consistent and carrying their logic, including create/read/update/delete of dimensions and indexes, data lineage management, monitoring, metadata management, and the like;
Service domain management: the service domains, qualifier specifications, naming specifications, label classifications, and the like of dimensions and indexes, to standardize and unify them;
Authority management: role management, data auditing, data security, and the like.
In the data model management module 22, a data model is selected according to the service scenario and its parameters are configured. As shown in FIG. 3, the data model defines the time field information and service field information (such as dimensions and indexes) of the data, and each index field must specify a statistical operator (COUNT/MAX/MIN, etc.). The specific contents of the time fields and service fields are shown in FIG. 3 and are not repeated here.
The following data models are managed and applied according to different service scenarios:
ETL model: suitable for simple data processing, including basic logic such as filtering, conversion, and widening; for example, processing tasks from the operational data store (ODS) layer to the data warehouse detail (DWD) layer fit this model;
Relation model: suitable for services with aggregation scenarios, such as computing, per song ID, daily data indexes like play counts and red-heart (like) counts;
Backflow model: suitable for exporting data processed on the big data platform to external storage for business use scenarios; for example, index data such as each song's play count and red-heart count is displayed in a data product for product and operations personnel;
Others: horizontal extension is supported; data can be abstracted into general data models according to different service scenarios and deposited into the system. This flexible extensibility is one of the core designs of the present scheme and allows adaptation to different service scenarios.
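The relation-model aggregation scenario above can be made concrete with a short sketch. The record fields (song_id, date, event) are hypothetical; the patent does not specify the record format.

```python
from collections import defaultdict

# Sketch of the relation model's aggregation scenario: per song ID and day,
# count play events and red-heart (like) events.
events = [
    {"song_id": "s1", "date": "2022-10-01", "event": "play"},
    {"song_id": "s1", "date": "2022-10-01", "event": "play"},
    {"song_id": "s1", "date": "2022-10-01", "event": "red_heart"},
    {"song_id": "s2", "date": "2022-10-01", "event": "play"},
]

def aggregate_daily(events):
    # Group by (song_id, date) and accumulate counts per event type.
    stats = defaultdict(lambda: {"plays": 0, "red_hearts": 0})
    for e in events:
        key = (e["song_id"], e["date"])
        if e["event"] == "play":
            stats[key]["plays"] += 1
        elif e["event"] == "red_heart":
            stats[key]["red_hearts"] += 1
    return dict(stats)

daily = aggregate_daily(events)
```

In the described system this logic would run inside a generated Flink or Spark task rather than plain Python; the sketch only shows the aggregation semantics.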
At the configuration layer 23, the execution process of a data processing task can be abstracted into three steps: input, logic processing, and output. The input corresponds to data source management, the logic processing corresponds to data processing logic, and the model result processing configuration information corresponds to storage management.
As shown in FIG. 4, the input node corresponds to the input configuration, including stream tables and offline tables. The model node corresponds to the data processing logic based on the data model, such as a relation model. The storage node corresponds to the output configuration, such as a data service.
Specifically, the configuration steps of a data processing task are as follows.
1. Data source management: data source configuration can be performed, as shown in FIG. 5, supporting the following functions:
selecting real-time and offline tables for use as data source tables, dimension tables, and other scenarios;
data source table filter conditions;
configuring the mapping from data source table fields to model fields;
different configurations for real-time and offline scenarios: the real-time scenario supports window time type, time field, time unit, and allowed-lateness configuration; the offline scenario supports partition conditions and dependency policy configuration.
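The data-source options above can be pictured as configuration structures. The key names below are assumptions for illustration, not the platform's actual configuration schema.

```python
# Hypothetical data-source configurations: a real-time (stream) source with
# window/lateness settings, and an offline source with partition/dependency
# settings, as described in the text.
realtime_source = {
    "table": "play_events_stream",
    "kind": "stream",
    "filter": "event_type = 'play'",
    "field_mapping": {"sid": "song_id", "ts": "event_time"},
    "window_time_type": "event_time",
    "time_field": "ts",
    "time_unit": "second",
    "allowed_lateness_s": 60,
}

offline_source = {
    "table": "play_events_daily",
    "kind": "offline",
    "filter": "ds = '20221001'",
    "field_mapping": {"sid": "song_id"},
    "partition_condition": "ds = '20221001'",
    "dependency_policy": "wait_for_partition",
}

def required_keys(cfg):
    # Real-time and offline sources require different extra keys.
    base = {"table", "kind", "field_mapping"}
    extra = ({"window_time_type", "time_field", "time_unit",
              "allowed_lateness_s"}
             if cfg["kind"] == "stream"
             else {"partition_condition", "dependency_policy"})
    return base | extra

def validate_source(cfg):
    # A source configuration is valid when no required key is missing.
    return not (required_keys(cfg) - cfg.keys())
```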
2. Logic processing
The data processing logic, such as filtering (WHERE), conversion (functions), aggregation (GROUP BY), association (JOIN), and column-to-row transposition, provides different functions for the real-time and offline scenarios.
For the real-time scenario shown in FIG. 6, task parallelism settings, time window definitions (type, size, time format, etc.), filter condition configuration, global aggregation optimization, and the like are supported.
For the offline scenario shown in FIG. 7, two modes (pass-through and aggregation), model filter conditions, advanced configuration, and the like are supported. The pass-through (direct-in, direct-out) mode is suitable for synchronizing data, such as synchronizing data from Hive to external storage. The aggregation mode is suitable for aggregation scenarios, such as counting each song's daily play counts and red-heart counts.
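The two offline modes can be sketched as one dispatch function. Names and record fields are illustrative assumptions.

```python
from collections import Counter

# Sketch of the two offline processing modes: "passthrough" copies records
# (optionally renaming fields, as in a sync to external storage), while
# "aggregate" counts records per group key.
def run_offline(records, mode, field_mapping=None, group_key=None):
    if mode == "passthrough":
        mapping = field_mapping or {}
        return [{mapping.get(k, k): v for k, v in r.items()} for r in records]
    if mode == "aggregate":
        return dict(Counter(r[group_key] for r in records))
    raise ValueError(f"unknown mode: {mode}")

records = [{"sid": "s1"}, {"sid": "s1"}, {"sid": "s2"}]
synced = run_offline(records, "passthrough", field_mapping={"sid": "song_id"})
counts = run_offline(records, "aggregate", group_key="sid")
```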
3. Output
Configure the storage engines into which the model result data is to be written, such as Hive, ClickHouse, HBase, ES, and Redis, with customized functions for each storage engine.
Referring to FIG. 8, ClickHouse supports fast table creation, seamless switching, writing to local tables by designated columns, and rate limiting. This greatly reduces the cost for users of different storage engines and improves usage efficiency.
As shown in FIG. 9, HBase supports fast table creation and automatic row key generation, ensuring uniform data distribution and good query performance.
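One common way to achieve the uniform data distribution attributed above to automatic HBase row key generation is salting. The scheme below is an assumption for illustration, not the patent's actual algorithm.

```python
import hashlib

# Sketch of automatic row-key generation with a hash-derived salt prefix,
# so that sequential IDs spread evenly across HBase regions.
def make_row_key(song_id: str, n_buckets: int = 16) -> str:
    digest = hashlib.md5(song_id.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % n_buckets
    # Fixed-width salt keeps keys lexicographically bucketed.
    return f"{salt:02d}_{song_id}"

keys = [make_row_key(f"song_{i}") for i in range(4)]
```

Salting trades away efficient range scans over the raw ID for write balance, which suits the metric-serving workload described here.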
As shown in FIG. 2, at the execution layer 24, the system automatically generates the corresponding real-time or offline data processing task according to the model information and task configuration information configured by the user. Real-time tasks are executed with Flink and offline tasks with Spark, with targeted optimization according to the user's input table, execution logic, and output table.
In the real-time scenario, parameters such as task parallelism, memory size, and write batch size are set automatically according to the user's input traffic, the computation window size, and the output storage engine.
In the offline scenario, queue priority, seamless switching, writing to local tables, and the like are set automatically according to the priority information of the user's model and the configuration of the output storage engine.
In this way, the user does not need to pay much attention to low-level details and only needs to focus on basic business logic such as the data model, the data processing logic, the input, and the output.
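The automatic parameter selection for real-time tasks can be sketched as a heuristic function. The formulas and thresholds below are illustrative assumptions, not the scheme's actual tuning rules.

```python
# Sketch: derive parallelism, memory, and write batch size from input
# traffic, window size, and the output storage engine.
def tune_realtime_task(input_rate_rps: int, window_size_s: int,
                       sink: str = "ClickHouse") -> dict:
    records_per_window = input_rate_rps * window_size_s
    # More input traffic -> more parallel subtasks, capped at 64.
    parallelism = max(1, min(64, input_rate_rps // 5000 + 1))
    # Memory scales with the window's record volume, within bounds.
    memory_mb = max(1024, min(8192, records_per_window // 1000))
    # Columnar sinks such as ClickHouse favor large write batches.
    write_batch = 5000 if sink == "ClickHouse" else 500
    return {"parallelism": parallelism, "memory_mb": memory_mb,
            "write_batch_size": write_batch}

cfg = tune_realtime_task(input_rate_rps=10000, window_size_s=60)
```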
Further, the exemplary embodiment of the present disclosure also provides an architecture diagram of a specific data processing system, and compared with fig. 1 and 2, the data processing system shown in fig. 10 further includes: an operation and maintenance layer 31 and a service layer 32.
The operation and maintenance layer 31 provides monitoring and alerting, automatic optimization, and periodic governance of data processing tasks. It offers a plug-in mode and provides different optimization and governance strategies for different models at three stages: before, during, and after execution. For example:
Before execution: selecting appropriate resource configuration, memory configuration, read/write batch sizes, and the like according to the input traffic, the complexity of the model processing logic, and the output storage.
During execution: while a data processing task runs, the collected monitoring index data is periodically analyzed to provide better optimization strategies, suggestions on resource usage rationality, emergency handling strategies for sudden traffic increases, early warnings of task instability, and the like.
After execution: if a data processing task fails, a diagnostic function provides one-click problem checking and solutions. After a task goes offline, resources such as model data and storage are cleaned up after a delay, avoiding indefinite resource occupation.
The service layer 32 provides data query APIs, authority auditing, and data lineage tracking; it provides a unified outlet for data, records the query counts, user counts, processing times, and the like of each data model, analyzes user query patterns, and makes targeted optimization suggestions.
Fig. 11 shows a flowchart of a data processing method provided in an exemplary embodiment of the present disclosure, and as shown in fig. 11, the method includes the following steps:
step 1110: receiving a first input of a user, wherein the first input is the configuration input of the data model of the first page by the user;
step 1120: obtaining model configuration information in response to the first input, the model configuration information including an identification of the data model and model parameters;
step 1130: receiving a second input of the user, wherein the second input is a configuration input performed on a second page for a data task;
step 1140: in response to the second input, obtaining data task configuration information, the data task configuration information including a combination of one or more of data source information, data processing logic, and model result processing configuration information of the data model;
step 1150: and constructing a data processing task according to the model configuration information and the data task configuration information, and executing the data processing task under the condition that an execution condition is reached.
This exemplary embodiment unifies related steps in the data processing process, such as data model configuration and the configuration and execution of the corresponding data processing task, and constructs a set of general data processing mechanisms. It can thus provide automated data processing tasks according to user requirements, ensure uniform data processing logic, improve data processing efficiency and data assetization efficiency, and reduce task development and maintenance costs.
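As a rough illustration of steps 1110 through 1150, the two configuration payloads can be collected and combined into one task specification. All class and function names below are assumptions for illustration; the disclosure does not specify an implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    """Obtained from the first input (step 1120)."""
    model_id: str
    params: dict


@dataclass
class TaskConfig:
    """Obtained from the second input (step 1140): any combination of
    data source, processing logic, and result-processing configuration."""
    source: Optional[dict] = None
    logic: Optional[dict] = None
    sink: Optional[dict] = None


def build_task(model_cfg: ModelConfig, task_cfg: TaskConfig) -> dict:
    """Step 1150: combine both payloads into one executable task spec."""
    return {
        "model": model_cfg.model_id,
        "params": model_cfg.params,
        "source": task_cfg.source,
        "logic": task_cfg.logic,
        "sink": task_cfg.sink,
    }


def maybe_execute(task: dict, condition_met: bool) -> str:
    # The task only runs once the execution condition is reached.
    return "running" if condition_met else "pending"
```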
In an exemplary embodiment of the present disclosure, the data processing system provides a user interface, such as the first page and the second page above, to receive model configuration information and data task configuration information.
The first page may present interface content as shown in fig. 3, through which the user inputs model configuration information such as model parameters. The second page may present the content of at least one of the interfaces shown in figs. 4 through 10, through which the user enters data task configuration information.
In an exemplary embodiment of the present disclosure, the first input includes a first sub-input and a second sub-input, the first sub-input is a selection input to a model configuration item in the first page, and the second sub-input is a selection input to a menu icon in the model configuration item;
in response to a first input, obtaining model configuration information, including:
displaying a menu list under the model configuration item in response to the first sub-input;
and responding to the second sub-input, obtaining the selected menu icon, and obtaining corresponding model configuration information according to the menu icon.
This embodiment provides model configuration items displayed in the form of menus; as shown in fig. 3, the field type and the list type can be configured through pull-down menus.
In the first page, operable icons may also be displayed, such as the time fields shown in fig. 3, where the checkbox before dt, hh, or mm can be ticked to select the corresponding time column. In addition, the first page may also display a text entry window.
In an exemplary embodiment of the present disclosure, before receiving the second input of the user, the data processing method further includes:
receiving a third input of the first page by the user;
switching from the first page to the second page in response to a third input.
In this embodiment, the third input is used to trigger a page switch. Through page switching, the model configuration and the data task configuration are associated and unified; the user can configure multiple related data processing steps on the same platform, which improves data processing efficiency.
As an implementation, the first page may display a switch icon; when the switch icon is triggered by the third input, the first page switches to the second page.
In an exemplary embodiment of the present disclosure, configuration item icons of the data source information, the data processing logic, and the model result processing configuration information are displayed on a second page; the second input comprises a third sub-input and a fourth sub-input, and the third sub-input is a selection operation of a target configuration item icon;
in response to the second input, obtaining data task configuration information, including:
responding to the third sub-input, obtaining the identifier of the target configuration item icon, and displaying a corresponding target configuration item interface according to the identifier;
and responding to the fourth sub input, obtaining the selected sub item information in the target configuration item interface, and obtaining the data task configuration information according to the sub item information.
The data source is the data to be processed, i.e., the input data of the data model. The data processing logic refers to processing logic based on the data model, and the model result processing configuration information is configuration information for handling (e.g., storing) the model result data.
The interface shown in fig. 4 is one implementation of the second page referred to in this embodiment, in which the input node, the model node, and the output node correspond to the data source information, the data processing logic, and the model result processing configuration information, respectively.
The second page shown in fig. 4 displays a stream table and an offline table. The stream table is a table into which real-time stream data is mapped, representing a real-time data source. The offline table corresponds to an offline data source.
Specifically, the third sub-input acts on a configuration item icon of the data source information in the interface shown in fig. 4, such as the offline table icon, so as to show the offline table input configuration interface shown in fig. 5. This interface is the current target configuration item interface, and the interface shown in fig. 5 is one implementation of the target configuration item interface referred to in this embodiment.
Specifically, the following sub-items are displayed on the target configuration item interface:
mapping relation items of the data source table to the model fields of the data model;
a data source filtering condition;
selecting a mode item;
selecting a data source library, such as music_new_dm;
selecting a data source table, such as ads_itm_pgc_song_tag_dd.
The display content of the target configuration item interface in this embodiment is only an example; in a specific application, items may be selected, added, or deleted as needed, which is not limited herein.
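For concreteness, the sub-items listed above might map onto a configuration payload like the following sketch. Only the library and table names come from the figures; the key names, mode, filter expression, and field mapping are assumptions.

```python
# Hypothetical payload for the offline-table input configuration of fig. 5.
offline_input_cfg = {
    "source_db": "music_new_dm",                 # data source library (fig. 5)
    "source_table": "ads_itm_pgc_song_tag_dd",   # data source table (fig. 5)
    "select_mode": "full",                       # assumed mode item
    "filter": "dt = '2022-10-10'",               # assumed filter condition
    "field_mapping": {"song_id": "item_id"},     # source column -> model field (assumed)
}


def validate_input_cfg(cfg: dict) -> bool:
    """Minimal check that the required sub-items are present."""
    required = {"source_db", "source_table", "field_mapping"}
    return required.issubset(cfg)
```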
In an exemplary embodiment of the present disclosure, the target configuration item icon is a configuration item icon of the data processing logic, the target configuration item interface is a configuration item interface in a target scene, and the target scene is one or both of a real-time scene and an offline scene.
The present embodiment provides a configuration interface for data processing logic. As one implementation, as shown in fig. 6, the interface of configuration items of the data processing logic in the real-time scenario displays the parallelism, the window type, the window size, the time type, the time format, and the like.
As another implementation, as shown in FIG. 7, a configuration item interface of the data processing logic in an offline scenario displays mapping patterns, filtering conditions, advanced configurations, and the like.
In an exemplary embodiment of the present disclosure, the target configuration item icon is a configuration item icon of the model result processing configuration information, and the target configuration item interface displays a storage configuration item for the model result data.
This embodiment provides customization for different storage engines, and the model result data can be written into a configured storage engine, such as Hive, ClickHouse, HBase, or ES, so as to achieve the data assetization goal. Through such data storage management, the underlying storage details are shielded, an appropriate data storage engine is automatically selected according to the user's usage scenario, and the relevant storage engines are read and written in an optimal manner, thereby reducing development costs and improving usage efficiency.
Fig. 8 shows a model result storage configuration interface based on ClickHouse, where items such as catalog, database, and table name are displayed. Fig. 9 further illustrates a storage configuration interface for quick table creation.
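The automatic table creation mentioned above can be sketched as a DDL builder for a ClickHouse MergeTree table. The helper name and the example columns are assumptions, though the `CREATE TABLE ... ENGINE = MergeTree()` form is standard ClickHouse DDL.

```python
def build_clickhouse_ddl(database: str, table: str,
                         columns: list[tuple[str, str]],
                         order_by: str) -> str:
    """Build a CREATE TABLE statement for a ClickHouse MergeTree table.

    Hypothetical helper: the platform's real table-creation logic is not
    disclosed; this only shows the shape of auto-generated DDL.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (f"CREATE TABLE IF NOT EXISTS {database}.{table} ({cols}) "
            f"ENGINE = MergeTree() ORDER BY {order_by}")


# Assumed database, table, and columns for illustration.
ddl = build_clickhouse_ddl("model_results", "song_tags",
                           [("song_id", "UInt64"), ("tag", "String")],
                           "song_id")
```

Shielding the user from this DDL is exactly the "hide bottom-layer storage details" point: the user only fills in the storage configuration items, and the platform emits engine-specific statements.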
In an exemplary embodiment of the present disclosure, in a case where the execution condition is reached, before the data processing task is executed, the data processing method further includes:
allocating operating resources for the data model;
wherein the run resources are configured to run the data model during execution of the data processing task.
This embodiment can automatically allocate running resources according to the data processing task, so that the task runs quickly and efficiently.
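A possible heuristic for such automatic allocation, scaling parallelism with input traffic and memory with logic complexity, is sketched below. The formula and constants are assumptions, not the disclosed method.

```python
def allocate_resources(input_rate: float, complexity: int) -> dict:
    """Hypothetical allocation heuristic.

    - parallelism grows with input traffic (one slot per ~5000 rec/s,
      multiplied by the logic complexity score);
    - memory grows linearly with complexity.
    """
    parallelism = max(1, round(input_rate / 5000) * complexity)
    memory_mb = 1024 + 512 * complexity
    return {"parallelism": parallelism, "memory_mb": memory_mb}
```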
In an exemplary embodiment of the present disclosure, when a data processing task is executed, index monitoring may be performed on the task, and when an index value reaches a set condition, a task adjustment policy may be generated.
This embodiment unifies the configuration, execution, and monitoring of data processing tasks in the same system or platform; for its implementation, refer to the description of the operation and maintenance layer above. This promotes efficient customization and processing of data processing tasks and improves resource utilization.
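The index-monitoring check described above (generate an adjustment policy when an index value reaches a set condition) can be sketched as a threshold comparison. The metric names and the adjustment actions are assumptions.

```python
def adjustment_policy(metrics: dict, thresholds: dict) -> list:
    """Hypothetical index-monitoring check: compare each collected
    metric against its threshold and emit adjustment actions."""
    actions = []
    if metrics.get("consumer_lag", 0) > thresholds.get("consumer_lag", float("inf")):
        actions.append("increase parallelism")
    if metrics.get("memory_usage", 0) > thresholds.get("memory_usage", float("inf")):
        actions.append("raise memory limit")
    return actions
```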
In an exemplary embodiment of the present disclosure, in case of a failure of a data processing task, receiving a fourth input of a user;
and responding to the fourth input, performing problem troubleshooting and diagnosis of task failure, and acquiring a solution according to a troubleshooting diagnosis result.
This embodiment provides self-checking and self-resolution functions for data processing tasks, i.e., one-click troubleshooting and solutions, which can be implemented in the manner described above for the operation and maintenance layer.
In an exemplary embodiment of the present disclosure, when a data processing task goes offline and the offline duration reaches a target time, the task data resources of the data processing task are cleaned up; for the implementation, refer to the description of the operation and maintenance layer above. This prevents resources from being occupied indefinitely and improves resource utilization.
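The delayed cleanup described above can be sketched as a grace-period check; the seven-day default below is an assumed value, not one given by the disclosure.

```python
from datetime import datetime, timedelta


def should_clean(offline_at: datetime, now: datetime,
                 grace: timedelta = timedelta(days=7)) -> bool:
    """Hypothetical delayed-cleanup check: resources of an offlined task
    are reclaimed only after the grace period has elapsed."""
    return now - offline_at >= grace
```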
In an exemplary embodiment of the present disclosure, the data processing method may further include the steps of:
receiving a data query request of a service user under the condition of executing a data processing task;
and responding to the data query request, performing data query in the processed data, and providing a query result to the service party user.
The data processing method of this embodiment further provides an application of the processed data, specifically a data query function. In addition, data query information can be recorded for targeted optimization.
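Recording query information per data model, as described for the service layer, can be sketched as a small audit log; the class and field names are assumptions.

```python
from collections import defaultdict


class QueryAuditLog:
    """Hypothetical per-model query accounting: counts queries,
    distinct users, and accumulated processing time."""

    def __init__(self) -> None:
        self.stats = defaultdict(lambda: {"queries": 0, "users": set(),
                                          "total_ms": 0.0})

    def record(self, model_id: str, user: str, latency_ms: float) -> None:
        entry = self.stats[model_id]
        entry["queries"] += 1
        entry["users"].add(user)
        entry["total_ms"] += latency_ms


log = QueryAuditLog()
log.record("song_tag_model", "alice", 12.5)
log.record("song_tag_model", "bob", 7.5)
```

Aggregates of this kind are what would back the "count a user query mode, perform targeted optimization suggestion" function of the service layer.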
The data processing method of the exemplary embodiment of the present disclosure provides an integrated solution for an end-to-end data processing process, and improves data processing efficiency.
Further, exemplary embodiments of the present disclosure provide an extensible data model plug-in management scheme to adapt to different business scenarios. Targeted optimization is carried out according to the detailed information of the data model; on the premise of ensuring task stability, the rationality of resource usage is improved and resource waste is reduced.
Furthermore, the exemplary embodiment of the present disclosure can generate the data processing task automatically or semi-automatically, perform customizable optimization, reduce the task development cost, improve the task stability, and ensure the reasonable and effective use of resources.
In addition, in this embodiment, the storage configuration of the model result data shields storage details, automatically creates tables, and adapts to the corresponding business scenario, reducing the user's usage cost and improving the user's development efficiency. Moreover, for the processed data, data auditing and data lineage management are provided, data usage is collected, and data support is provided for data value quantification and data governance.
Exemplary devices
Having introduced the data processing method of the exemplary embodiment of the present disclosure, next, a data processing apparatus of the exemplary embodiment of the present disclosure is described with reference to fig. 12.
Referring to fig. 12, a data processing apparatus 1200 according to an exemplary embodiment of the present disclosure may include:
a first receiving module 1210 for receiving a first input from a user, where the first input is a configuration input of a data model performed on a first page by the user;
a first obtaining module 1220, responsive to the first input, for obtaining model configuration information, the model configuration information including an identification of the data model and model parameters;
the second receiving module 1230 receives a second input of the user, where the second input is a configuration input for performing a data task on a second page;
a second obtaining module 1240, which is used for responding to the second input and obtaining the data task configuration information, wherein the data task configuration information comprises one or more combinations of data source information, data processing logic and model result processing configuration information of the data model;
the execution module 1250 constructs a data processing task according to the model configuration information and the data task configuration information, and executes the data processing task when the execution condition is reached.
Optionally, the first input includes a first sub-input and a second sub-input, the first sub-input is a selected input to a model configuration item in the first page, and the second sub-input is a selected input to a menu icon in the model configuration item;
the first obtaining module 1220 is specifically configured to:
displaying a menu list under the model configuration item in response to the first sub-input;
and responding to the second sub-input, obtaining the selected menu icon, and obtaining corresponding model configuration information according to the menu icon.
Optionally, the second receiving module 1230 is further specifically configured to:
receiving a third input of the first page by the user before receiving the second input of the user;
switching from the first page to the second page in response to a third input.
Optionally, displaying configuration item icons of the data source information, the data processing logic and the model result processing configuration information on a second page; the second input comprises a third sub-input and a fourth sub-input, and the third sub-input is a selection operation of a target configuration item icon;
the second obtaining module 1240 is specifically configured to:
responding to the third sub-input, obtaining an identifier of the target configuration item icon, and displaying a corresponding target configuration item interface according to the identifier;
and responding to the fourth sub input, obtaining the selected sub item information in the target configuration item interface, and obtaining the data task configuration information according to the sub item information.
Optionally, the target configuration item icon is a configuration item icon of the data source information, and the following sub-items are displayed on the target configuration item interface:
mapping relation items of the data source table to the model fields of the data model;
the data source filters the conditions.
Optionally, the target configuration item icon is a configuration item icon of the data processing logic, the target configuration item interface is a configuration item interface in a target scene, and the target scene is one or both of a real-time scene and an offline scene.
Optionally, the target configuration item icon is a configuration item icon of the model result processing configuration information, and the target configuration item interface displays a storage configuration item for the model result data.
Optionally, the executing module 1250 is specifically further configured to:
under the condition that the execution condition is reached, before the data processing task is executed, allocating operation resources for the data model;
wherein the run resources are configured to run the data model during execution of the data processing task.
Optionally, compared to fig. 12, the data processing apparatus 1300 shown in fig. 13 further includes:
the index monitoring module 1310 performs index monitoring on the data processing task, and generates a task adjustment policy when the index value reaches a set condition.
Optionally, compared to fig. 12, the data processing apparatus 1400 shown in fig. 14 further includes:
the third receiving module 1410, in case that the data processing task fails, receives a fourth input of the user;
and the troubleshooting module 1420 is used for responding to the fourth input, troubleshooting and diagnosing the problem of the task failure, and acquiring a solution according to the troubleshooting and diagnosing result.
Optionally, compared to fig. 12, the data processing apparatus 1500 shown in fig. 15 further includes:
the cleaning module 1510 cleans up the task data resources of the data processing task when the task has gone offline and the offline duration reaches the target time.
Optionally, compared to fig. 12, the data processing apparatus 1600 shown in fig. 16 further includes:
a fourth receiving module 1610, which receives a data query request of a service user when executing a data processing task;
the query module 1620, in response to the data query request, performs data query on the processed data, and provides a query result to the service user.
The data processing apparatus of the embodiment of the disclosure can unify related steps in the data processing process, such as data model configuration and the configuration and execution of the corresponding data processing task, and construct a set of general data processing mechanisms, so as to provide automated data processing tasks according to user requirements, ensure uniform data processing logic, overcome the drawbacks of manual maintenance, improve data processing efficiency and data assetization efficiency, and reduce task development and maintenance costs.
Since each functional module of the data processing apparatus according to the embodiment of the present disclosure is the same as that in the embodiment of the data processing method, it is not described herein again.
Exemplary storage Medium
Having described the data processing method and apparatus thereof according to the exemplary embodiment of the present disclosure, a storage medium according to the exemplary embodiment of the present disclosure will be described with reference to fig. 17.
Referring to fig. 17, a program product 1700 for implementing the above method according to an embodiment of the present disclosure is described; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a device such as a personal computer. However, the program product of the present disclosure is not so limited; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Exemplary electronic device
Having described the storage medium of the exemplary embodiment of the present disclosure, next, an electronic device of the exemplary embodiment of the present disclosure will be described with reference to fig. 18.
The electronic device 1800 shown in fig. 18 is only an example, and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure.
As shown in fig. 18, the electronic device 1800 is in the form of a general purpose computing device. Components of the electronic device 1800 may include, but are not limited to: the at least one processing unit 1810, the at least one memory unit 1820, the bus 1830 that connects the various system components (including the memory unit 1820 and the processing unit 1810), and the display unit 1840.
Where the storage unit stores program code, the program code may be executed by the processing unit 1810 to cause the processing unit 1810 to perform steps according to various exemplary embodiments of the present invention described in the above-mentioned "exemplary methods" section of this specification. For example, processing unit 1810 may perform the steps as shown in fig. 11.
The storage unit 1820 can include volatile storage units, such as a random access storage unit (RAM) 1821 and/or a cache memory unit 1822, and can further include a read-only storage unit (ROM) 1823.
The storage unit 1820 may also include a program/utility 1824 having a set (at least one) of program modules 1825, such program modules 1825 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 1830 may include a data bus, an address bus, and a control bus.
The electronic device 1800 may also communicate with one or more external devices 1801 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) via an input/output (I/O) interface 1850. The electronic device 1800 also includes a display unit 1840, which is connected to the input/output (I/O) interface 1850 for display. Moreover, the electronic device 1800 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 1860. As shown, the network adapter 1860 communicates with the other modules of the electronic device 1800 over the bus 1830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although in the above detailed description several modules or sub-modules of the data processing apparatus are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects is for convenience of description only and does not imply that features in these aspects cannot be combined to advantage. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A data processing method, comprising:
receiving a first input of a user, wherein the first input is a configuration input performed by the user on a first page for a data model;
obtaining model configuration information in response to the first input, the model configuration information including an identification of a data model and model parameters;
receiving a second input of the user, wherein the second input is a configuration input performed on a second page for a data task;
in response to the second input, obtaining data task configuration information, wherein the data task configuration information comprises one or more of data source information, data processing logic and model result processing configuration information of the data model;
and constructing a data processing task according to the model configuration information and the data task configuration information, and executing the data processing task under the condition of reaching an execution condition.
2. The data processing method according to claim 1, wherein the first input comprises a first sub-input and a second sub-input, the first sub-input is a selection input of a model configuration item in the first page, and the second sub-input is a selection input of a menu icon in the model configuration item;
in response to the first input, obtaining model configuration information, including:
displaying a menu list under the model configuration item in response to the first sub-input;
and responding to the second sub-input, obtaining the selected menu icon, and obtaining corresponding model configuration information according to the menu icon.
3. The data processing method of claim 2, wherein prior to receiving the second input from the user, the data processing method further comprises:
receiving a third input of the first page by the user;
switching from the first page to the second page in response to the third input.
4. The data processing method according to claim 3, wherein configuration item icons of data source information, data processing logic and model result processing configuration information are displayed on the second page; the second input comprises a third sub-input and a fourth sub-input, and the third sub-input is a selection operation of a target configuration item icon;
in response to the second input, obtaining data task configuration information, including:
responding to the third sub-input, obtaining the identifier of the target configuration item icon, and displaying a corresponding target configuration item interface according to the identifier;
and responding to the fourth sub input, obtaining the selected sub item information in the target configuration item interface, and obtaining data task configuration information according to the sub item information.
5. The data processing method according to claim 4, wherein the target configuration item icon is a configuration item icon of data source information, and the following sub-items are displayed on the target configuration item interface:
a mapping relationship entry of a data source table to a model field of the data model;
the data source filters the conditions.
6. The data processing method of claim 4, wherein the target configuration item icon is a configuration item icon of the data processing logic, the target configuration item interface is a configuration item interface in a target scenario, and the target scenario is one or both of a real-time scenario and an offline scenario.
7. The data processing method of claim 4, wherein the target configuration item icon is a configuration item icon of the model result processing configuration information, and the target configuration item interface displays a storage configuration item for model result data.
8. A data processing apparatus, comprising:
the first receiving module is used for receiving a first input of a user, wherein the first input is the configuration input of a data model of a first page by the user;
a first obtaining module, responsive to the first input, obtaining model configuration information, the model configuration information including an identification of a data model and model parameters;
the second receiving module is used for receiving a second input of the user, wherein the second input is the configuration input of a data task on a second page;
a second obtaining module, responsive to the second input, obtaining data task configuration information, the data task configuration information including a combination of one or more of data source information, data processing logic, and model result processing configuration information of the data model;
and the execution module is used for constructing a data processing task according to the model configuration information and the data task configuration information and executing the data processing task under the condition of reaching an execution condition.
9. A storage medium having a computer program stored thereon, the computer program when executed by a processor implementing:
the data processing method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions:
the data processing method according to any one of claims 1 to 7.
CN202211236740.6A 2022-10-10 2022-10-10 Data processing method, device, equipment and storage medium Pending CN115617834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211236740.6A CN115617834A (en) 2022-10-10 2022-10-10 Data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211236740.6A CN115617834A (en) 2022-10-10 2022-10-10 Data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115617834A true CN115617834A (en) 2023-01-17

Family

ID=84862353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211236740.6A Pending CN115617834A (en) 2022-10-10 2022-10-10 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115617834A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303833A (en) * 2023-05-18 2023-06-23 联通沃音乐文化有限公司 OLAP-based vectorized data hybrid storage method
CN116303833B (en) * 2023-05-18 2023-07-21 联通沃音乐文化有限公司 OLAP-based vectorized data hybrid storage method
CN116629805A (en) * 2023-06-07 2023-08-22 浪潮智慧科技有限公司 Water conservancy index service method, equipment and medium for distributed flow batch integration
CN116629805B (en) * 2023-06-07 2023-12-01 浪潮智慧科技有限公司 Water conservancy index service method, equipment and medium for distributed flow batch integration
CN117435596A (en) * 2023-12-20 2024-01-23 杭州网易云音乐科技有限公司 Streaming batch task integration method and device, storage medium and electronic equipment
CN117435596B (en) * 2023-12-20 2024-04-02 杭州网易云音乐科技有限公司 Streaming batch task integration method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
Davoudian et al. Big data systems: A software engineering perspective
US11663257B2 (en) Design-time information based on run-time artifacts in transient cloud-based distributed computing clusters
US10372492B2 (en) Job-processing systems and methods with inferred dependencies between jobs
CA3001304C (en) Systems, methods, and devices for an enterprise internet-of-things application development platform
RU2610288C2 (en) Providing capabilities of configured technological process
CN115617834A (en) Data processing method, device, equipment and storage medium
US9058359B2 (en) Proactive risk analysis and governance of upgrade process
US10162892B2 (en) Identifying information assets within an enterprise using a semantic graph created using feedback re-enforced search and navigation
US9116973B2 (en) Method and apparatus for monitoring an in-memory computer system
US20150160969A1 (en) Automated invalidation of job output data in a job-processing system
US9389982B2 (en) Method and apparatus for monitoring an in-memory computer system
US20120166620A1 (en) System and method for integrated real time reporting and analytics across networked applications
US8478623B2 (en) Automated derivation, design and execution of industry-specific information environment
Deelman et al. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows
Dubuc et al. Mapping the big data landscape: technologies, platforms and paradigms for real-time analytics of data streams
CN113641739A (en) Spark-based intelligent data conversion method
JP7305641B2 (en) Methods and systems for tracking application activity data from remote devices and generating corrective behavior data structures for remote devices
US20190065979A1 (en) Automatic model refreshment
KR102309806B1 (en) Systems and methods for centralization and diagnostics for live virtual server performance data
Darius et al. From Data to Insights: A Review of Cloud-Based Big Data Tools and Technologies
Kukreja et al. Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way
Bautista Villalpando et al. DIPAR: a framework for implementing big data science in organizations
Rodriguez et al. Understanding and Addressing the Allocation of Microservices into Containers: A Review
Khatiwada Architectural issues in real-time business intelligence
Balusamy et al. Challenges in Big Data Analytics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination