CN112925838A - Data processing method and device - Google Patents

Publication number
CN112925838A
Authority
CN
China
Prior art keywords
data set
data
preset
operator
formatted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911245124.5A
Other languages
Chinese (zh)
Inventor
江小辉
杨馨惠
杨斌
叶志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911245124.5A
Publication of CN112925838A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the application provide a data processing method and a data processing device. The method comprises: loading an original data set from a preset data source; converting the original data set into a formatted data set in a preset data format; acquiring arrangement information for a preset operator; generating a data set operation model from the preset operator and the arrangement information; and analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result. With this method, a user can load original data sets from different data sources and convert them into formatted data sets in the preset data format, which completes the unified docking of various data sets; arranging operators is simple, and the user can conveniently perform the required data set calculations.

Description

Data processing method and device
Technical Field
The present application relates to the field of data technologies, and in particular, to a data processing method and a data processing apparatus.
Background
With the development of the Internet and the Internet of Things, data in many fields is growing explosively. Big data analysis is a data processing approach for finding regularities in data. As big data services keep expanding, service scenarios become increasingly complex, and various models must be written to analyze the data effectively.
Most users who are unfamiliar with data analysis models cannot understand the information in the data even if they own the data resources. Users who are familiar with data analysis models still need to develop specific models according to the characteristics of the data, which is time-consuming and labor-intensive, so data analysis cannot be completed quickly and in real time.
Disclosure of Invention
In view of the above problems, embodiments of the present application are proposed to provide a data processing method and a corresponding data processing apparatus that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present application discloses a data processing method, including:
loading an original data set from a preset data source;
converting the original data set into a formatted data set with a preset data format;
acquiring arrangement information for a preset operator;
generating a data set operation model by adopting the preset operator and the arrangement information;
and analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result.
Optionally, the loading the original data set from the preset data source includes:
creating a mount point for a preset data source;
and loading the original data set from the mount point.
Optionally, before converting the original data set into a formatted data set in a preset data format, the method further includes:
acquiring the field number and the field type input by a user;
and determining a target data format according to the field number and the field type.
Optionally, the converting the original data set into a formatted data set in a preset data format includes:
and converting the original data set into a formatted data set in the target data format.
Optionally, before acquiring the arrangement information for the preset operator, the method further includes:
generating and displaying a data set operation model editing interface;
detecting operator selection operation triggered by a user on the data set operation model editing interface, and determining a corresponding target preset operator in response to the operator selection operation;
and detecting an arrangement operation performed by the user on the data set operation model editing interface, and generating arrangement information for the target preset operator in response to the arrangement operation.
Optionally, the data set operation model has at least two input nodes; before performing the data set operation on at least two formatted data sets according to the data set operation model to obtain a data analysis result, the method further includes:
generating and displaying an input node selection interface;
receiving selection information input by a user on the input node selection interface;
and taking the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
Optionally, the analyzing at least two formatted data sets according to the data set operation model to obtain a data analysis result includes:
generating an executable instance by using the data set operation model and the formatted data;
and running the executable instance to obtain a data analysis result.
The embodiment of the invention also discloses a data processing device, which comprises:
the original data set loading module is used for loading an original data set from a preset data source;
the format conversion module is used for converting the original data set into a formatted data set with a preset data format;
the arrangement information acquisition module is used for acquiring arrangement information for a preset operator;
the data set operation model generation module is used for generating a data set operation model by adopting the preset operator and the arrangement information;
and the data analysis module is used for analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result.
Optionally, the raw data set loading module includes:
the mount point creating submodule is used for creating a mount point for a preset data source;
and the original data set loading submodule is used for loading the original data set from the mount point.
Optionally, the device further comprises:
the field information acquisition module is used for acquiring the number and the type of fields input by a user before the format conversion module converts the original data set into a formatted data set with a preset data format;
and the target data format determining module is used for determining the target data format according to the field number and the field type.
Optionally, the format conversion module includes:
and the format conversion submodule is used for converting the original data set into a formatted data set in the target data format.
Optionally, the device further comprises:
the data set operation model editing interface display module is used for generating and displaying a data set operation model editing interface before the arrangement information acquisition module acquires the arrangement information for the preset operator;
the target preset operator determining module is used for detecting operator selection operation triggered by a user on the data set operation model editing interface and determining a corresponding target preset operator in response to the operator selection operation;
and the arrangement information generation module is used for detecting an arrangement operation performed by the user on the data set operation model editing interface and generating arrangement information for the target preset operator in response to the arrangement operation.
Optionally, the data set operation model has at least two input nodes; the device further comprises:
the input node selection interface display module is used for generating and displaying an input node selection interface before the data analysis module performs data set operation on at least two formatted data sets according to the data set operation model to obtain a data analysis result;
the selection information receiving module is used for receiving selection information input by a user on the input node selection interface;
and the input node data determining module is used for taking the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
Optionally, the data analysis module comprises:
the executable instance generation sub-module is used for generating an executable instance by adopting the data set operation model and the formatted data;
and the executable instance running sub-module is used for running the executable instance to obtain a data analysis result.
The embodiment of the invention also discloses a device, which comprises:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform one or more methods as described above.
Embodiments of the invention also disclose one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform one or more of the methods described above.
The embodiment of the application has the following advantages:
With this data processing method, a user can load original data sets from different data sources and convert them into formatted data sets in a preset data format, which completes the unified docking of various data sets. Arrangement information for a preset operator, input by the user according to the requirements of the service scenario, is acquired, and a data set operation model is generated from the preset operator and the arrangement information; at least two formatted data sets are then analyzed and processed according to the data set operation model to obtain a data analysis result. Users who do not know how to write operators can generate a data set operation model simply by combining preset operators, so the operation is simple and users can conveniently perform the required data set calculations. The whole process is carried out online, and computing logic such as online collision, fusion, combination, intersection, union, difference, sorting, and regular calculation can be applied to data from different computing platforms, so that the online value and real-time value of the data can be obtained through analysis.
Drawings
FIG. 1 is a flow chart of steps of a first embodiment of a data processing method of the present application;
FIG. 2 is a flowchart illustrating steps of a second embodiment of a data processing method according to the present application;
FIG. 3 is a schematic diagram of an operator editing component provided by a data computing platform;
FIG. 4 is a schematic diagram of a data set operational model editing component provided by a data computing platform;
FIG. 5 is a schematic diagram of an input node selection interface in an embodiment of the present application;
FIG. 6 is a schematic diagram of another input node selection interface in an embodiment of the present application;
FIG. 7 is a block diagram of an embodiment of a data processing apparatus according to the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
With the rise of cloud computing technology, various big data computing platforms have also emerged. Among them, the hybrid computing platform is a platform architecture increasingly favored by customers: customers can converge data stored on different computing platforms onto the hybrid computing platform to carry out data logic analysis for different service scenarios.
One of the core concepts of the embodiments of the present application is to provide a data computing platform onto which data from other computing platforms can be converged, so that a user can write a data set operation model through online authoring and free combination, and the data computing platform uses the data set operation model to perform online data analysis, for different service scenarios, on data from a plurality of other computing platforms.
Computing logic such as online collision, fusion, combination, intersection, union, difference, sorting, and regular calculation can be applied to the data of different computing platforms, so that the online value and real-time value of the data can be obtained through analysis.
Referring to FIG. 1, a flowchart of the steps of a first embodiment of the data processing method of the present application is shown. The data processing method of the embodiments of the present application may be applied to a data computing platform and may specifically include the following steps:
Step 101, loading an original data set from a preset data source;
In an actual business scenario, a user may have multiple data sources and need to manage, control, and compute over data across these heterogeneous data sources in a unified way.
The data processing method of the embodiment of the invention can be applied to a data computing platform to realize various logical operation processing on the data set. The data computing platform can be an open platform, a closed internal platform or an artificial intelligence platform. The logical operation processing on the data set may be on-line processing or off-line processing.
In this embodiment, the data computing platform may load the original data set from a preset data source, and the preset data source may include other computing platforms or databases specified by the user.
When the user uses the data computing platform, the user can set the original data set to be imported from the data source of the private domain or the public domain.
Step 102, converting the original data set into a formatted data set in a preset data format;
The formats of original data sets obtained from different data sources usually differ, but data analysis requires the data sets to share the same data format. Therefore, the original data sets from different data sources need to be uniformly converted into formatted data sets in the preset data format.
The preset data format can be a data format provided by a data computing platform, or a data format defined by the user on line, or a data format imported from the outside by the user.
Step 103, acquiring arrangement information for a preset operator;
An operator is an encapsulation of data processing logic, including, for example, encapsulations of real-time analysis computation logic, relational data computation logic, graph data computation logic, object storage computation logic, and the logic of other open source computing engines.
For example, data processing logic between data sets such as collision, fusion, combination, intersection, union, difference, sorting, and regular calculation may be encapsulated as operators.
Collision refers to producing the intersection of data between different data sets, conditioned on one or more data items.
Fusion refers to merging objects that have the same attribute field value, and that intersect or lie within a fusion tolerance distance of each other, into a single object.
Combination refers to merging objects that have the same attribute field value into one object and deleting the overlapping portion.
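The definitions of collision and combination above can be illustrated with a minimal Python sketch. It is only one possible reading of those definitions: the function names `collide` and `combine`, the dict-based record layout, and the sample fields are hypothetical and do not come from the application.

```python
def collide(set_a, set_b, keys):
    """Collision: keep the records of set_a whose key values also occur in set_b."""
    seen = {tuple(rec[k] for k in keys) for rec in set_b}
    return [rec for rec in set_a if tuple(rec[k] for k in keys) in seen]


def combine(records, keys):
    """Combination: merge records with equal key values into one record.

    The overlapping part is dropped by keeping the first value seen per field.
    """
    merged = {}
    for rec in records:
        target = merged.setdefault(tuple(rec[k] for k in keys), {})
        for field, value in rec.items():
            target.setdefault(field, value)
    return list(merged.values())


# Hypothetical sample data: collide on the production date.
data_a = [{"id": 1, "date": "2019-01-01"}, {"id": 2, "date": "2019-02-01"}]
data_b = [{"id": 9, "date": "2019-01-01"}]
collided = collide(data_a, data_b, ["date"])

combined = combine(
    [{"id": 1, "name": "x"}, {"id": 1, "name": "x", "type": "t"}], ["id"]
)
```

Fusion is omitted here because it additionally depends on a distance or intersection test whose details the text does not specify.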
An operator may be provided to the user by the data computing platform, written online by the user, or imported by the user from an external source.
According to the requirements of the service scenario, the user can select the required operators on the data computing platform and arrange them; arrangement refers to combining the operators.
The arrangement information may include information of the selected operator and information of an arrangement for the selected operator.
The data computing platform can determine which operators are selected by the user and obtain the arrangement information of the user for arranging the selected operators.
Step 104, generating a data set operation model by using the preset operator and the arrangement information;
The data computing platform can generate a data set operation model according to the user's arrangement information for the preset operators.
In practice, for some simple business requirements, a preset operator can be used as a data set operation model alone. For some complex business requirements, a plurality of operators can be combined to obtain a data set operation model.
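The two cases described above, a single operator used as the model and several operators combined into one, can be sketched as follows. This is an illustrative assumption about how such a model might be assembled, not the platform's implementation; `build_model`, `intersection`, and `sort_rows` are hypothetical names.

```python
from functools import reduce


def intersection(a, b):
    """Binary operator: rows of a that also appear in b (order of a preserved)."""
    return [row for row in a if row in b]


def sort_rows(rows):
    """Unary operator: rows in ascending order."""
    return sorted(rows)


def build_model(binary_op, *unary_ops):
    """Build a two-input model: one binary operator, then unary operators in order."""
    def model(a, b):
        return reduce(lambda rows, op: op(rows), unary_ops, binary_op(a, b))
    return model


# Simple requirement: a single preset operator serves as the model.
simple_model = build_model(intersection)
# More complex requirement: several operators are combined.
combined_model = build_model(intersection, sort_rows)

simple_result = simple_model([3, 1, 2], [2, 3])
combined_result = combined_model([3, 1, 2], [2, 3])
```

Treating every operator as a plain callable keeps the combination step independent of what each operator does internally.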
Step 105, analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result.
The data set operation model can analyze and process at least two formatted data sets converted from original data sets from different data sources to obtain a data analysis result.
In the data processing method of the embodiments of the application, a user can load original data sets from different data sources and convert them into formatted data sets in a preset data format, which completes the unified docking of various data sets. Arrangement information for a preset operator, input by the user according to the requirements of the service scenario, is acquired, and a data set operation model is generated from the preset operator and the arrangement information; at least two formatted data sets are then analyzed and processed according to the data set operation model to obtain a data analysis result. Users who do not know how to write operators can generate a data set operation model simply by combining preset operators, so the operation is simple and users can conveniently perform the required data set calculations. The whole process is carried out online, and computing logic such as online collision, fusion, combination, intersection, union, difference, sorting, and regular calculation can be applied to data from different computing platforms, so that the online value and real-time value of the data can be obtained through analysis.
Referring to fig. 2, a flowchart illustrating steps of a second embodiment of the data processing method of the present application is shown, which may specifically include the following steps:
Step 201, creating a mount point for a preset data source;
A mount point is the access address through which the data computing platform reaches a data source. Each mount point corresponds to a Uniform Resource Locator (URL), and the data of the data source can be accessed through the URL.
In actual use, a user may create a mount point on the data computing platform and generate a URL, e.g., dfs://dfs-endpoint:10290/path1/to/dfs_xxx_table.
A mount point created by a user can be used only by that user and is not visible to other users, which achieves permission isolation.
The user can maintain the mount point on the data computing platform; for example, the user can take the mount point offline to prevent others from accessing its data.
Step 202, loading an original data set from the mount point;
Through the URL of the mount point, the data source can be accessed and the original data set loaded from the data source onto the data computing platform. Storing mount points in this way allows different data sources to be converged in a unified manner.
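The mount point behavior described in steps 201 and 202 (one URL per mount point, visibility limited to the creating user, and the option to take a mount point offline) can be sketched as a small registry. The `MountRegistry` and `MountPoint` classes and the user names are hypothetical; only the example URL is taken from the text, and the actual loading of data over the `dfs` scheme is not shown.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class MountPoint:
    url: str
    online: bool = True


class MountRegistry:
    """Hypothetical per-user registry of mount points."""

    def __init__(self):
        self._points = {}  # (owner, name) -> MountPoint

    def create(self, owner, name, url):
        parsed = urlparse(url)
        if not parsed.scheme or not parsed.netloc:
            raise ValueError(f"not a usable mount URL: {url!r}")
        self._points[(owner, name)] = MountPoint(url)

    def visible_to(self, user):
        # Permission isolation: a user only sees mount points they created.
        return sorted(name for owner, name in self._points if owner == user)

    def set_online(self, owner, name, online):
        self._points[(owner, name)].online = online

    def resolve(self, user, name):
        """Return the URL to load from, or None if not visible or offline."""
        point = self._points.get((user, name))
        if point is None or not point.online:
            return None
        return point.url


registry = MountRegistry()
registry.create("alice", "orders", "dfs://dfs-endpoint:10290/path1/to/dfs_xxx_table")
```

Keying the registry by `(owner, name)` is one simple way to get the isolation described above, since another user's lookup can never reach the entry.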
Step 203, converting the original data set into a formatted data set in a preset data format;
The preset data format may be a data format provided by the data computing platform, a data format defined online by the user, or a data format imported by the user from an external source.
The preset data format may be a data format comprising a plurality of indefinite-length fields. For example, a data format with three indefinite-length fields may be: commodity number + commodity name + commodity type, where the field types are commodity number, commodity name, and commodity type, respectively.
In practice, the number and the type of the field with the indefinite length included in the data format may be set according to actual service requirements. The user can set the field number and the field type of the field with the indefinite length according to actual needs.
In this embodiment of the present application, before converting the original data set into a formatted data set in a preset data format, the method further includes: acquiring the number and the type of fields input by a user; and determining the target data format according to the field number and the field type.
The data computing platform may convert the raw data set into a formatted data set in a target data format.
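As a rough illustration of this conversion, the sketch below derives a target format from a user-supplied field list and converts raw records from two differently shaped sources into it. The application does not specify how the conversion is implemented; the functions `make_target_format` and `to_formatted`, the per-source key mappings, and the sample records are all hypothetical.

```python
def make_target_format(field_types):
    """Target format: an ordered tuple of user-defined field types."""
    return tuple(field_types)


def to_formatted(raw_records, mapping, target_format):
    """Convert raw dict records into rows of the target format.

    mapping: target field type -> key used by this particular source.
    Values are stored as strings so that every field has indefinite length.
    """
    return [
        tuple(str(rec.get(mapping[field], "")) for field in target_format)
        for rec in raw_records
    ]


target = make_target_format(["commodity_number", "commodity_name", "commodity_type"])

# Two hypothetical sources with different raw layouts.
source_a = [{"id": 1, "name": "Merlot", "kind": "red wine"}]
source_b = [{"sku": "B-7", "title": "Lager", "category": "beer"}]

rows_a = to_formatted(
    source_a,
    {"commodity_number": "id", "commodity_name": "name", "commodity_type": "kind"},
    target,
)
rows_b = to_formatted(
    source_b,
    {"commodity_number": "sku", "commodity_name": "title", "commodity_type": "category"},
    target,
)
```

After conversion, both sources yield rows with the same three fields in the same order, which is the precondition for the data set operations in the later steps.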
Step 204, acquiring arrangement information for a preset operator;
An operator may be provided to the user by the data computing platform, written online by the user, or imported by the user from an external source.
Referring to fig. 3, a schematic diagram of an operator editing component provided by a data computing platform is shown.
The operator editing component is a program component provided by the data computing platform for managing operators visually; in it, a user can write, store, and call operators. A developer or a user can edit operators online in the operator editing component, and operators that have already been written can be imported into it.
The operator editing component can correspond to a user account, and the operators in the operator editing component can be managed after the user account logs in to the data computing platform.
All operators managed by a user are recorded in the operator editing component, including general operators and ordinary operators.
In this embodiment of the present application, before acquiring the arrangement information for the preset operator, the data computing platform may further perform the following:
generating and displaying a data set operation model editing interface;
detecting operator selection operation triggered by a user on the data set operation model editing interface, and determining a corresponding target preset operator in response to the operator selection operation;
and detecting an arrangement operation performed by the user on the data set operation model editing interface, and generating arrangement information for the target preset operator in response to the arrangement operation.
Referring to FIG. 4, a schematic diagram of a data set operation model editing component provided by a data computing platform is shown. The data set operation model editing component is a visual program component provided by the data computing platform for managing data set operation models; in it, a user can write, store, and call data set operation models.
The user can trigger operator selection operations (e.g., clicking an operator, dragging an operator, or box-selecting operators) in the operator editing component to select the required operators, and arrange the operators (including their operation order, logical relationships, and the like) in an arrangement box. For example, the operation order and logical relationships between operators are represented by arrows.
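If the arrows drawn between operators are recorded as (from, to) pairs, the arrangement information forms a directed graph, and an execution order can be derived with a topological sort. The sketch below uses the standard library `graphlib` module (Python 3.9 or later); representing arrangement information as a list of arrow pairs, and the operator names used, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical arrangement information: each arrow points from an operator
# to the operator that consumes its output.
arrows = [("intersection", "filter"), ("filter", "sort")]


def execution_order(arrows):
    """Return the operators in an order that respects every arrow."""
    graph = {}
    for src, dst in arrows:
        graph.setdefault(dst, set()).add(src)  # dst depends on src
        graph.setdefault(src, set())
    return list(TopologicalSorter(graph).static_order())


order = execution_order(arrows)
```

`TopologicalSorter` raises `graphlib.CycleError` when the arrows form a loop, which gives a natural place to reject an invalid arrangement.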
Step 205, generating a data set operation model by using the preset operator and the arrangement information;
The data set operation model in the embodiments of the application may be a model in a common format, which may include input nodes, output fields, an association condition, and a screening condition, all expressed in the common format.
The following is a general cross-computing model:
insert overwrite table ${output}
select col1, col2, col3, col4, col5, col6, col7, col8, col9, col10
from (
    select ${col1} as col1
         , ${col2} as col2
         , ${col3} as col3
         , ${col4} as col4
         , ${col5} as col5
         , ${col6} as col6
         , ${col7} as col7
         , ${col8} as col8
         , ${col9} as col9
         , ${col10} as col10
         , ROW_NUMBER() OVER (PARTITION BY ${col11}, ${col13} ORDER BY ${col16} ${col17}) AS rn
    from ${input1} t1
    join ${input2} t2 on ${col11} = ${col12} ${col15} ${col13} = ${col14}
) t
where t.rn = 1;
Here, col1, col2, ..., col10 are the 10 defined indefinite-length fields, represented in the common format as ${col1}, ${col2}, ..., ${col10}.
The input nodes of the model are represented in the common format as ${input1} and ${input2}.
The output fields of the model include all the fields.
The association condition of the model, i.e., the key of the join, is represented in the common format as on ${col11}=${col12} ${col15} ${col13}=${col14}.
The association condition specifies which fields are used to associate data set A with data set B. For example, associating on the production date yields the records in the two sets that have the same production date.
The screening condition of the model is represented in the common format as where t.rn=1.
The screening condition specifies how data set A and data set B are filtered. For example, setting the screening condition to red wine screens data set A for red wine records.
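In SQL, numbering rows with ROW_NUMBER() inside each partition and then keeping only row number 1 retains a single row per partition. The plain-Python sketch below mirrors that pattern. It assumes that ${col16} is the ordering column and ${col17} the sort direction, which the text does not state; the function name and sample rows are hypothetical.

```python
def first_per_partition(rows, partition_keys, order_key, descending=False):
    """Keep the first row of each partition after ordering by order_key.

    Mirrors ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) followed by
    a filter on row number 1.
    """
    first = {}
    for row in sorted(rows, key=lambda r: r[order_key], reverse=descending):
        first.setdefault(tuple(row[k] for k in partition_keys), row)
    return list(first.values())


# Hypothetical joined rows: "k" plays the role of the partition columns,
# "t" the role of the ordering column.
joined = [{"k": "a", "t": 2}, {"k": "a", "t": 1}, {"k": "b", "t": 5}]

earliest = first_per_partition(joined, ["k"], "t")
latest = first_per_partition(joined, ["k"], "t", descending=True)
```

This shows why the template returns at most one output row for each combination of the partition columns, even when the join matches several rows.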
In this embodiment, the data set operation model may have at least two input nodes, and the user may select a formatted data set to be input for each input node.
Before the data set operation is performed on at least two formatted data sets according to the data set operation model to obtain a data analysis result, the data computing platform may generate and display an input node selection interface, receive selection information input by the user on the input node selection interface, and take the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
The data computing platform of the embodiment of the application provides a visual input node selection mode, and a user can select a formatted data set corresponding to an input node of a data set operation model more conveniently on an input node selection interface.
The following example describes a user selecting an intersection operation model to perform an intersection operation on two data sets in order to meet a service requirement.
Fig. 5 is a schematic diagram of an input node selection interface in the embodiment of the present application, and fig. 6 is a schematic diagram of another input node selection interface in the embodiment of the present application.
In the node selection interface, a user can set the source nodes of the data sets; here there are two source nodes, a first node and a second node.
The node selection interface may include a data set operation model edit item, in which a user can set the data set operation model used for the data set operation, such as a "general-format discretionary field intersection operation" model, and can write or modify the content of the data set operation model as needed.
The node selection interface may include an input setting item, in which the edit items "input name", "source node", and "source node output" can be set.
In the general operation model, the data sets are represented by input names in the common format, such as input table 1 (input1) and input table 2 (input2); the user can set which common-format input name corresponds to the data set output by each source node.
As shown in fig. 5, the data set output from the first node may be taken as "input table 1".
As shown in fig. 6, the data set output from the second node may be taken as "input table 2".
In practice, the node selection interface may also include other edit items, for example edit items for common-format parameters of the data set operation model such as the association condition and the screening condition.
The data set operation model obtains input data from the two source nodes set by the user, and its output is the data common to input table 1 and input table 2.
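The binding of input names to source node outputs described above can be sketched as a simple lookup, followed by an intersection of the two bound data sets. The function `bind_inputs`, the node names, and the sample data are hypothetical and only illustrate the selection step.

```python
def bind_inputs(input_names, selection, node_outputs):
    """Map each model input name to the data set output by its source node.

    selection: input name -> source node chosen by the user.
    node_outputs: source node -> data set produced by that node.
    """
    missing = [name for name in input_names if name not in selection]
    if missing:
        raise ValueError(f"unbound input nodes: {missing}")
    return {name: node_outputs[selection[name]] for name in input_names}


bound = bind_inputs(
    ["input1", "input2"],
    {"input1": "first_node", "input2": "second_node"},
    {"first_node": [1, 2, 3], "second_node": [2, 3, 4]},
)

# Output of an intersection model: the data present in both input tables.
intersected = [row for row in bound["input1"] if row in bound["input2"]]
```

Rejecting an incomplete selection before execution reflects the requirement that the model has at least two input nodes, each of which needs a formatted data set.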
Step 206: analyze and process at least two formatted data sets according to the data set operation model to obtain a data analysis result.
Because the model is expressed in a general format, a corresponding instance must be generated before the data analysis can be carried out. In actual analysis processing, an executable instance can be generated from the data set operation model and the formatted data, and the executable instance is then run to obtain the data analysis result.
An instantiated executable instance is illustrated as follows:
insert overwrite table dws_obj_rk_jbxx_hz_df
select
a.zjzldm as col1
,a.zjzl as col2
,a.zjhm as col3
,a.xm as col4
,b.ywx as col5
,b.ywm as col6
,b.xmqp as col7
,b.cym as col8
,b.mch as col9
,a.mzdm as col10 -- col1-col10 are all return fields
from dwd_rk_jcxx_czrklsxx_di a
join dwd_rk_jcxx_czrklsxx_di b
on a.zjhm = b.zjhm -- join key, i.e. the associated field
where a.mzdm = 'red wine' -- screening condition, i.e. the parameter part of the model
The data analysis processing is completed by running the instantiated executable instance.
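One way to picture the step from a general-format model to an executable instance is template substitution. The sketch below is an assumption for illustration only: the application does not state how instances are generated, and the template text, the placeholder names, and the `instantiate` helper are hypothetical. The table names, field names, and filter value are taken from the instance above.

```python
from string import Template
from typing import Mapping

# Hypothetical general-format model: placeholders stand for the input tables,
# the associated field, and the screening condition.
GENERIC_MODEL = Template(
    "insert overwrite table $output "
    "select a.$key as col1 "
    "from $input1 a join $input2 b on a.$key = b.$key "
    "where a.$filter_field = '$filter_value'"
)


def instantiate(model: Template, bindings: Mapping[str, str]) -> str:
    """Fill every placeholder; substitute() raises KeyError if one is missing."""
    return model.substitute(bindings)


sql = instantiate(GENERIC_MODEL, {
    "output": "dws_obj_rk_jbxx_hz_df",
    "input1": "dwd_rk_jcxx_czrklsxx_di",
    "input2": "dwd_rk_jcxx_czrklsxx_di",
    "key": "zjhm",
    "filter_field": "mzdm",
    "filter_value": "red wine",
})
print(sql)
```

Plain string substitution is used here only to keep the sketch short; a real system would need to validate identifiers and pass filter values as parameters to avoid SQL injection.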
In the data processing method of the embodiments of the present application, a user can load original data sets from different data sources and convert them into formatted data sets in a preset data format, which completes the unified docking of the various data sets. Arrangement information for preset operators, input by the user according to the requirements of the service scenario, is acquired, and a data set operation model is generated from the preset operators and the arrangement information; at least two formatted data sets are then analyzed and processed according to the data set operation model to obtain a data analysis result. A user who does not know how to write operators can generate a data set operation model simply by combining preset operators, so the operation is simple and the required data set calculation is easy to carry out. The whole process is carried out online, and calculation logic such as online collision, fusion, merging, intersection, union, difference, sorting, and regular calculation can be applied to data from different computing platforms, so that the online value and the real-time value of the data are obtained through analysis.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the embodiments of the application.
Referring to fig. 7, a block diagram of a data processing apparatus according to an embodiment of the present application is shown, which may specifically include the following modules:
an original data set loading module 701, configured to load an original data set from a preset data source;
a format conversion module 702, configured to convert the original data set into a formatted data set in a preset data format;
an arrangement information obtaining module 703, configured to obtain arrangement information for a preset operator;
a data set operation model generating module 704, configured to generate a data set operation model by using the preset operator and the arrangement information;
and the data analysis module 705 is configured to analyze and process at least two formatted data sets according to the data set operation model to obtain a data analysis result.
In this embodiment of the present application, the original data set loading module 701 may include:
the mounting point creating submodule is used for creating a mounting point aiming at a preset data source;
and the original data set loading submodule is used for loading the original data set from the mounting point.
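As a rough sketch of the two submodules above, a mounting point can be modeled as a named entry that knows how to read one data source. Everything here is an assumption made for illustration: the `MountTable` class, the `csv_source` helper, the mount name, and the in-memory CSV text are hypothetical, and the application does not specify how mounting points are implemented.

```python
import csv
import io
from typing import Callable, Dict, List

RawDataSet = List[Dict[str, str]]


class MountTable:
    """Maps mounting point names to loaders for preset data sources."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Callable[[], RawDataSet]] = {}

    def create_mount_point(self, name: str,
                           loader: Callable[[], RawDataSet]) -> None:
        # Refuse to silently replace an existing mounting point.
        if name in self._mounts:
            raise ValueError(f"mounting point already exists: {name}")
        self._mounts[name] = loader

    def load(self, name: str) -> RawDataSet:
        # Loading goes through the mounting point, not the source directly.
        return self._mounts[name]()


def csv_source(text: str) -> Callable[[], RawDataSet]:
    """Hypothetical data source: CSV text held in memory."""
    def load() -> RawDataSet:
        return list(csv.DictReader(io.StringIO(text)))
    return load


mounts = MountTable()
mounts.create_mount_point("/mnt/residents",
                          csv_source("zjhm,xm\nA001,Li\nA002,Wang\n"))
raw = mounts.load("/mnt/residents")
print(raw)
```

The indirection means the later steps only see a mounting point name, so sources on different platforms can be loaded through the same call.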
In this embodiment, the apparatus may further include:
the field information acquisition module is used for acquiring the number and the type of fields input by a user before the format conversion module converts the original data set into a formatted data set with a preset data format;
and the target data format determining module is used for determining the target data format according to the field number and the field type.
In this embodiment of the application, the format conversion module 702 may include:
and the format conversion submodule is used for converting the original data set into a formatted data set in the target data format.
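The relationship between the field information and the target data format can be sketched as follows. This is a minimal illustration, assuming the target format is simply an ordered tuple of type converters; the type names in `CASTS` and the functions `determine_target_format` and `convert` are hypothetical and not defined by the application.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical mapping from user-supplied type names to converters.
CASTS = {"string": str, "int": int, "float": float}

TargetFormat = Tuple[Callable[[str], object], ...]


def determine_target_format(field_count: int,
                            field_types: Sequence[str]) -> TargetFormat:
    """Build the target format from the field number and field types."""
    if len(field_types) != field_count:
        raise ValueError("field count does not match the number of field types")
    return tuple(CASTS[t] for t in field_types)


def convert(raw_rows: Sequence[Sequence[str]],
            target_format: TargetFormat) -> List[tuple]:
    """Convert each raw row into a row in the target data format."""
    formatted = []
    for row in raw_rows:
        if len(row) != len(target_format):
            raise ValueError(f"row has {len(row)} fields, expected {len(target_format)}")
        formatted.append(tuple(cast(value) for cast, value in zip(target_format, row)))
    return formatted


fmt = determine_target_format(3, ["string", "int", "float"])
formatted = convert([["A001", "34", "1.5"], ["A002", "27", "2"]], fmt)
print(formatted)
```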
In this embodiment of the present application, the apparatus may further include:
the data set operation model editing interface display module is used for generating and displaying a data set operation model editing interface before the arrangement information acquisition module acquires the arrangement information aiming at the preset operator;
the target preset operator determining module is used for detecting operator selection operation triggered by a user on the data set operation model editing interface and determining a corresponding target preset operator in response to the operator selection operation;
and the arrangement information generation module is used for detecting the arrangement operation of a user on the data set operation model editing interface and responding to the arrangement operation to generate the arrangement information aiming at the target preset operator.
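The way preset operators and arrangement information combine into a data set operation model can be sketched as below. The representation is an assumption chosen for illustration: arrangement information is modeled as an ordered list of steps, each naming a preset operator, its two inputs, and its output; `PRESET_OPERATORS` and `build_model` are hypothetical names.

```python
from typing import Callable, Dict, List

DataSet = List[str]

# Hypothetical preset operators; each combines two data sets into one.
PRESET_OPERATORS: Dict[str, Callable[[DataSet, DataSet], DataSet]] = {
    "intersection": lambda a, b: [x for x in a if x in b],
    "union": lambda a, b: a + [x for x in b if x not in a],
    "difference": lambda a, b: [x for x in a if x not in b],
}


def build_model(arrangement: List[dict]) -> Callable[[Dict[str, DataSet]], DataSet]:
    """Generate a model from the selected operators and their arrangement."""
    if not arrangement:
        raise ValueError("arrangement information is empty")
    for step in arrangement:
        if step["operator"] not in PRESET_OPERATORS:
            raise ValueError(f"unknown preset operator: {step['operator']}")

    def model(inputs: Dict[str, DataSet]) -> DataSet:
        env = dict(inputs)  # named data sets available to later steps
        for step in arrangement:
            operator = PRESET_OPERATORS[step["operator"]]
            left, right = (env[name] for name in step["inputs"])
            env[step["output"]] = operator(left, right)
        return env[arrangement[-1]["output"]]

    return model


# Arrangement a user might produce in the editing interface (hypothetical).
arrangement = [
    {"operator": "intersection", "inputs": ["input1", "input2"], "output": "common"},
    {"operator": "difference", "inputs": ["common", "input3"], "output": "result"},
]
model = build_model(arrangement)
result = model({"input1": ["a", "b", "c"],
                "input2": ["b", "c", "d"],
                "input3": ["c"]})
print(result)
```

The user only chooses operators and wires them together; no operator code is written, which matches the stated goal of serving users who cannot write operators themselves.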
In an embodiment of the present application, the data set operation model has at least two input nodes; the apparatus may further include:
the input node selection interface display module is used for generating and displaying an input node selection interface before the data analysis module performs data set operation on at least two formatted data sets according to the data set operation model to obtain a data analysis result;
the selection information receiving module is used for receiving selection information input by a user on the input node selection interface;
and the input node data determining module is used for taking the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
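Binding formatted data sets to input nodes from the user's selection can be sketched as a simple lookup. The structures are assumptions for illustration: selection information is modeled as a mapping from input name to a (source node, source node output) pair, mirroring the "input name", "source node", and "source node output" edit items; `bind_input_nodes` and the sample node names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

DataSet = List[Dict[str, str]]


def bind_input_nodes(input_names: Sequence[str],
                     selection: Dict[str, Tuple[str, str]],
                     node_outputs: Dict[str, Dict[str, DataSet]]) -> Dict[str, DataSet]:
    """Resolve each input node to the formatted data set the user selected."""
    bound = {}
    for name in input_names:
        if name not in selection:
            raise ValueError(f"no source node selected for input node: {name}")
        source_node, output_name = selection[name]
        bound[name] = node_outputs[source_node][output_name]
    return bound


# Hypothetical formatted data sets produced by two source nodes.
node_outputs = {
    "first_node": {"output": [{"zjhm": "A001"}, {"zjhm": "A002"}]},
    "second_node": {"output": [{"zjhm": "A002"}]},
}
# Selection information as it might be received from the interface.
selection = {
    "input1": ("first_node", "output"),
    "input2": ("second_node", "output"),
}
bound = bind_input_nodes(["input1", "input2"], selection, node_outputs)
print(bound)
```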
In an embodiment of the present application, the data analysis module 705 may include:
the executable instance generation sub-module is used for generating an executable instance by adopting the data set operation model and the formatted data;
and the executable instance running sub-module is used for running the executable instance to obtain a data analysis result.
Because the apparatus embodiment is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
An embodiment of the present application further provides an apparatus, including:
one or more processors; and
one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform methods as described in embodiments of the present application.
Embodiments of the present application also provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the methods of embodiments of the present application.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data processing method and the data processing apparatus provided in the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core ideas. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (16)

1. A data processing method, comprising:
loading an original data set from a preset data source;
converting the original data set into a formatted data set with a preset data format;
acquiring arrangement information aiming at a preset operator;
generating a data set operation model by adopting the preset operator and the arrangement information;
and analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result.
2. The method of claim 1, wherein loading the original data set from a preset data source comprises:
creating a mounting point aiming at a preset data source;
and loading the original data set from the mounting point.
3. The method of claim 1, further comprising, prior to converting the original data set into a formatted data set in a preset data format:
acquiring the number and the type of fields input by a user;
and determining the target data format according to the field number and the field type.
4. The method of claim 3, wherein converting the original data set into a formatted data set in a preset data format comprises:
and converting the original data set into a formatted data set in the target data format.
5. The method according to claim 1, wherein before acquiring the arrangement information for the preset operator, the method further comprises:
generating and displaying a data set operation model editing interface;
detecting operator selection operation triggered by a user on the data set operation model editing interface, and determining a corresponding target preset operator in response to the operator selection operation;
and detecting the arrangement operation of a user on the editing interface of the data set operation model, responding to the arrangement operation, and generating arrangement information aiming at the target preset operator.
6. The method of claim 1, wherein the data set operation model has at least two input nodes; before performing data set operation on at least two formatted data sets according to the data set operation model to obtain a data analysis result, the method further comprises:
generating and displaying an input node selection interface;
receiving selection information input by a user on the input node selection interface;
and taking the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
7. The method of claim 1, wherein analyzing at least two formatted data sets according to the data set operation model to obtain data analysis results comprises:
generating an executable instance by using the data set operation model and the formatted data;
and running the executable instance to obtain a data analysis result.
8. A data processing apparatus, comprising:
the original data set loading module is used for loading an original data set from a preset data source;
the format conversion module is used for converting the original data set into a formatted data set with a preset data format;
the arrangement information acquisition module is used for acquiring arrangement information aiming at a preset operator;
the data set operation model generation module is used for generating a data set operation model by adopting the preset operator and the arrangement information;
and the data analysis module is used for analyzing and processing at least two formatted data sets according to the data set operation model to obtain a data analysis result.
9. The apparatus of claim 8, wherein the original data set loading module comprises:
the mounting point creating submodule is used for creating a mounting point aiming at a preset data source;
and the original data set loading submodule is used for loading the original data set from the mounting point.
10. The apparatus of claim 8, further comprising:
the field information acquisition module is used for acquiring the number and the type of fields input by a user before the format conversion module converts the original data set into a formatted data set with a preset data format;
and the target data format determining module is used for determining the target data format according to the field number and the field type.
11. The apparatus of claim 10, wherein the format conversion module comprises:
and the format conversion submodule is used for converting the original data set into a formatted data set in the target data format.
12. The apparatus of claim 8, further comprising:
the data set operation model editing interface display module is used for generating and displaying a data set operation model editing interface before the arrangement information acquisition module acquires the arrangement information aiming at the preset operator;
the target preset operator determining module is used for detecting operator selection operation triggered by a user on the data set operation model editing interface and determining a corresponding target preset operator in response to the operator selection operation;
and the arrangement information generation module is used for detecting the arrangement operation of a user on the data set operation model editing interface and responding to the arrangement operation to generate the arrangement information aiming at the target preset operator.
13. The apparatus of claim 8, wherein the data set operation model has at least two input nodes; the apparatus further comprises:
the input node selection interface display module is used for generating and displaying an input node selection interface before the data analysis module performs data set operation on at least two formatted data sets according to the data set operation model to obtain a data analysis result;
the selection information receiving module is used for receiving selection information input by a user on the input node selection interface;
and the input node data determining module is used for taking the formatted data set corresponding to the selection information as the formatted data set corresponding to the input node.
14. The apparatus of claim 8, wherein the data analysis module comprises:
the executable instance generation sub-module is used for generating an executable instance by adopting the data set operation model and the formatted data;
and the executable instance running sub-module is used for running the executable instance to obtain a data analysis result.
15. An apparatus, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 1-7.
16. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause the processors to perform the method of one or more of claims 1-7.
CN201911245124.5A 2019-12-06 2019-12-06 Data processing method and device Pending CN112925838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911245124.5A CN112925838A (en) 2019-12-06 2019-12-06 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911245124.5A CN112925838A (en) 2019-12-06 2019-12-06 Data processing method and device

Publications (1)

Publication Number Publication Date
CN112925838A true CN112925838A (en) 2021-06-08

Family

ID=76162179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911245124.5A Pending CN112925838A (en) 2019-12-06 2019-12-06 Data processing method and device

Country Status (1)

Country Link
CN (1) CN112925838A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326238A (en) * 2021-06-25 2021-08-31 深信服科技股份有限公司 Data processing method, device, equipment and storage medium
CN113469284A (en) * 2021-07-26 2021-10-01 浙江大华技术股份有限公司 Data analysis method, device and storage medium
CN114417408A (en) * 2022-01-18 2022-04-29 百度在线网络技术(北京)有限公司 Data processing method, device, equipment and storage medium
CN113469284B (en) * 2021-07-26 2024-07-02 浙江大华技术股份有限公司 Data analysis method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874482A (en) * 2017-02-20 2017-06-20 山东鲁能软件技术有限公司 A kind of device and method of the patterned data prediction based on big data technology
CN106874483A (en) * 2017-02-20 2017-06-20 山东鲁能软件技术有限公司 A kind of device and method of the patterned quality of data evaluation and test based on big data technology
CN107403403A (en) * 2016-05-20 2017-11-28 大唐移动通信设备有限公司 A kind of business datum analysis method and device
CN110187829A (en) * 2019-04-22 2019-08-30 上海蔚来汽车有限公司 A kind of data processing method, device, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination