CN105573836A - Data processing method and device - Google Patents


Info

Publication number
CN105573836A
Authority
CN
China
Prior art keywords
node
data processing
processing model
object instance
model object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610098936.1A
Other languages
Chinese (zh)
Other versions
CN105573836B (en)
Inventor
刘志丹
王鑫毅
刘龙
曹震
于雪龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN201610098936.1A
Publication of CN105573836A
Application granted
Publication of CN105573836B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/483 Multiproc

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data processing method and device. A data processing model is represented as a directed graph. When an instruction carrying a node list is received from a client, then for any node in the node list: if the data set corresponding to the node's parent node has not been processed, the data set corresponding to the parent node is processed first; if the data set corresponding to the parent node has already been processed, the output data set of the parent node is read directly from the execution context and used as the input data set of the node, the input data set of the node is processed based on the data set corresponding to the node to generate the output data set of the node, and the output data set of the node is recorded in the execution context. With this method, the data sets of nodes that have already been successfully processed are not processed again; only the data of some of the nodes is processed, which improves data processing efficiency.

Description

Data processing method and device
Technical field
The present invention relates to the technical field of data processing and, more particularly, to a data processing method and device.
Background
Spark is an efficient distributed computing system that can perform data mining and analysis on data at the terabyte (TB) scale. Using Spark for data processing requires command of one of three languages: Java, Scala, or Python. Typically, an analyst implements a data analysis scenario as a fixed program in one of these three languages and then compiles the program into a machine-recognizable file, which is loaded, interpreted, and executed by the Java Virtual Machine.
However, in data analysis scenarios, analysts often have no clear analytical approach at the outset. They need to try various statistical algorithms on the data and finally, drawing on experience, settle on the most effective or most explainable data analysis process. During this process, the analyst has to make a large number of changes to the program, and every change requires the program file to be recompiled and re-executed. This causes two kinds of inconvenience. First, each round of modifying, compiling, and executing the program file costs the analyst time. Second, re-executing the program causes all nodes in the data processing flow to be executed again; in a big data setting, one execution of the program is very time-consuming, and the analyst wastes a great deal of time waiting for the results of the modified program. Overall data processing efficiency is therefore low.
How to improve data processing efficiency has therefore become an urgent problem.
Summary of the invention
The object of the present invention is to provide a data processing method and device that improve data processing efficiency.
To achieve the above object, the present invention provides the following technical solutions:
A data processing method, comprising:
obtaining, based on a data processing model description file sent by a client, a data processing model object instance corresponding to the data processing model description file, where the data processing model description file is converted from a data processing model, the data processing model is a directed graph, the nodes in the directed graph comprise operation nodes that have at least one parent node and data source nodes that have no parent node, and each node in the directed graph corresponds to a data set;
when an execution instruction sent by the client is received that carries a node list made up of several nodes in the data processing model object instance, then for a first node in the node list: if the input data of the first node comes from a parent node of the first node and the data set corresponding to the parent node has not been successfully processed, adding the parent node of the first node to the node list and processing it first; if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node has been successfully processed, obtaining the output data set of the parent node from the execution context as the input data set of the first node, processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, and recording the output data set of the first node in the execution context; where the first node is any node in the node list.
In the above method, preferably, obtaining the data processing model object instance corresponding to the data processing model description file based on the data processing model description file sent by the client comprises:
converting the data processing model description file sent by the client into a first data processing model object instance;
determining, according to the unique identifier of the data processing model, whether a data processing model object instance has previously been created for the data processing model description file;
if no data processing model object instance has been created for the data processing model description file, determining the first data processing model object instance to be the data processing model object instance corresponding to the data processing model description file;
if a data processing model object instance has already been created for the data processing model description file, merging the first data processing model object instance with the previously created second data processing model object instance corresponding to the data processing model description file, to obtain the data processing model object instance corresponding to the data processing model description file.
In the above method, preferably, merging the first data processing model object instance with the previously created second data processing model object instance corresponding to the data processing model description file comprises:
comparing the first data processing model object instance with the second data processing model object instance;
for a second node in the first data processing model object instance and a third node in the second data processing model object instance that has the same unique identifier as the second node, if the parameters in the data set corresponding to the second node differ from the parameters in the data set corresponding to the third node, updating the third node with the data set corresponding to the second node and marking the third node as unprocessed;
if a fourth node exists in the first data processing model object instance and the second data processing model object instance does not contain the fourth node, inserting the fourth node into the second data processing model object instance and marking the fourth node in the second data processing model object instance as unprocessed;
if a fifth node exists in the second data processing model object instance and the first data processing model object instance does not contain the fifth node, deleting the fifth node from the second data processing model object instance and marking all child nodes of the fifth node as unprocessed;
in the second data processing model object instance, marking as unprocessed every node that has a parent node in the unprocessed state.
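The merge rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an object instance is modeled as a plain dict keyed by node identifier, the field names `params`, `parents`, and `state` are hypothetical, and the sketch additionally assumes that the edge lists of the merged instance follow the newer (first) instance and that a node becomes unprocessed as soon as any one of its parents is unprocessed.

```python
def merge(first, second):
    """Update `second` (the existing instance) in place from `first` (the new one).

    Each instance maps node id -> {"params": dict, "parents": list, "state": str}.
    """
    # Changed parameters, or a node that exists only in `first`: mark Dirty.
    for node_id, new in first.items():
        old = second.get(node_id)
        if old is None:
            second[node_id] = {"params": dict(new["params"]),
                               "parents": list(new["parents"]),
                               "state": "Dirty"}
        elif old["params"] != new["params"]:
            old["params"] = dict(new["params"])
            old["state"] = "Dirty"

    # A node that exists only in `second`: delete it, mark its children Dirty.
    for node_id in [n for n in second if n not in first]:
        del second[node_id]
        for child in second.values():
            if node_id in child["parents"]:
                child["state"] = "Dirty"

    # Assumption (not stated in the text): edges follow the new instance.
    for node_id, new in first.items():
        second[node_id]["parents"] = list(new["parents"])

    # Propagate: a node with a Dirty parent becomes Dirty, until nothing changes.
    changed = True
    while changed:
        changed = False
        for node in second.values():
            if node["state"] != "Dirty" and any(
                    second[p]["state"] == "Dirty" for p in node["parents"]):
                node["state"] = "Dirty"
                changed = True
    return second


def _node(params, parents=(), state="Clean"):
    return {"params": dict(params), "parents": list(parents), "state": state}

second = {"a": _node({"path": "x"}), "b": _node({"rule": "r > 1"}, ["a"]),
          "c": _node({}, ["b"]), "d": _node({}, ["a"]),
          "e": _node({}), "f": _node({}, ["e"])}
first = {"a": _node({"path": "x"}), "b": _node({"rule": "r > 2"}, ["a"]),
         "c": _node({}, ["b"]), "d": _node({}, ["a"]),
         "f": _node({}, ["a"]), "g": _node({}, ["d"])}
merged = merge(first, second)
```

In this example only the rule of node `b` changed, node `e` was removed, and node `g` was added, so nodes `a` and `d` keep their Clean state and their recorded outputs remain usable.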
In the above method, preferably, processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node comprises:
generating an operation function file corresponding to the first node based on the data set corresponding to the first node;
dynamically compiling the operation function file and loading the corresponding function object;
executing the function object on the input data set of the first node to generate the output data set of the first node.
In the above method, preferably, generating the operation function file corresponding to the first node based on the data set corresponding to the first node comprises:
reading the type and parameters of the first node from the data set corresponding to the first node;
determining, based on the type of the first node, the program file template corresponding to the first node;
inserting the parameters into the program file template to generate the program source file corresponding to the first node;
compiling the program source file to obtain the operation function file corresponding to the first node.
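As a rough illustration of the template-filling and dynamic-compilation steps, the following Python sketch selects a template by node type, substitutes the node parameters, compiles the resulting source, and loads a function object. The templates, the parameter names `rule` and `expr`, and the function name `operate` are hypothetical, and Python's built-in `compile` and `exec` merely stand in for whatever compiler and loader an actual implementation would use. Executing generated source is unsafe for untrusted input, so this is a sketch rather than a recommended design.

```python
from string import Template

# Hypothetical program file templates, one per node type.
TEMPLATES = {
    "Filter": Template(
        "def operate(records):\n"
        "    return [r for r in records if $rule]\n"),
    "Map": Template(
        "def operate(records):\n"
        "    return [$expr for r in records]\n"),
}

def build_function(node_data):
    """Turn a node's data set (type plus parameters) into a callable."""
    node_type = node_data["type"]            # read the node type
    params = node_data["params"]             # read the node parameters
    template = TEMPLATES[node_type]          # pick the template for this type
    source = template.substitute(params)     # fill in the parameters
    code = compile(source, "<node:%s>" % node_type, "exec")  # compile the source
    namespace = {}
    exec(code, namespace)                    # load the compiled definitions
    return namespace["operate"]              # the function object

keep_large = build_function({"type": "Filter", "params": {"rule": "r > 2"}})
times_ten = build_function({"type": "Map", "params": {"expr": "r * 10"}})
filtered = keep_large([1, 2, 3, 4])
scaled = times_ten(filtered)
```

Because the function object is built at run time from the node's own data set, changing a parameter only requires regenerating the function for that one node.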
A data processing device, comprising:
an acquisition module, configured to obtain, based on a data processing model description file sent by a client, a data processing model object instance corresponding to the data processing model description file, where the data processing model description file is converted from a data processing model, the data processing model is a directed graph, the nodes in the directed graph comprise operation nodes that have at least one parent node and data source nodes that have no parent node, and each node in the directed graph corresponds to a data set;
a processing module, configured to: when an execution instruction sent by the client is received that carries a node list made up of several nodes in the data processing model object instance, then for a first node in the node list: if the input data of the first node comes from a parent node of the first node and the data set corresponding to the parent node has not been successfully processed, add the parent node of the first node to the node list and process it first; if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node has been successfully processed, obtain the output data set of the parent node from the execution context as the input data set of the first node, process the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, and record the output data set of the first node in the execution context; where the first node is any node in the node list.
In the above device, preferably, the acquisition module comprises:
a conversion submodule, configured to convert the data processing model description file sent by the client into a first data processing model object instance;
a judgment submodule, configured to determine, according to the unique identifier of the data processing model, whether a data processing model object instance has previously been created for the data processing model description file;
a determination submodule, configured to, if no data processing model object instance has been created for the data processing model description file, determine the first data processing model object instance to be the data processing model object instance corresponding to the data processing model description file;
a merging submodule, configured to, if a data processing model object instance has already been created for the data processing model description file, merge the first data processing model object instance with the previously created second data processing model object instance corresponding to the data processing model description file, to obtain the data processing model object instance corresponding to the data processing model description file.
In the above device, preferably, the merging submodule comprises:
a comparison unit, configured to compare the first data processing model object instance with the second data processing model object instance;
a first processing unit, configured to: for a second node in the first data processing model object instance and a third node in the second data processing model object instance that has the same unique identifier as the second node, if the parameters in the data set corresponding to the second node differ from the parameters in the data set corresponding to the third node, update the third node with the data set corresponding to the second node and mark the third node as unprocessed;
a second processing unit, configured to: if a fourth node exists in the first data processing model object instance and the second data processing model object instance does not contain the fourth node, insert the fourth node into the second data processing model object instance and mark the fourth node in the second data processing model object instance as unprocessed;
a third processing unit, configured to: if a fifth node exists in the second data processing model object instance and the first data processing model object instance does not contain the fifth node, delete the fifth node from the second data processing model object instance and mark all child nodes of the fifth node as unprocessed;
a fourth processing unit, configured to mark as unprocessed, in the second data processing model object instance, every node that has a parent node in the unprocessed state.
In the above device, preferably, with respect to processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, the processing module is specifically configured to: generate an operation function file corresponding to the first node based on the data set corresponding to the first node; dynamically compile the operation function file and load the corresponding function object; and execute the function object on the input data set of the first node to generate the output data set of the first node.
In the above device, preferably, with respect to generating the operation function file corresponding to the first node based on the data set corresponding to the first node, the processing module is specifically configured to: read the type and parameters of the first node from the data set corresponding to the first node; determine, based on the type of the first node, the program file template corresponding to the first node; insert the parameters into the program file template to generate the program source file corresponding to the first node; and compile the program source file to obtain the operation function file corresponding to the first node.
As can be seen from the above solutions, in the data processing method and device provided by the present application, the data processing model is represented as a directed graph. When an instruction carrying a node list is received from the client, then for any node in the node list: if the data set corresponding to the node's parent node has not been processed, the data set corresponding to the parent node is processed first; if the data set corresponding to the parent node has been processed, the output data set of the parent node is read directly from the execution context as the input data set of the node, the input data set of the node is processed based on the data set corresponding to the node to generate the output data set of the node, and the output data set of the node is recorded in the execution context. Thus, with the data processing method provided by the embodiments of the present invention, the data sets of nodes that have been successfully processed are not processed again, and only the data of some of the nodes is processed, which improves data processing efficiency.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of one implementation of the data processing method provided by an embodiment of the present application;
Fig. 2 is an example diagram of a data processing model provided by an embodiment of the present application;
Fig. 3 is a flowchart of one implementation, provided by an embodiment of the present application, of obtaining the data processing model object instance corresponding to the data processing model description file based on the description file sent by the client;
Fig. 4 is a flowchart of one implementation, provided by an embodiment of the present application, of processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node;
Fig. 5 is a flowchart of one implementation, provided by an embodiment of the present application, of generating the operation function file corresponding to the first node based on the data set corresponding to the first node;
Fig. 6 is a flowchart of another implementation of the data processing method provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of the data processing device provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of the acquisition module provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of the merging submodule provided by an embodiment of the present application.
The terms "first", "second", "third", "fourth", and so on (if present) in the specification, the claims, and the above drawings are used to distinguish similar parts and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented in an order other than the one illustrated here.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The data processing method and device provided by the embodiments of the present invention can be applied in the distributed computing system Spark to enable interactive processing of data sets.
Referring to Fig. 1, which is a flowchart of one implementation of the data processing method provided by an embodiment of the present application, the method may comprise:
Step S11: based on the data processing model description file sent by the client, obtain the data processing model object instance corresponding to that description file.
The data processing model description file is converted from the data processing model and describes the information of the data processing model graph in an agreed encoding. The data processing model is a directed graph whose nodes comprise operation nodes that have at least one parent node and data source nodes that have no parent node; each node in the directed graph corresponds to a data set.
In the embodiments of the present invention, the user builds a data processing model on the client according to the data analysis scenario, and the client converts the model into a data processing model description file and sends it to the server.
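The text does not say which encoding the client and server agree on. Purely as an illustration, the following Python sketch assumes a JSON description; the field names `model_id`, `nodes`, `id`, `type`, `params`, and `parents`, and the example file path, are all hypothetical.

```python
import json
import uuid

def to_description(model_id, nodes):
    """Client side: encode a model as a JSON description (assumed layout)."""
    return json.dumps({"model_id": model_id, "nodes": nodes}, sort_keys=True)

def from_description(text):
    """Server side: decode a description into a plain-dict object instance."""
    doc = json.loads(text)
    instance = {}
    for node in doc["nodes"]:
        instance[node["id"]] = {
            "type": node["type"],
            "params": node.get("params", {}),
            "parents": node.get("parents", []),
            "state": "Dirty",  # a freshly created node has not been processed
        }
    return doc["model_id"], instance

model_id = str(uuid.uuid4())
nodes = [
    {"id": "n1", "type": "HDFSInput", "params": {"path": "/data/in.csv"}},
    {"id": "n2", "type": "Filter", "params": {"rule": "r > 2"}, "parents": ["n1"]},
]
decoded_id, instance = from_description(to_description(model_id, nodes))
```

Carrying the model's unique identifier inside the description lets the server recognize a later, edited version of the same model.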
The data processing model is a directed graph. Fig. 2 shows an example of a data processing model provided by an embodiment of the present invention. The directed graph is made up of a number of nodes, each of which represents a data processing unit with functional modules for obtaining input data, processing the input data (executing a piece of data analysis logic on it), and saving the processing result. The directed graph should have at least one source node (a node that does not depend on the data of other nodes as input but reads data directly from an external system); each of the remaining nodes takes the processing results of its parent nodes as its own input data, according to the inter-node dependencies described by the directed edges.
The directed graph contains two kinds of nodes. One kind is data source nodes, which have no parent node, such as nodes 1 to 3 in Fig. 2. The other kind is operation nodes, which have at least one parent node, such as nodes 4 to 9 in Fig. 2. Each node in the directed graph corresponds to a data set. In Fig. 2, the parent node of node 5 is node 4, and node 5 is the parent node of node 6.
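A minimal Python sketch of such a graph follows. The class and helper names are hypothetical. Only the chain from node 4 to node 5 to node 6 is taken from the description of Fig. 2; the edge from node 1 to node 4 is an assumption added so that the example has a data source node.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    parents: list = field(default_factory=list)  # identifiers of parent nodes

    @property
    def is_source(self):
        # A data source node has no parent; an operation node has at least one.
        return not self.parents

def children_of(nodes, node_id):
    """Return the identifiers of the nodes that list `node_id` as a parent."""
    return [n.node_id for n in nodes.values() if node_id in n.parents]

nodes = {
    1: Node(1),
    4: Node(4, parents=[1]),   # assumed edge, not stated in the text
    5: Node(5, parents=[4]),
    6: Node(6, parents=[5]),
}
sources = [i for i, n in nodes.items() if n.is_source]
operations = [i for i, n in nodes.items() if not n.is_source]
```

Storing parent identifiers on each node keeps the dependency lookup needed at execution time to a single attribute access.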
The data set corresponding to a node is used to generate the program file corresponding to that node. The data set corresponding to each node contains the type information of the node and the node parameters configured by the user, where:
For operation nodes of the set operation class, the node type may be: Map (one-to-one mapping), Filter (filtering), FlatMap (one-to-many mapping), Union (union), sample (sampling), intersection (intersection), distinct (removal of duplicate records), reduceByKey (merging by key), join (inner join by key), cartesian (Cartesian product), or subtract (set difference).
For operation nodes of the import/export class, the node type may be: HDFSInput (import an HDFS file) or HDFSOutput (export to HDFS).
For operation nodes of the mining algorithm class, the node type may be one of three major families of algorithms: classification, clustering, and frequent itemset mining, with each algorithm abstracted as one node.
Node parameters differ by node type. For example, for an HDFSInput node, the parameters the user must configure include the input file path, file format, and file encoding; for a Filter node, the user must enter the rule by which the data is filtered.
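To make the relationship between node type and node parameters concrete, here is a small Python sketch that dispatches on a few of the set operation types listed above. It works on plain Python lists rather than Spark data sets, and the parameter names `func` and `predicate` are hypothetical; it is meant to show the dispatch idea, not how the operations are implemented on Spark.

```python
def op_map(inputs, params):
    return [params["func"](r) for r in inputs[0]]

def op_filter(inputs, params):
    return [r for r in inputs[0] if params["predicate"](r)]

def op_flat_map(inputs, params):
    return [x for r in inputs[0] for x in params["func"](r)]

def op_union(inputs, params):
    return inputs[0] + inputs[1]           # duplicates are kept

def op_distinct(inputs, params):
    return list(dict.fromkeys(inputs[0]))  # drops duplicates, keeps first-seen order

def op_subtract(inputs, params):
    return [r for r in inputs[0] if r not in inputs[1]]

# Node type -> operation; each operation takes the parent outputs and the parameters.
OPERATIONS = {
    "Map": op_map, "Filter": op_filter, "FlatMap": op_flat_map,
    "Union": op_union, "distinct": op_distinct, "subtract": op_subtract,
}

def apply_node(node_type, inputs, params=None):
    return OPERATIONS[node_type](inputs, params or {})

doubled = apply_node("Map", [[1, 2, 3]], {"func": lambda r: r * 2})
words = apply_node("FlatMap", [["a b", "c"]], {"func": str.split})
```

Nodes with two parents, such as Union and subtract, simply receive two input lists, which mirrors how a node in the directed graph may depend on more than one parent.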
In addition, each node in the data processing model carries a state flag. The state of a node changes among four states: Dirty, Running, Clean, and Error. Dirty means the node has not yet been processed, Running means the node is being processed, Clean means the node has been successfully processed, and Error means an error occurred while the node was being processed.
After a node has been executed successfully, its execution result is also recorded in the execution context so that the child nodes of the node can use its output.
Optionally, after a node has been executed successfully, its execution context is stored in a preconfigured cache so that the child nodes of the node read their input data sets from the cache, which can further improve processing efficiency.
Step S12: when an execution instruction sent by the client is received that carries a node list made up of several nodes in the data processing model object instance, process the data sets corresponding to the nodes specified in the node list.
The execution instruction carrying the node list is generated when the user specifies nodes in the data processing model instance. The user may specify one node, two or more nodes, or all nodes in the data processing model instance. The nodes in the node list are the nodes specified by the user.
Any node in the node list is referred to, for convenience, as the first node. If the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node has not been processed, the parent node of the first node is added to the node list and processed first. If the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node has been successfully processed, the output data set of the parent node is obtained from the execution context as the input data set of the first node, the input data set of the first node is processed based on the data set corresponding to the first node to generate the output data set of the first node, and the output data set of the first node is recorded in the execution context.
The execution instruction sent by the client contains the node list, and the nodes in the node list are some or all of the nodes in the data processing model instance.
For a first node in the node list, if the input of the first node is the output of its parent node, it is first determined whether the parent node has been successfully processed. If the parent node has not been successfully processed (that is, it has not yet been processed, is being processed, or encountered an error during processing), the parent node is processed first, and the first node is processed after the parent node has been successfully processed. If the parent node has been successfully processed, the output data set of the parent node is read directly from the execution context, and the parent node does not need to be processed again.
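The execution logic described above can be sketched as follows. This is a single-threaded Python illustration under stated assumptions: the class and function names are hypothetical, each node's operation is modeled as a callable that receives the list of its parents' outputs, the execution context is a plain dict, and a listed node that is already Clean is skipped (the text only states this explicitly for parent nodes).

```python
from enum import Enum

class State(Enum):
    DIRTY = "Dirty"      # not yet processed
    RUNNING = "Running"  # being processed
    CLEAN = "Clean"      # successfully processed
    ERROR = "Error"      # failed during processing

class Node:
    def __init__(self, node_id, func, parents=()):
        self.node_id = node_id
        self.func = func              # hypothetical: callable(list of parent outputs)
        self.parents = list(parents)
        self.state = State.DIRTY

def process(node, context, log):
    """Process one node, first processing any parent that is not Clean."""
    inputs = []
    for parent in node.parents:
        if parent.state is not State.CLEAN:
            process(parent, context, log)
        inputs.append(context[parent.node_id])  # reuse the recorded output
    node.state = State.RUNNING
    try:
        context[node.node_id] = node.func(inputs)  # record output in the context
    except Exception:
        node.state = State.ERROR
        raise
    node.state = State.CLEAN
    log.append(node.node_id)

def execute(node_list, context, log):
    for node in node_list:
        # Assumption: a listed node that is already Clean is not run again.
        if node.state is not State.CLEAN:
            process(node, context, log)

source = Node("source", lambda _: [1, 2, 3])
doubled = Node("doubled", lambda ins: [x * 2 for x in ins[0]], [source])
total = Node("total", lambda ins: sum(ins[0]), [doubled])
largest = Node("largest", lambda ins: max(ins[0]), [doubled])

context, log = {}, []
execute([total], context, log)     # runs source, doubled, then total
execute([largest], context, log)   # runs only largest; doubled is read from context
```

The second call shows the intended saving: `source` and `doubled` are already Clean, so only the newly requested node is processed.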
In the data processing method provided by the embodiments of the present invention, the data processing model is represented as a directed graph. When an instruction carrying a node list is received from the client, then for any node in the node list: if the data set corresponding to the node's parent node has not been processed, the data set corresponding to the parent node is processed first; if the data set corresponding to the parent node has been processed, the output data set of the parent node is read directly from the execution context as the input data set of the node, the input data set of the node is processed based on the data set corresponding to the node to generate the output data set of the node, and the output data set of the node is recorded in the execution context. Thus, with the data processing method provided by the embodiments of the present invention, the data sets of nodes that have been successfully processed are not processed again, and only the data of some of the nodes is processed, which improves data processing efficiency.
Optionally, a flowchart of one implementation, provided by the embodiment of the present invention, of obtaining the data processing model object instance corresponding to the data processing model description document sent by the client is shown in Figure 3, and may include:
Step S31: convert the data processing model description document sent by the client into a first data processing model object instance;
In the embodiment of the present invention, after the data processing model description document sent by the client is received, it is converted into a data processing model object instance (for convenience of description, denoted as the first data processing model object instance).
Step S32: judge, according to the unique identifier of the data processing model, whether a data processing model object instance has already been created for the data processing model description document;
In the embodiment of the present invention, each data processing model has a unique identifier, such as a UUID (Universally Unique Identifier). After a data processing model description document is converted into a data processing model object instance, a correspondence between the unique identifier and the data processing model object instance can be established.
If the unique identifier corresponding to an existing data processing model object instance is the same as the unique identifier corresponding to the first data processing model object instance, a data processing model object instance has already been created for the data processing model description document; otherwise, it is determined that no data processing model object instance has been created for the data processing model description document.
Step S33: if no data processing model object instance has been created for the data processing model description document, determine the first data processing model object instance as the data processing model object instance corresponding to the data processing model description document;
Step S34: if a data processing model object instance has already been created for the data processing model description document, merge the first data processing model object instance with the created data processing model object instance corresponding to the data processing model description document (for convenience of description, denoted as the second data processing model object instance) to obtain the data processing model object instance corresponding to the data processing model description document.
If a data processing model object instance has already been created for the data processing model description document, the user has modified the data processing model, and the data processing model object instance corresponding to the data processing model description document needs to be updated.
Merging the first data processing model object instance with the created second data processing model object instance corresponding to the data processing model description document specifically means updating the second data processing model object instance according to the first data processing model object instance.
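Steps S32 to S34 can be sketched as a lookup in a registry keyed by the model's unique identifier. This is only an illustrative sketch: the `Model` class, the `registry` dictionary, the `obtain_instance` function and the `merge` callback are hypothetical names that do not appear in the patent, and the merge itself is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a data processing model object instance."""
    uuid: str
    nodes: dict = field(default_factory=dict)  # node id -> node parameters


def obtain_instance(first, registry, merge):
    """Return the instance to use for a freshly converted model.

    registry maps unique identifier -> previously created instance.
    merge(first, second) is assumed to update `second` in place.
    """
    second = registry.get(first.uuid)
    if second is None:
        # Step S33: no instance exists yet, so the first instance is kept.
        registry[first.uuid] = first
        return first
    # Step S34: an instance exists, so it is updated from the first instance.
    merge(first, second)
    return second
```

Keeping the previously created instance and updating it in place, rather than replacing it, is what allows processing state recorded on its nodes to survive a model edit.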
Optionally, what the embodiment of the present invention provided realizes the one that the first data processing model object instance and the second data processing model object instance corresponding with data processing model description document created merge can be:
First data processing model object instance and the second data processing model object instance are compared;
By comparing, determine that the first data processing model object instance is compared with the second data processing model object instance, whether the node with identical unique identifier exists difference, and, whether the first data processing model object instance, compared with the second data processing model object instance, increases or decreases node.
For Section Point in the first data processing model object instance, and second has identical unique identifier in data processing model object instance the 3rd node with aforementioned Section Point, if the parameter of the data centralization that the parameter of the data centralization that Section Point is corresponding is corresponding from the 3rd node is different, data set corresponding for Section Point is updated to the 3rd node, and is not processed state by the 3rd vertex ticks;
And if the parameter of the parameter of data centralization corresponding to the Section Point data centralization corresponding with the 3rd node is identical, then not corresponding to the 3rd node data set is modified.
If have the 4th node in the first data processing model object instance, and in the second data processing model object instance, do not comprise the 4th node, by in the 4th node city second data processing model object instance, and be not processed state by the 4th vertex ticks in the second data processing model object instance;
There is in first data processing model object instance the 4th node, and in the second data processing model object instance, do not comprise the 4th node, the node that user increases in the process of Update Table transaction module is described.
If have the 5th node in the second data processing model object instance, and in the first data processing model object instance, do not comprise the 5th node, by the 5th knot removal in the second data processing model object instance, and all child nodes of the 5th node are labeled as not processed state;
There is in second data processing model object instance the 5th node, and in the first data processing model object instance, do not comprise the 5th node, illustrate that user deletes the 5th node in the process of Update Table transaction module.
By in the second data processing model object instance, the state of all father nodes is the vertex ticks of not processed state is not processed state.
Upgrading through aforementioned nodes, after increase or deletion of node operate, traveling through from data source nodes the second data processing model object instance, is that the vertex ticks of not processed state is not processed state by the state of all father nodes.That is, for any node (for convenience of describing, being designated as the 6th node) in the second data processing model object instance, if the father node of the 6th node is not processed state, then the 6th node is also labeled as not processed state.
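The merge rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Node` class and `merge` function are hypothetical, each model instance is reduced to a dictionary from node identifier to node, the parent list of a surviving node is simply taken from the first instance, and a node is treated as stale when any one of its parents is unprocessed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: `params` stands in for the node's data set."""
    node_id: str
    params: dict
    parents: list = field(default_factory=list)  # ids of parent nodes
    processed: bool = False


def merge(first, second):
    """Update `second` (dict: node id -> Node) in place from `first`."""
    # Deleted nodes: remove them and mark their children as unprocessed.
    for node_id in list(second):
        if node_id not in first:
            del second[node_id]
            for node in second.values():
                if node_id in node.parents:
                    node.processed = False

    for node_id, new in first.items():
        old = second.get(node_id)
        if old is None:
            # Added node: copy it in, in the unprocessed state.
            second[node_id] = Node(node_id, dict(new.params), list(new.parents))
            continue
        if old.params != new.params:
            # Changed parameters: take the new data set and mark as unprocessed.
            old.params = dict(new.params)
            old.processed = False
        old.parents = list(new.parents)  # assumption: edges follow the new model

    # Propagate: a node with an unprocessed parent is itself unprocessed.
    changed = True
    while changed:
        changed = False
        for node in second.values():
            stale = any(p in second and not second[p].processed for p in node.parents)
            if node.processed and stale:
                node.processed = False
                changed = True
```

The propagation loop runs until nothing changes; since the flag only ever moves from processed to unprocessed, it terminates. A traversal in topological order from the data source nodes, as the text describes, would reach the same result in a single pass.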
Optionally, a flowchart of one implementation, provided by the embodiment of the present invention, of processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node is shown in Figure 4, and may include:
Step S41: generate the handling function file corresponding to the first node based on the data set corresponding to the first node;
Step S42: dynamically compile the generated handling function file and load the corresponding function object;
Step S43: execute the function object on the input data set of the first node to generate the output data set of the first node.
Optionally, a flowchart of one implementation, provided by the embodiment of the present invention, of generating the handling function file corresponding to the first node based on the data set corresponding to the first node is shown in Figure 5, and may include:
Step S51: read the type and parameters of the first node from the data set corresponding to the first node;
Step S52: determine the program file template corresponding to the first node based on the type of the first node;
In the embodiment of the present invention, different node types correspond to different program file templates.
Step S53: fill the parameters into the determined program file template to generate the program source file corresponding to the first node;
Step S54: compile the generated program source file to obtain the handling function file corresponding to the first node.
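Steps S41 to S43 and S51 to S54 can be illustrated with a small sketch. The patent does not name a language or compiler for the generated code, so Python's built-in `compile` and `exec` stand in here for dynamic compilation and loading; the `TEMPLATES` table, the node types `filter` and `project`, and the `build_function` name are all hypothetical.

```python
from string import Template

# One program file template per node type (step S52). Placeholders are
# filled with the node's parameters (step S53).
TEMPLATES = {
    "filter": Template(
        "def operate(rows):\n"
        "    return [r for r in rows if r[$column] > $threshold]\n"
    ),
    "project": Template(
        "def operate(rows):\n"
        "    return [{k: r[k] for k in $columns} for r in rows]\n"
    ),
}


def build_function(node):
    """Generate, compile and load the function object for one node."""
    node_type, params = node["type"], node["params"]      # step S51
    template = TEMPLATES[node_type]                        # step S52
    # repr() turns each parameter into a Python literal in the source text.
    source = template.substitute({k: repr(v) for k, v in params.items()})  # S53
    code = compile(source, f"<node:{node_type}>", "exec")  # steps S54 / S42
    namespace = {}
    exec(code, namespace)                                  # load the function object
    return namespace["operate"]


keep_large = build_function(
    {"type": "filter", "params": {"column": "amount", "threshold": 100}}
)
large_rows = keep_large([{"amount": 50}, {"amount": 150}])  # step S43
```

Because the function is generated from the node's type and parameters, a change to either one requires regenerating it, which is consistent with marking a node as unprocessed when its parameters change.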
Optionally, a flowchart of another implementation of the data processing method provided by the embodiment of the present invention is shown in Figure 6, and may include:
Step S61: receive the data processing model description document sent by the client, and convert the data processing model description document into an instance M' of the data processing model object;
Step S62: judge, according to the unique identifier of M', whether an instance of the data processing model object has already been created; if not, go to step S63; if so, go to step S64;
Step S63: use a Map data structure to save the starting storage address of the data processing model instance M'; step S65 can then be performed;
Step S64: find the data model instance M recorded in the Map and run the merge algorithm on M and M', so that the information carried in M' is ultimately updated into M; step S65 can then be performed;
Step S65: receive an execution instruction sent by the user through the client; the execution instruction contains a node list made up of some of the nodes in M, and the list indicates that all nodes listed in it must be executed in this run;
Step S66: determine whether Spark resources have already been requested for executing the data processing model; if not, go to step S67; if so, go to step S68;
Step S67: request computing resources from the Spark cluster, then perform step S68;
Step S68: process the nodes in the node list. For each node in the node list, if the input of the node is the output of its parent node, first judge whether the parent node has been successfully processed. If the parent node has not yet been successfully processed, process the parent node first, and process the node after the parent node has been successfully processed; if the parent node has been successfully processed, read the output data set of the parent node directly from the execution context, and do not process the parent node again;
Step S69: judge whether the client has sent a termination signal (the termination signal is triggered by the user on the client); if so, go to step S610; if not, return to step S61;
Step S610: send a signal to Spark to release the computing resources.
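The node processing of step S68 can be sketched as a recursive walk that consults the execution context before running a parent. This is a simplified sketch under stated assumptions: the `run` function is hypothetical, each node is reduced to a pair of parent identifiers and a plain Python function, the execution context is a dictionary whose keys are the successfully processed nodes, and the Spark resource handling of steps S66, S67 and S610 is left out.

```python
def run(nodes, node_list, context):
    """Execute every node in node_list, reusing cached parent outputs.

    nodes:   dict of node id -> (list of parent ids, function object)
    context: dict of node id -> output data set (the execution context)
    Returns the ids of the nodes that actually ran, in execution order.
    """
    executed = []

    def execute(node_id):
        parents, operate = nodes[node_id]
        inputs = [output_of(parent) for parent in parents]
        context[node_id] = operate(*inputs)  # record output in the context
        executed.append(node_id)

    def output_of(node_id):
        # A parent is only processed when the context holds no output for it.
        if node_id not in context:
            execute(node_id)
        return context[node_id]

    for node_id in node_list:
        if node_id not in executed:  # may already have run as a parent
            execute(node_id)
    return executed


nodes = {
    "src": ([], lambda: [1, 2, 3]),                       # data source node
    "double": (["src"], lambda xs: [2 * x for x in xs]),  # operation node
    "total": (["double"], lambda xs: sum(xs)),            # operation node
}
context = {}
first_run = run(nodes, ["total"], context)   # runs src, double, then total
second_run = run(nodes, ["total"], context)  # runs total only
```

In the second call the outputs of `src` and `double` are read from the context, so only the requested node is executed again.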
Corresponding to the method embodiment, the embodiment of the present invention also provides a data processing device. A schematic structural diagram of the data processing device provided by the embodiment of the present invention is shown in Figure 7, and may include:
An acquisition module 71 and a processing module 72; wherein,
The acquisition module 71 is configured to obtain, based on the data processing model description document sent by the client, the data processing model object instance corresponding to the data processing model description document; the data processing model description document is converted from a data processing model, the data processing model is a directed graph, the nodes in the directed graph include operation nodes that have at least one parent node and data source nodes that have no parent node, and each node in the directed graph corresponds to one data set;
The processing module 72 is configured to, when an execution instruction sent by the client is received that carries a node list made up of some of the nodes in the data processing model object instance, for a first node in the node list: if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has not been successfully processed, add the parent node of the first node to the node list and process it first; if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has been successfully processed, obtain the output data set of the parent node of the first node from the execution context as the input data set of the first node, process the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, and record the output data set of the first node in the execution context; the first node is any node in the node list.
In the data processing device provided by the embodiment of the present invention, the data processing model is represented as a directed graph. When an instruction carrying a node list is received from the client, then for any node in the node list: if the data set corresponding to the parent node of the node has not been processed, the data set corresponding to the parent node is processed first; if the data set corresponding to the parent node has been processed, the output data set of the parent node is read directly from the execution context as the input data set of the node, the input data set of the node is processed based on the data set corresponding to the node to generate the output data set of the node, and the output data set of the node is recorded in the execution context. It can be seen that, with the data processing device provided by the embodiment of the present invention, the data sets of successfully processed nodes are not processed again, so that only the data of some of the nodes needs to be processed, which improves data processing efficiency.
Optionally, a schematic structural diagram of the acquisition module 71 provided by the embodiment of the present invention is shown in Figure 8, and may include:
A conversion submodule 81, a judging submodule 82, a determining submodule 83 and a merging submodule 84; wherein,
The conversion submodule 81 is configured to convert the data processing model description document sent by the client into a first data processing model object instance;
The judging submodule 82 is configured to judge, according to the unique identifier of the data processing model, whether a data processing model object instance has already been created for the data processing model description document;
The determining submodule 83 is configured to, if no data processing model object instance has been created for the data processing model description document, determine the first data processing model object instance as the data processing model object instance corresponding to the data processing model description document;
The merging submodule 84 is configured to, if a data processing model object instance has already been created for the data processing model description document, merge the first data processing model object instance with the created second data processing model object instance corresponding to the data processing model description document to obtain the data processing model object instance corresponding to the data processing model description document.
Optionally, a schematic structural diagram of the merging submodule 84 provided by the embodiment of the present invention is shown in Figure 9, and may include:
A comparing unit 91, a first processing unit 92, a second processing unit 93, a third processing unit 94 and a fourth processing unit 95; wherein,
The comparing unit 91 is configured to compare the first data processing model object instance with the second data processing model object instance;
The first processing unit 92 is configured to, for a second node in the first data processing model object instance and a third node in the second data processing model object instance that has the same unique identifier as the second node, if the parameters in the data set corresponding to the second node differ from the parameters in the data set corresponding to the third node, update the third node with the data set corresponding to the second node, and mark the third node as being in the unprocessed state;
The second processing unit 93 is configured to, if the first data processing model object instance has a fourth node that is not included in the second data processing model object instance, add the fourth node to the second data processing model object instance, and mark the fourth node in the second data processing model object instance as being in the unprocessed state;
The third processing unit 94 is configured to, if the second data processing model object instance has a fifth node that is not included in the first data processing model object instance, delete the fifth node from the second data processing model object instance, and mark all child nodes of the fifth node as being in the unprocessed state;
The fourth processing unit 95 is configured to mark, in the second data processing model object instance, every node whose parent node is in the unprocessed state as being in the unprocessed state.
Optionally, in terms of processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, the processing module 72 is specifically configured to: generate the handling function file corresponding to the first node based on the data set corresponding to the first node; dynamically compile the handling function file and load the corresponding function object; and execute the function object on the input data set of the first node to generate the output data set of the first node.
Optionally, in terms of generating the handling function file corresponding to the first node based on the data set corresponding to the first node, the processing module 72 is specifically configured to: read the type and parameters of the first node from the data set corresponding to the first node; determine the program file template corresponding to the first node based on the type of the first node; fill the parameters into the program file template to generate the program source file corresponding to the first node; and compile the program source file to obtain the handling function file corresponding to the first node.
The above description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining, based on a data processing model description document sent by a client, a data processing model object instance corresponding to the data processing model description document; the data processing model description document is converted from a data processing model, the data processing model is a directed graph, the nodes in the directed graph include operation nodes that have at least one parent node and data source nodes that have no parent node, and each node in the directed graph corresponds to one data set;
when an execution instruction sent by the client is received that carries a node list made up of some of the nodes in the data processing model object instance, for a first node in the node list: if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has not been successfully processed, adding the parent node of the first node to the node list and processing it first; if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has been successfully processed, obtaining the output data set of the parent node of the first node from an execution context as the input data set of the first node, processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, and recording the output data set of the first node in the execution context; the first node is any node in the node list.
2. The method according to claim 1, characterized in that obtaining, based on the data processing model description document sent by the client, the data processing model object instance corresponding to the data processing model description document comprises:
converting the data processing model description document sent by the client into a first data processing model object instance;
judging, according to the unique identifier of the data processing model, whether a data processing model object instance has already been created for the data processing model description document;
if no data processing model object instance has been created for the data processing model description document, determining the first data processing model object instance as the data processing model object instance corresponding to the data processing model description document;
if a data processing model object instance has already been created for the data processing model description document, merging the first data processing model object instance with the created second data processing model object instance corresponding to the data processing model description document to obtain the data processing model object instance corresponding to the data processing model description document.
3. The method according to claim 2, characterized in that merging the first data processing model object instance with the created second data processing model object instance corresponding to the data processing model description document comprises:
comparing the first data processing model object instance with the second data processing model object instance;
for a second node in the first data processing model object instance and a third node in the second data processing model object instance that has the same unique identifier as the second node, if the parameters in the data set corresponding to the second node differ from the parameters in the data set corresponding to the third node, updating the third node with the data set corresponding to the second node, and marking the third node as being in the unprocessed state;
if the first data processing model object instance has a fourth node that is not included in the second data processing model object instance, adding the fourth node to the second data processing model object instance, and marking the fourth node in the second data processing model object instance as being in the unprocessed state;
if the second data processing model object instance has a fifth node that is not included in the first data processing model object instance, deleting the fifth node from the second data processing model object instance, and marking all child nodes of the fifth node as being in the unprocessed state;
marking, in the second data processing model object instance, every node whose parent node is in the unprocessed state as being in the unprocessed state.
4. The method according to claim 1, characterized in that processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node comprises:
generating the handling function file corresponding to the first node based on the data set corresponding to the first node;
dynamically compiling the handling function file and loading the corresponding function object;
executing the function object on the input data set of the first node to generate the output data set of the first node.
5. The method according to claim 4, characterized in that generating the handling function file corresponding to the first node based on the data set corresponding to the first node comprises:
reading the type and parameters of the first node from the data set corresponding to the first node;
determining the program file template corresponding to the first node based on the type of the first node;
filling the parameters into the program file template to generate the program source file corresponding to the first node;
compiling the program source file to obtain the handling function file corresponding to the first node.
6. A data processing device, characterized by comprising:
an acquisition module, configured to obtain, based on a data processing model description document sent by a client, a data processing model object instance corresponding to the data processing model description document; the data processing model description document is converted from a data processing model, the data processing model is a directed graph, the nodes in the directed graph include operation nodes that have at least one parent node and data source nodes that have no parent node, and each node in the directed graph corresponds to one data set;
a processing module, configured to, when an execution instruction sent by the client is received that carries a node list made up of some of the nodes in the data processing model object instance, for a first node in the node list: if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has not been successfully processed, add the parent node of the first node to the node list and process it first; if the input data of the first node comes from the parent node of the first node and the data set corresponding to the parent node of the first node has been successfully processed, obtain the output data set of the parent node of the first node from an execution context as the input data set of the first node, process the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, and record the output data set of the first node in the execution context; the first node is any node in the node list.
7. The device according to claim 6, characterized in that the acquisition module comprises:
a conversion submodule, configured to convert the data processing model description document sent by the client into a first data processing model object instance;
a judging submodule, configured to judge, according to the unique identifier of the data processing model, whether a data processing model object instance has already been created for the data processing model description document;
a determining submodule, configured to, if no data processing model object instance has been created for the data processing model description document, determine the first data processing model object instance as the data processing model object instance corresponding to the data processing model description document;
a merging submodule, configured to, if a data processing model object instance has already been created for the data processing model description document, merge the first data processing model object instance with the created second data processing model object instance corresponding to the data processing model description document to obtain the data processing model object instance corresponding to the data processing model description document.
8. The device according to claim 7, characterized in that the merging submodule comprises:
a comparing unit, configured to compare the first data processing model object instance with the second data processing model object instance;
a first processing unit, configured to, for a second node in the first data processing model object instance and a third node in the second data processing model object instance that has the same unique identifier as the second node, if the parameters in the data set corresponding to the second node differ from the parameters in the data set corresponding to the third node, update the third node with the data set corresponding to the second node, and mark the third node as being in the unprocessed state;
a second processing unit, configured to, if the first data processing model object instance has a fourth node that is not included in the second data processing model object instance, add the fourth node to the second data processing model object instance, and mark the fourth node in the second data processing model object instance as being in the unprocessed state;
a third processing unit, configured to, if the second data processing model object instance has a fifth node that is not included in the first data processing model object instance, delete the fifth node from the second data processing model object instance, and mark all child nodes of the fifth node as being in the unprocessed state;
a fourth processing unit, configured to mark, in the second data processing model object instance, every node whose parent node is in the unprocessed state as being in the unprocessed state.
9. The device according to claim 6, characterized in that, in terms of processing the input data set of the first node based on the data set corresponding to the first node to generate the output data set of the first node, the processing module is specifically configured to: generate the handling function file corresponding to the first node based on the data set corresponding to the first node; dynamically compile the handling function file and load the corresponding function object; and execute the function object on the input data set of the first node to generate the output data set of the first node.
10. The device according to claim 9, characterized in that, in terms of generating the handling function file corresponding to the first node based on the data set corresponding to the first node, the processing module is specifically configured to: read the type and parameters of the first node from the data set corresponding to the first node; determine the program file template corresponding to the first node based on the type of the first node; fill the parameters into the program file template to generate the program source file corresponding to the first node; and compile the program source file to obtain the handling function file corresponding to the first node.
CN201610098936.1A 2016-02-23 2016-02-23 Data processing method and device Active CN105573836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610098936.1A CN105573836B (en) 2016-02-23 2016-02-23 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610098936.1A CN105573836B (en) 2016-02-23 2016-02-23 Data processing method and device

Publications (2)

Publication Number Publication Date
CN105573836A true CN105573836A (en) 2016-05-11
CN105573836B CN105573836B (en) 2018-12-28

Family

ID=55884006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610098936.1A Active CN105573836B (en) 2016-02-23 2016-02-23 Data processing method and device

Country Status (1)

Country Link
CN (1) CN105573836B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109743202A (en) * 2018-12-26 2019-05-10 中国联合网络通信集团有限公司 Data management method, apparatus, device and readable storage medium
CN112598506A (en) * 2020-12-25 2021-04-02 中国农业银行股份有限公司 Method for determining false mortgage user and related device
CN113434323A (en) * 2021-06-28 2021-09-24 浙江大华技术股份有限公司 Task flow control method of data center station and related device
CN113918126A (en) * 2021-09-14 2022-01-11 威讯柏睿数据科技(北京)有限公司 AI modeling flow arrangement method and system based on graph algorithm
WO2023040372A1 (en) * 2021-09-14 2023-03-23 北京柏睿数据技术股份有限公司 Ai modeling process choreography method and system based on graph algorithm
CN114840265A (en) * 2022-03-23 2022-08-02 阿里巴巴(中国)有限公司 Data processing method based on executable graph

Also Published As

Publication number Publication date
CN105573836B (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN105573836A (en) Data processing method and device
CN110968325B (en) Applet conversion method and device
Van Deursen et al. Symphony: View-driven software architecture reconstruction
US8826225B2 (en) Model transformation unit
US8869111B2 (en) Method and system for generating test cases for a software application
CN112394942B (en) Distributed software development compiling method and software development platform based on cloud computing
CN106843840B (en) Source code version evolution annotation reuse method based on similarity analysis
CN111399853A (en) Templated deployment method of machine learning model and custom operator
Verlage Multi-view modeling of software processes
US20210263833A1 (en) Code Generation Platform with Debugger
CN111309335A (en) Plug-in application compiling method and device and computer readable storage medium
CN108241720B (en) Data processing method, device and computer readable storage medium
CN110737437A (en) Compiling method and device based on code integration
CN108304164B (en) Business logic development method and development system
CN110737438A (en) Data processing method and device
CN116431668A (en) Metadata acquisition-based data lineage analysis method and device and electronic equipment
CN115951890A (en) Method, system and device for code conversion between different front-end frameworks
CN105426676A (en) Drilling data processing method and system
US9032372B2 (en) Runtime environment and method for non-invasive monitoring of software applications
EP2535813B1 (en) Method and device for generating an alert during an analysis of performance of a computer application
CN113778421A (en) Method and equipment for generating service code
CN113504904A (en) User-defined function implementation method and device, computer equipment and storage medium
JP2019109687A (en) Programming language conversion support device, programming language conversion support method and program
CN113656010B (en) Method, system, equipment and medium for automatically creating code warehouse by micro service
Ziegenhagen et al. Capturing Tracing Data Life Cycles for Supporting Traceability.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant