CN110069465A - HDFS data managing method, device, equipment and medium based on workflow - Google Patents


Info

Publication number
CN110069465A
Authority
CN
China
Prior art keywords
data
workflow
hdfs
user identifier
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910201985.7A
Other languages
Chinese (zh)
Inventor
沈志刚
敖挺挺
付倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Urban Construction Technology Shenzhen Co Ltd
Original Assignee
Ping An Urban Construction Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Ping An Urban Construction Technology Shenzhen Co Ltd
Priority application: CN201910201985.7A
Publication: CN110069465A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/11 - File system administration, e.g. details of archiving or snapshots
    • G06F16/18 - File system types
    • G06F16/182 - Distributed file systems

Abstract

The present invention relates to the field of data processing and provides a workflow-based HDFS data management method, comprising the following steps: when a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated; obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier; sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data; and, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message. The invention also discloses a workflow-based HDFS data management apparatus, device and medium. In the present invention, the basic workflow makes updates to the data in HDFS more standardized.

Description

HDFS data managing method, device, equipment and medium based on workflow
Technical field
The present invention relates to the field of data processing, and more particularly to a workflow-based HDFS data management method, apparatus, device and medium.
Background technique
With the development of the Hadoop Distributed File System (HDFS for short), the volume of data in Hadoop keeps growing, and data update management becomes increasingly cumbersome.
At present, the online update control of core data in some HDFS deployments is often carried out through manual operation or a generic process for the entire data update flow. There is generally no customized, systematic approval and verification, and no systematic record of the data changed during review. That is, the online update control of core data in current HDFS is not standardized, and how to standardize HDFS data management has become an urgent technical problem to be solved.
Summary of the invention
The main purpose of the present invention is to provide a workflow-based HDFS data management method, apparatus, device and storage medium, aiming to solve the problem that HDFS data update management is not standardized.
To achieve the above object, the present invention provides a workflow-based HDFS data management method, which comprises the following steps:
When a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated;
Obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier;
Sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
When the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
Optionally, before the step of obtaining, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated, the method comprises:
Predefining first node elements of a Hadoop workflow model and second node elements of a BPEL process model, and establishing mapping rules between the first node elements and the second node elements;
Processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model;
Converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, and deploying the deployment package;
Executing the deployment package to obtain the basic workflow.
Optionally, the step of obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier, comprises:
Obtaining the update task for the target data, and obtaining the deployment package associated with the update task;
Executing the deployment package to obtain the corresponding basic workflow, and obtaining the node elements included in the basic workflow and the role identifiers corresponding to the node elements;
Obtaining the target role identifier that matches the user identifier, and taking the node element corresponding to the target role identifier as the workflow node in the basic workflow that matches the user identifier.
Optionally, after the step of obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier, the method comprises:
Judging whether the number of matched workflow nodes is at least two;
When the number of workflow nodes is at least two, determining the ordering between the workflow nodes;
When the workflow nodes are in parallel order, obtaining the other role identifiers associated with the workflow nodes besides the user identifier, and sending prompt information to the terminals corresponding to those role identifiers, so as to prompt the personnel corresponding to those role identifiers to execute the update task.
Optionally, the step of updating, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message, comprises:
When the confirmation message sent by the terminal is received, judging whether the confirmation message indicates that verification has passed;
When the confirmation message indicates that verification has passed, judging whether the workflow node is the final node of the basic workflow;
When the workflow node is the final node of the basic workflow, updating the target data into HDFS.
Optionally, after the step of obtaining, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated, the method comprises:
Obtaining the upstream data of the target data and the verification rules of the upstream data, and judging according to the verification rules whether the upstream data is abnormal;
When the upstream data is abnormal, sending pause prompt information to the terminal corresponding to the user identifier, and sending abnormality prompt information to the preset terminal corresponding to the upstream data.
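The upstream check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the rule format (a permitted value range per field) and the notification helper are assumptions introduced for the example.

```python
# Minimal sketch of the optional upstream-data check: judge the upstream
# data against a verification rule, and pause the update with prompts if
# it is abnormal. The rule structure is a hypothetical example.

def upstream_is_abnormal(upstream_data, verification_rule):
    """Return True if any field of the upstream data violates its rule."""
    for field, (low, high) in verification_rule.items():
        value = upstream_data.get(field)
        if value is None or not (low <= value <= high):
            return True
    return False

def check_upstream(upstream_data, verification_rule, notify):
    """Pause the update and alert the upstream owner when data is abnormal."""
    if upstream_is_abnormal(upstream_data, verification_rule):
        notify("user_terminal", "update paused: upstream data abnormal")
        notify("upstream_terminal", "abnormality detected in upstream data")
        return False  # do not proceed with the update
    return True
```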
Optionally, after the step of updating, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message, the method comprises:
When an abnormality in the target data is detected, querying the basic workflow corresponding to the abnormal target data, and obtaining the workflow nodes corresponding to the abnormal target data in the basic workflow;
Obtaining the historical update records of each workflow node, determining a data restoration point from the historical update records, and rolling the data in HDFS back to the restoration point.
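The restoration-point lookup can be sketched as below, assuming each node keeps history records as timestamped snapshots with a verification flag; the record format and the in-memory stand-in for the HDFS write are hypothetical.

```python
# Sketch of rollback via historical update records: pick the last record
# that passed verification as the restoration point.

def find_restoration_point(history_records):
    """Return the snapshot of the latest verified record, or None.

    history_records: list of dicts like
        {"time": 3, "snapshot": {...}, "verified": True}
    """
    verified = [r for r in history_records if r.get("verified")]
    if not verified:
        return None
    latest = max(verified, key=lambda r: r["time"])
    return latest["snapshot"]

def rollback(hdfs_store, path, history_records):
    """Roll the data at `path` back to the chosen restoration point."""
    snapshot = find_restoration_point(history_records)
    if snapshot is not None:
        hdfs_store[path] = snapshot  # stand-in for an HDFS write
    return snapshot
```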
In addition, to achieve the above object, the present invention also provides a workflow-based HDFS data management apparatus, which comprises:
A request receiving module, configured to obtain, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated;
A node determining module, configured to obtain the update task for the target data and the basic workflow corresponding to the update task, and to obtain the workflow node in the basic workflow that matches the user identifier;
A confirmation sending module, configured to send the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
A data update module, configured to update, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message.
In addition, to achieve the above object, the present invention also provides a workflow-based HDFS data management device;
The workflow-based HDFS data management device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
When the computer program is executed by the processor, the steps of the workflow-based HDFS data management method described above are realized.
In addition, to achieve the above object, the present invention also provides a computer storage medium;
A computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the steps of the above workflow-based HDFS data management method are realized.
In the workflow-based HDFS data management method, apparatus, device and medium proposed by the embodiments of the present invention, when the server receives a data update request, it obtains the user identifier corresponding to the request and the target data to be updated; it obtains the update task for the target data and the basic workflow corresponding to the update task, and obtains the workflow node in the basic workflow that matches the user identifier. Basic workflows are preset in the present invention, and the data in HDFS is maintained and updated by means of workflows. Workflow nodes are set in the basic workflow; when a data update is needed, the corresponding update process is selected according to the data type, and the data passes through nodes such as data entry, review, verification and testing before finally being updated into HDFS. Meanwhile, the basic workflows in the present invention are configurable, allowing different types of data to go through different creation, verification and approval processes. This guarantees data update efficiency while making the data update process traceable, finally realizing standardized management of HDFS data.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware running environment involved in the embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the workflow-based HDFS data management method of the present invention;
Fig. 3 is a functional block diagram of an embodiment of the workflow-based HDFS data management apparatus of the present invention.
The realization of the object, the functional characteristics and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
Since the online update management of data in some prior-art HDFS deployments is often carried out through manual operation or a generic process for the entire data update flow, there is generally no customized, systematic approval and verification, and no systematic record of the data changed during review and revision. That is, current HDFS data management schemes have the following shortcomings: 1. all HDFS data onboarding processes are identical, so precise permission control and data control cannot be achieved; 2. the data in circulation lacks systematic records, making subsequent tracing relatively difficult; 3. the monitoring granularity of data changes within the process is insufficient, so erroneous data is often only discovered at the final stage and is difficult to intervene on in advance.
The present invention provides a solution that manages HDFS data by means of workflows and avoids the above shortcomings. Specifically, as shown in Fig. 1, Fig. 1 is a schematic structural diagram of the server in the hardware running environment involved in the embodiments of the present invention (also called a workflow-based HDFS data management device, where the workflow-based HDFS data management device may be constituted by a standalone workflow-based HDFS data management apparatus, or may be formed by combining other apparatuses with the workflow-based HDFS data management apparatus).
A server in the embodiments of the present invention refers to a computer that manages resources and provides services for users, and is generally divided into file servers, database servers and application servers. A computer or computer system running such software is also called a server. Compared with a common personal computer (PC), a server has higher requirements in terms of stability, security, performance and so on. As shown in Fig. 1, the server may include: a processor 1001, such as a central processing unit (CPU), a network interface 1004, a user interface 1003, a memory 1005, a communication bus 1002, and hardware such as a chipset, disk system and network. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity, WiFi, interface). The memory 1005 may be a high-speed random access memory (RAM), or a stable non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the server may also include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit and a WiFi module; input units such as a display screen or a touch screen; and, in addition to WiFi, the network interface may include other wireless interfaces such as Bluetooth or a probe. Those skilled in the art will understand that the server structure shown in Fig. 1 does not constitute a limitation on the server; the server may include more or fewer components than illustrated, combine certain components, or have a different component arrangement.
As shown in Fig. 1, the computer software product is stored in a storage medium (the storage medium is also called a computer storage medium, computer medium, readable medium, readable storage medium, computer-readable storage medium or direct medium; the storage medium may be a non-volatile readable storage medium, such as a RAM, a magnetic disk or an optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention. For example, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a computer program.
In the server shown in Fig. 1, the network interface 1004 is mainly used to connect to a backend database and perform data communication with the backend database; the user interface 1003 is mainly used to connect to a client (the client, also called a user side or terminal; the terminal in the embodiments of the present invention may be a fixed terminal or a mobile terminal with networking capability, e.g. a smart air conditioner, smart lamp, smart power supply, smart speaker, autonomous vehicle, PC, smartphone, tablet computer, e-book reader or portable computer; the terminal contains sensors such as a light sensor, a motion sensor and other sensors, which are not described in detail here) and perform data communication with the client; and the processor 1001 may be used to call the computer program stored in the memory 1005 and execute the steps in the workflow-based HDFS data management method provided by the following embodiments of the present invention.
Further, the present embodiment proposes a workflow-based HDFS data management method applied to the server described in Fig. 1. The workflow-based HDFS data management method in the present embodiment comprises:
When a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated;
Obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier;
Sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
When the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
Before the workflow-based HDFS data management method of the present embodiment executes the above steps, basic workflows need to be preset in the server. The step of presetting a basic workflow comprises:
Step S01: predefining first node elements of a Hadoop workflow model and second node elements of a BPEL process model, and establishing mapping rules between the first node elements and the second node elements.
A model conversion framework is pre-established in the server, with the following concrete steps: 1. predefine the first node elements of the Hadoop (Hadoop is a distributed system infrastructure developed by the Apache Foundation) workflow model and the second node elements of the BPEL process model; 2. establish the mapping rules between the first node elements and the second node elements; for example, the branch fork element (also called a first node element) in the Hadoop workflow model corresponds to the flow element (also called a second node element) in the BPEL process model.
Step S02: processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model.
According to the established mapping rules between the first node elements and the second node elements, the server builds a model conversion algorithm from the Hadoop workflow model to the BPEL process model, implemented as follows: 1. the conversion strategy used is top-down: each Hadoop workflow model is expressed as a directed acyclic graph and, after conversion, is output as a BPEL process model; 2. each Hadoop workflow model includes the start and end nodes of the workflow; 3. all input elements are counted and Variables elements are added to the BPEL model, then the node objects in the Hadoop workflow model are extracted and translated one by one: the type of each node is judged; if it is an activity node, it is translated into a basic activity; if it is a control node, the assignment statements are translated first, and then, according to the control node type, it is translated into a different control node object. This process is repeated until all elements of the Hadoop workflow model have been processed, yielding the BPEL process model.
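The top-down translation described above can be sketched roughly as follows. The node types, mapping table and output element names here are illustrative assumptions, not the actual Hadoop workflow or BPEL schemas.

```python
# Rough sketch of the top-down Hadoop-workflow -> BPEL conversion:
# add Variables for all inputs first, then translate each node in order.
# Node kinds ("activity", "fork", "decision") and output tags are hypothetical.

MAPPING_RULES = {
    "fork": "flow",     # branch fork element -> BPEL flow element (parallel)
    "decision": "if",   # decision control node -> BPEL if element
}

def convert_workflow(hadoop_nodes, inputs):
    """Translate a topologically ordered Hadoop workflow into BPEL elements."""
    bpel = [("variables", name) for name in inputs]  # Variables elements first
    for node in hadoop_nodes:  # top-down over the DAG
        if node["type"] == "activity":
            bpel.append(("invoke", node["name"]))    # basic activity
        else:
            # control node: translate assignment statements first
            for var, expr in node.get("assignments", {}).items():
                bpel.append(("assign", f"{var}={expr}"))
            # then translate per control-node type
            bpel.append((MAPPING_RULES[node["type"]], node["name"]))
    return bpel
```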
Step S03: converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, and deploying the deployment package; executing the deployment package to obtain the basic workflow.
After the BPEL process model is obtained, the server automatically deploys and executes the generated BPEL process model, implemented as follows: 1. the server generates a service deployment file according to the description file of the BPEL process model; 2. the server packages the description file of the BPEL process model, the executable code of the task corresponding to each node in the BPEL process, and the BPEL process service deployment file together into a deployment package; 3. the server places the deployment package into a BPEL engine, and after deployment of the executable package is completed, the server executes the deployment package to obtain the basic workflow.
It should be added that different basic workflows can be configured for different types of data updates. For example, a basic workflow for a simple data update may be configured with an entry node and a test node, while a basic workflow for complex data may be configured with an entry node, a test node, a check node, an approval node and so on.
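A per-data-type configuration of this kind might look like the following; the data-type names and node lists are purely illustrative examples, not values from the patent.

```python
# Hypothetical per-data-type basic-workflow configuration: simple updates
# get a short flow, complex data gets check and approval nodes as well.
WORKFLOW_CONFIG = {
    "simple": ["entry", "test"],
    "complex": ["entry", "test", "check", "approval"],
}

def nodes_for(data_type):
    """Return the configured node sequence for a data type (default: simple)."""
    return WORKFLOW_CONFIG.get(data_type, WORKFLOW_CONFIG["simple"])
```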
In the present embodiment, the server presets basic workflows of different types, and the server can perform flow node transitions within a basic workflow, so that maintenance and updates are carried out effectively by means of workflows. Specifically:
Referring to Fig. 2, in a first embodiment of the workflow-based HDFS data management method of the present invention, the workflow-based HDFS data management method comprises:
Step S10: when a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated.
When a data update request is received, the server obtains the user identifier carried in the data update request, where the user identifier refers to identification information that uniquely identifies the user's identity, for example, a user account. After obtaining the user identifier, the server obtains the target data to be updated corresponding to the data update request.
The triggering mode of the data update request received by the server is not specifically limited. For example, a user inputs specific data at a terminal and manually triggers a data update request, and the terminal sends the data update request to the server; when the server receives the data update request sent by the terminal, the server takes the user account information carried in the request as the user identifier, and takes the specific data input by the user as the target data to be updated corresponding to the data update request. Alternatively, a user presets on the server that a data update request is automatically triggered when new data is obtained; when the server obtains new data, the server judges that the preset automatic trigger condition is met and automatically triggers a data update request, takes the identity information of the user who set the automatic trigger condition as the user identifier corresponding to the data update request, and takes the newly obtained data as the target data to be updated.
Step S20: obtaining the update task for the target data and the basic workflow corresponding to the update task, and obtaining the workflow node in the basic workflow that matches the user identifier.
After the server obtains the target data, the server needs to determine the basic workflow of the target data according to the data type of the target data, so as to update the target data according to the workflow nodes in the basic workflow. Specifically, this comprises:
Step a1: obtaining the update task for the target data, and obtaining the deployment package associated with the update task;
Step b1: executing the deployment package to obtain the corresponding basic workflow, and obtaining the node elements included in the basic workflow and the role identifiers corresponding to the node elements;
Step c1: obtaining the target role identifier that matches the user identifier, and taking the node element corresponding to the target role identifier as the workflow node in the basic workflow that matches the user identifier.
That is, the server obtains the data type of the target data and the update task corresponding to that data type; the server obtains the deployment package associated with the update task from the workflow engine; the server executes the deployment package to obtain the corresponding basic workflow, and obtains the node elements included in the basic workflow. For example, the node elements included in the basic workflow are a data approval node, a data check node, a data test node and so on. Then the server obtains the role identifier corresponding to each node element, where a role identifier refers to a preset user identifier with operation permission on the node; for example, the role identifier corresponding to the data approval node is the data approver's account.
Then the server compares the user identifier corresponding to the data update request with the role identifier corresponding to each node element, obtains the target role identifier identical to the user identifier, and takes the node element corresponding to the target role identifier as the workflow node in the basic workflow that matches the user identifier. For example, the user identifier obtained by the server is the account name Wang xx; the server compares the account name Wang xx with the role identifier Wang xx of the data approval node, the role identifier Li xx of the data check node, and the role identifier He xx of the data test node, and determines that the data approval node is the workflow node in the basic workflow that matches the user identifier Wang xx.
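Under the assumptions of the example above (role identifiers stored as a per-node mapping), the comparison can be sketched as:

```python
def match_workflow_nodes(user_id, node_roles):
    """Return the node elements whose role identifier equals the user identifier.

    node_roles: mapping of node element name -> role identifier; this data
    layout is an illustrative assumption, not the patent's actual storage.
    """
    return [node for node, role in node_roles.items() if role == user_id]
```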
In the present embodiment, the server jointly determines the node for the data update according to the target data and the user identifier, making the data update operation more rigorous.
Step S30: sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
The server sends the workflow node to the terminal corresponding to the user identifier. For example, the server determines that the data approval node is the workflow node matching Wang xx, and the server sends the data approval node to the terminal corresponding to Wang xx, so that Wang xx confirms the update of the target data.
Step S40: when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
When the server receives the confirmation message sent by the terminal, the server judges, according to the received confirmation message, whether to execute the data update operation. Specifically, this comprises:
Step a2: when the confirmation message sent by the terminal is received, judging whether the confirmation message indicates that verification has passed;
Step b2: when the confirmation message indicates that verification has passed, judging whether the workflow node is the final node of the basic workflow;
Step c2: when the workflow node is the final node of the basic workflow, updating the target data into HDFS.
That is, when the server receives the confirmation message, the server judges whether the confirmation message indicates that verification has passed. When verification has not passed, the server sends prompt information to the terminal corresponding to the user identifier. When verification has passed, the server further judges whether the current workflow node is the final node of the basic workflow. When the current workflow node is not the final node of the basic workflow, the server advances the data update to the next node after the current workflow node according to the basic workflow. When the workflow node is the final node of the basic workflow, the server updates the target data into HDFS.
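Steps a2 to c2 can be sketched as a single handler. The message fields, the ordered node list and the in-memory stand-in for HDFS are assumptions made for the example, not the patent's actual data model.

```python
def handle_confirmation(message, node, workflow, hdfs_store, target):
    """Process a terminal's confirmation message per steps a2-c2.

    workflow: ordered list of node names; message: {"passed": bool}.
    Returns "prompt" (verification failed), the name of the next node
    (flow advances), or "updated" (target data committed into HDFS).
    """
    if not message.get("passed"):                  # a2: verification failed
        return "prompt"                            # prompt the user's terminal
    if node != workflow[-1]:                       # b2: not the final node yet
        return workflow[workflow.index(node) + 1]  # advance to the next node
    hdfs_store[target["path"]] = target["data"]    # c2: commit into HDFS
    return "updated"
```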
In the present embodiment, basic workflows are preset, and the data in HDFS is maintained and updated by means of workflows. Workflow nodes are set in the basic workflow; when a data update is needed, the corresponding update process is selected according to the data type, and the data passes through nodes such as data entry, review, verification and testing before finally being updated into HDFS. Meanwhile, the basic workflows in the present invention are configurable, allowing different types of data to go through different creation, verification and approval processes; this guarantees data update efficiency while making the data update process traceable, finally realizing standardized management of HDFS data.
Further, on the basis of the first embodiment of the present invention, a second embodiment of the workflow-based HDFS data management method of the present invention is proposed.
The present embodiment covers the steps after step S20 of the first embodiment. The difference between the present embodiment and the first embodiment is that the present embodiment specifically describes the processing steps when there are multiple workflow nodes in the basic workflow that match the user identifier. Specifically, this comprises:
Step S50: judging whether the number of workflow nodes is at least two, and when the number of workflow nodes is at least two, determining the ordering between the workflow nodes.
After the server obtains the workflow nodes in the basic workflow that match the user identifier, the server determines the number of workflow nodes. If the server determines that the number of matching workflow nodes in the basic workflow is one, the server executes step S30 of the first embodiment: sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
When the number of workflow nodes is at least two, the server determines the ordering between the workflow nodes. When the workflow nodes are in serial order, the server sends the information of the first-ordered workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
It is associated except described to obtain the workflow nodes when each workflow nodes are parallel sequence by step S60 Other role identifications except user identifier, and prompt information is sent to the role identification counterpart terminal, to prompt the angle Colour code knows counterpart personnel and executes the more new task.
When it is parallel sequence that server, which determines between each workflow nodes, server obtains that workflow nodes are associated removes Other role identifications except user identifier, for example, the workflow nodes that server obtains are n data uploading nodes, service Device, which determines, needs user a in data uploading nodes, user b and user c upload data simultaneously, corresponds at this point, data update request User identifier be user a, server obtains the mark of user b and user c, and sends prompt information to user b and c couples of user Terminal is answered, to prompt user b and user c to execute more new task.
Parallel and serial nodes are thus provided in this embodiment. For example, the same node can be divided into multiple child nodes so that several operators can handle it cooperatively, or different nodes can be set to be processed in parallel, which improves data-update efficiency.
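The serial/parallel dispatch of this embodiment can be sketched as below: in serial order only the requesting user is prompted at the first node, while in parallel order every other associated role is prompted as well. The data shapes and names are illustrative assumptions.

```python
# Sketch: notify users according to the order among matched workflow
# nodes. Each node carries its associated role identifiers; names and
# structures are illustrative assumptions.

def dispatch(nodes, order, requesting_user):
    """Return (role, node_name) notification pairs for the matched nodes.

    order is "serial" or "parallel".
    """
    if order == "serial":
        first = nodes[0]                          # only the first node in sequence
        return [(requesting_user, first["name"])]
    notices = []                                  # parallel: prompt every other role
    for node in nodes:
        for role in node["roles"]:
            if role != requesting_user:
                notices.append((role, node["name"]))
    return notices
```

With an upload node associated with users a, b and c and a request from user a, a parallel dispatch prompts b and c, matching the example in the paragraph above.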
Further, on the basis of the above embodiments, a third embodiment of the workflow-based HDFS data management method of the present invention is proposed.
This embodiment refines the method after step S10 of the first embodiment. In this embodiment, when the server receives the data-update request and determines the target data, it first needs to determine whether the upstream data of the target data are abnormal. Specifically, the workflow-based HDFS data management method includes:
Step S70: obtain the upstream data of the target data and the verification rules of the upstream data, and judge according to the verification rules whether the upstream data are abnormal.
After the server obtains the target data, the server obtains the upstream data of the target data and the verification rules of those upstream data; the server then automatically verifies the upstream data of the target data against the obtained rules to judge whether the upstream data are abnormal. When the upstream data of the target data are normal, the server may execute step S20 of the first embodiment.
Step S80: when the upstream data are abnormal, send pause prompt information to the terminal corresponding to the user identifier, and send abnormal prompt information to the preset terminal corresponding to the upstream data.
In this embodiment, when the server determines that the upstream data are abnormal, the server sends pause prompt information to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier pauses the update of the target data; this effectively avoids batch data-update errors. While sending the pause prompt information, the server also sends abnormal prompt information to the preset terminal corresponding to the upstream data (the preset terminal corresponding to the upstream data is set as the terminal of the upstream data administrator), so as to prompt the upstream data administrator to perform a manual check. That is, in this embodiment the upstream data can be verified at the same time as the data update, guaranteeing the accuracy of the data update.
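The upstream check of steps S70/S80 can be sketched as follows. The rule representation (named predicates) and all function names are assumptions; the patent does not specify a rule format.

```python
# Sketch: validate upstream data against its verification rules before an
# update proceeds. On failure, the updater receives a "pause" prompt and
# the upstream administrator an "abnormal" alert. Rule format is an
# illustrative assumption (rule name -> predicate over one row).

def check_upstream(upstream_rows, rules):
    """Return the names of the rules that the upstream data fail."""
    return [name for name, rule in rules.items()
            if not all(rule(row) for row in upstream_rows)]

def on_update_request(upstream_rows, rules):
    failed = check_upstream(upstream_rows, rules)
    if failed:
        # pause the batch update; alert the upstream data administrator
        return {"updater": "pause",
                "upstream_admin": "abnormal: " + ", ".join(failed)}
    return {"updater": "proceed"}
```

A failed rule thus blocks the workflow before any data are written, which is what prevents a batch of updates from being built on bad upstream data.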
Further, on the basis of the above embodiments, a fourth embodiment of the workflow-based HDFS data management method of the present invention is proposed. This embodiment can be combined with the other embodiments.
This embodiment adds steps after step S40 of the first embodiment. It describes the steps of rolling data back when an error occurs during a workflow-based data update. Specifically, the workflow-based HDFS data management method includes:
Step S90: when an anomaly in the target data is detected, query the base workflow corresponding to the abnormal target data, and obtain the workflow nodes corresponding to the abnormal target data in that base workflow.
When the server detects that the target data are abnormal, the server queries the base workflow corresponding to the abnormal target data and obtains each workflow node corresponding to the abnormal target data in that base workflow, then rolls the data back according to the information of those workflow nodes. Specifically:
Step S100: obtain the history update records of each workflow node, determine a data restore point from the history update records, and restore the data information in the HDFS to the restore point.
The server obtains the history update records of each workflow node, determines the cause of the target-data anomaly from those records, and determines a data restore point; the server then restores the data information in HDFS to the restore point. In other words, this embodiment makes the whole data-update process transparent: the data flow and update speed can be monitored, and all update data and update processes are traceable, which facilitates tracing back and rolling data back.
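A minimal sketch of rollback via per-node history records follows. Representing history as ordered (node, snapshot) pairs, and the in-memory state standing in for HDFS, are both illustrative assumptions.

```python
# Sketch: each workflow node's update is recorded as a (node, snapshot)
# pair. On a detected anomaly, the restore point is the snapshot taken
# just before the failing node, and the (stand-in) HDFS state is rolled
# back to it. All names and structures are illustrative assumptions.

def find_restore_point(history, failing_node):
    """history is an ordered list of (node, snapshot) records."""
    for i, (node, _snap) in enumerate(history):
        if node == failing_node:
            return history[i - 1][1] if i > 0 else None
    return None

def rollback(hdfs_state, history, failing_node):
    snap = find_restore_point(history, failing_node)
    if snap is not None:
        hdfs_state.clear()
        hdfs_state.update(snap)   # restore data to the restore point
    return hdfs_state
```

Because every node leaves a record, the same history that enables rollback also makes the whole update process auditable.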
In addition, an embodiment of the present invention also proposes a workflow-based HDFS data management apparatus. Referring to Fig. 3, the workflow-based HDFS data management apparatus includes:
a request module 10, configured to, when a data-update request is received, obtain the user identifier corresponding to the data-update request and the target data to be updated;
a node determining module 20, configured to obtain the update task of the target data and the base workflow corresponding to the update task, and to obtain the workflow nodes in the base workflow that match the user identifier;
a confirmation sending module 30, configured to send the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
a data update module 40, configured to, when the confirmation message sent by the terminal is received, update the target data into HDFS according to the confirmation message.
Optionally, the workflow-based HDFS data management apparatus includes:
a predefinition module, configured to predefine the first node elements of a Hadoop workflow model and the second node elements of a BPEL process model, and to establish mapping rules between the first node elements and the second node elements;
a process mapping module, configured to process the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model;
a workflow determining module, configured to convert the description file of the BPEL process model to generate a service deployment file, to package the description file, the executable code of the tasks corresponding to each second node element in the BPEL process and the service deployment file into a deployment package, to deploy the deployment package, and to execute the deployment package to obtain the base workflow.
Optionally, the node determining module 20 includes:
an obtaining unit, configured to obtain the update task of the target data and the deployment package associated with the update task;
an execution unit, configured to execute the deployment package to obtain the corresponding base workflow, and to obtain the node elements included in the base workflow and the role identifiers corresponding to those node elements;
a node determining unit, configured to obtain the target role identifier matched with the user identifier, and to take the node elements corresponding to the target role identifier as the workflow nodes in the base workflow that match the user identifier.
Optionally, the workflow-based HDFS data management apparatus includes:
a judging module, configured to judge whether the number of the workflow nodes is at least two;
an order determining module, configured to, when the number of the workflow nodes is at least two, determine the order among the workflow nodes;
a parallel prompt module, configured to, when the workflow nodes are in parallel order, obtain the other role identifiers associated with the workflow nodes besides the user identifier, and to send prompt information to the terminals corresponding to those role identifiers, so as to prompt the personnel corresponding to those role identifiers to execute the update task.
Optionally, the data update module 40 includes:
a first judging unit, configured to, when the confirmation message sent by the terminal is received, judge whether the confirmation message indicates that the review was passed;
a second judging unit, configured to, when the confirmation message indicates that the review was passed, judge whether the workflow node is the final node of the base workflow;
a data updating unit, configured to, when the workflow node is the final node of the base workflow, update the target data into HDFS.
Optionally, the workflow-based HDFS data management apparatus includes:
a data verification module, configured to obtain the upstream data of the target data and the verification rules of the upstream data, and to judge according to the verification rules whether the upstream data are abnormal;
a prompt sending module, configured to, when the upstream data are abnormal, send pause prompt information to the terminal corresponding to the user identifier, and to send abnormal prompt information to the preset terminal corresponding to the upstream data.
Optionally, the workflow-based HDFS data management apparatus includes:
an anomaly detection module, configured to, when an anomaly in the target data is detected, query the base workflow corresponding to the abnormal target data, and to obtain the workflow nodes corresponding to the abnormal target data in the base workflow;
a data restoring module, configured to obtain the history update records of each workflow node, to determine a data restore point from the history update records, and to restore the data in the HDFS to the restore point.
For the steps implemented by each functional module of the workflow-based HDFS data management apparatus, reference may be made to the embodiments of the workflow-based HDFS data management method of the present invention, and details are not repeated here.
In addition, an embodiment of the present invention also proposes a computer storage medium.
A computer program is stored in the computer storage medium; when the computer program is executed by a processor, the operations in the workflow-based HDFS data management method provided by the above embodiments are implemented.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity, operation or object from another, and do not necessarily require or imply any actual relationship or order between those entities, operations or objects. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes that element.
As for the apparatus embodiment, since it is substantially similar to the method embodiment, its description is relatively simple; for related details, refer to the description of the method embodiment. The apparatus embodiments described above are merely illustrative, and the units described as separate parts may or may not be physically separated. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, which those of ordinary skill in the art can understand and implement without creative labor.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disk) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. All equivalent structural or flow transformations made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, are likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A workflow-based HDFS data management method, characterized in that the workflow-based HDFS data management method comprises the following steps:
when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated;
obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow nodes in the base workflow that match the user identifier;
sending the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
2. The workflow-based HDFS data management method according to claim 1, characterized in that before the step of, when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated, the method comprises:
predefining the first node elements of a Hadoop workflow model and the second node elements of a BPEL process model, and establishing mapping rules between the first node elements and the second node elements;
processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model;
converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the tasks corresponding to each second node element in the BPEL process and the service deployment file into a deployment package, and deploying the deployment package;
executing the deployment package to obtain the base workflow.
3. The workflow-based HDFS data management method according to claim 2, characterized in that the step of obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow nodes in the base workflow that match the user identifier, comprises:
obtaining the update task of the target data, and obtaining the deployment package associated with the update task;
executing the deployment package to obtain the corresponding base workflow, and obtaining the node elements included in the base workflow and the role identifiers corresponding to the node elements;
obtaining the target role identifier matched with the user identifier, and taking the node elements corresponding to the target role identifier as the workflow nodes in the base workflow that match the user identifier.
4. The workflow-based HDFS data management method according to claim 3, characterized in that after the step of obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow nodes in the base workflow that match the user identifier, the method comprises:
judging whether the number of the workflow nodes is at least two;
when the number of the workflow nodes is at least two, determining the order among the workflow nodes;
when the workflow nodes are in parallel order, obtaining the other role identifiers associated with the workflow nodes besides the user identifier, and sending prompt information to the terminals corresponding to those role identifiers, so as to prompt the personnel corresponding to those role identifiers to execute the update task.
5. The workflow-based HDFS data management method according to claim 4, characterized in that the step of, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message, comprises:
when the confirmation message sent by the terminal is received, judging whether the confirmation message indicates that the review was passed;
when the confirmation message indicates that the review was passed, judging whether the workflow node is the final node of the base workflow;
when the workflow node is the final node of the base workflow, updating the target data into HDFS.
6. The workflow-based HDFS data management method according to claim 1, characterized in that after the step of, when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated, the method comprises:
obtaining the upstream data of the target data and the verification rules of the upstream data, and judging according to the verification rules whether the upstream data are abnormal;
when the upstream data are abnormal, sending pause prompt information to the terminal corresponding to the user identifier, and sending abnormal prompt information to the preset terminal corresponding to the upstream data.
7. The workflow-based HDFS data management method according to claim 1, characterized in that after the step of, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message, the method comprises:
when an anomaly in the target data is detected, querying the base workflow corresponding to the abnormal target data, and obtaining the workflow nodes corresponding to the abnormal target data in the base workflow;
obtaining the history update records of each workflow node, determining a data restore point from the history update records, and restoring the data in the HDFS to the restore point.
8. A workflow-based HDFS data management apparatus, characterized in that the workflow-based HDFS data management apparatus comprises:
a request module, configured to, when a data-update request is received, obtain the user identifier corresponding to the data-update request and the target data to be updated;
a node determining module, configured to obtain the update task of the target data and the base workflow corresponding to the update task, and to obtain the workflow nodes in the base workflow that match the user identifier;
a confirmation sending module, configured to send the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
a data update module, configured to, when the confirmation message sent by the terminal is received, update the target data into HDFS according to the confirmation message.
9. A workflow-based HDFS data management device, characterized in that the workflow-based HDFS data management device comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
when the computer program is executed by the processor, the steps of the workflow-based HDFS data management method according to any one of claims 1 to 7 are implemented.
10. A computer storage medium, characterized in that a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the steps of the workflow-based HDFS data management method according to any one of claims 1 to 7 are implemented.
CN201910201985.7A 2019-03-16 2019-03-16 HDFS data managing method, device, equipment and medium based on workflow Pending CN110069465A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910201985.7A CN110069465A (en) 2019-03-16 2019-03-16 HDFS data managing method, device, equipment and medium based on workflow


Publications (1)

Publication Number Publication Date
CN110069465A true CN110069465A (en) 2019-07-30

Family

ID=67365278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910201985.7A Pending CN110069465A (en) 2019-03-16 2019-03-16 HDFS data managing method, device, equipment and medium based on workflow

Country Status (1)

Country Link
CN (1) CN110069465A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310936A (en) * 2020-04-15 2020-06-19 光际科技(上海)有限公司 Machine learning training construction method, platform, device, equipment and storage medium
CN112202899A (en) * 2020-09-30 2021-01-08 北京百度网讯科技有限公司 Workflow processing method and device, intelligent workstation and electronic equipment
CN112416476A (en) * 2020-11-25 2021-02-26 武汉联影医疗科技有限公司 Workflow execution method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100050183A1 (en) * 2008-08-25 2010-02-25 Fujitsu Limited Workflow developing apparatus, workflow developing method, and computer product
US20120271860A1 (en) * 2011-04-25 2012-10-25 Cbs Interactive, Inc. User data store
CN103581332A (en) * 2013-11-15 2014-02-12 武汉理工大学 HDFS framework and pressure decomposition method for NameNodes in HDFS framework
CN103761111A (en) * 2014-02-19 2014-04-30 中国科学院软件研究所 Method and system for constructing data-intensive workflow engine based on BPEL language
CN106529917A (en) * 2016-12-15 2017-03-22 平安科技(深圳)有限公司 Workflow processing method and device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination