CN110069465A - HDFS data managing method, device, equipment and medium based on workflow - Google Patents
- Publication number
- CN110069465A (application CN201910201985.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- workflow
- hdfs
- user identifier
- nodes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Abstract
The present invention relates to the field of data processing, and provides a workflow-based HDFS data management method, comprising the following steps: when a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated; obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier; sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data; and, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message. The invention also discloses a workflow-based HDFS data management apparatus, device and medium. In the present invention, updates to data in HDFS are made more standardized by means of the base workflow.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a workflow-based HDFS data management method, apparatus, device and medium.
Background art
With the development of the Hadoop Distributed File System (HDFS), the volume of data in Hadoop keeps growing, and managing data updates becomes increasingly cumbersome.
In some current HDFS deployments, the online update control of core data is often carried out through manual operation or a generic process, generally lacking customized, systematic approval and verification, and leaving no systematic record of the data changed during review. That is, the online update control of core data in current HDFS is unstandardized, and how to standardize HDFS data management has become a technical problem urgently to be solved.
Summary of the invention
The main purpose of the present invention is to provide a workflow-based HDFS data management method, apparatus, device and storage medium, intended to solve the problem that HDFS data update management is unstandardized.
To achieve the above object, the present invention provides a workflow-based HDFS data management method, comprising the following steps:
When a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated;
Obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier;
Sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
When the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
Optionally, before the step of obtaining, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated, the method comprises:
Predefining first node elements of a Hadoop workflow model and second node elements of a BPEL process model, and establishing mapping rules between the first node elements and the second node elements;
Processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain the BPEL process model;
Converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, and deploying the deployment package;
Executing the deployment package to obtain the base workflow.
Optionally, the step of obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier, comprises:
Obtaining the update task of the target data, and obtaining the deployment package associated with the update task;
Executing the deployment package to obtain the corresponding base workflow, and obtaining the node elements contained in the base workflow and the role identifiers corresponding to the node elements;
Obtaining the target role identifier that matches the user identifier, and taking the node element corresponding to the target role identifier as the workflow node in the base workflow that matches the user identifier.
Optionally, after the step of obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier, the method comprises:
Judging whether the number of workflow nodes is at least two;
When the number of workflow nodes is at least two, determining the ordering among the workflow nodes;
When the workflow nodes are ordered in parallel, obtaining the other role identifiers associated with the workflow nodes besides the user identifier, and sending prompt information to the terminals corresponding to those role identifiers, so as to prompt the personnel corresponding to those role identifiers to execute the update task.
Optionally, the step of updating, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message, comprises:
When the confirmation message sent by the terminal is received, judging whether the confirmation message indicates approval;
When the confirmation message indicates approval, judging whether the workflow node is the final node of the base workflow;
When the workflow node is the final node of the base workflow, updating the target data into HDFS.
Optionally, after the step of obtaining, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated, the method comprises:
Obtaining the upstream data of the target data and the verification rules of the upstream data, and judging according to the verification rules whether the upstream data is abnormal;
When the upstream data is abnormal, sending pause prompt information to the terminal corresponding to the user identifier, and sending abnormality prompt information to the preset terminal corresponding to the upstream data.
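The optional upstream check above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the rule format (a numeric min/max range per field) and the notifier callback are assumptions introduced for the example.

```python
# Sketch of the optional upstream-data check: before an update proceeds,
# each upstream record is validated against its verification rules; on an
# anomaly the requester is paused and the upstream owner is alerted.
# The per-field numeric-range rule format is a hypothetical assumption.

def check_upstream(upstream_rows, rules):
    """Return the list of anomalous rows under the given rules."""
    anomalies = []
    for row in upstream_rows:
        for field, (lo, hi) in rules.items():
            value = row.get(field)
            if value is None or not (lo <= value <= hi):
                anomalies.append(row)
                break
    return anomalies

def gate_update(upstream_rows, rules, notify):
    """Pause the update and send prompts when upstream data is abnormal."""
    anomalies = check_upstream(upstream_rows, rules)
    if anomalies:
        notify("requester", "update paused: upstream data abnormal")
        notify("upstream_owner", f"{len(anomalies)} abnormal upstream rows")
        return False  # the update does not proceed
    return True
```

The point of the gate is that anomalies stop the flow before any workflow node is entered, matching the "pause prompt to the requester, abnormality prompt to the upstream owner" behavior described above.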
Optionally, after the step of updating, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message, the method comprises:
When an abnormality of the target data is detected, querying the base workflow corresponding to the abnormal target data, and obtaining the workflow nodes corresponding to the abnormal target data in the base workflow;
Obtaining the historical update records of each workflow node, determining a data restoration point from the historical update records, and restoring the data in HDFS to the restoration point.
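The rollback step can be illustrated with a small sketch; the record format (timestamped snapshots) and the choice of the last record before the anomaly as the restoration point are assumptions made for the example, not the patent's stated format.

```python
# Sketch of restoring HDFS data from the historical update records of the
# workflow nodes: pick the latest record older than the detected anomaly
# as the restoration point, then write its snapshot back into the store
# (a dict stands in for HDFS here).

def find_restore_point(history, anomaly_time):
    """history: list of (timestamp, snapshot) tuples, in any order."""
    earlier = [rec for rec in history if rec[0] < anomaly_time]
    if not earlier:
        return None
    return max(earlier, key=lambda rec: rec[0])

def rollback(store, path, history, anomaly_time):
    """Restore store[path] to the snapshot at the restoration point."""
    point = find_restore_point(history, anomaly_time)
    if point is not None:
        store[path] = point[1]
    return point
```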
In addition, to achieve the above object, the present invention also provides a workflow-based HDFS data management apparatus, comprising:
A request module, configured to obtain, when a data update request is received, the user identifier corresponding to the data update request and the target data to be updated;
A node determining module, configured to obtain the update task of the target data and the base workflow corresponding to the update task, and obtain the workflow node in the base workflow that matches the user identifier;
A confirmation sending module, configured to send the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
A data update module, configured to update, when the confirmation message sent by the terminal is received, the target data into HDFS according to the confirmation message.
In addition, to achieve the above object, the present invention also provides a workflow-based HDFS data management device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the steps of the workflow-based HDFS data management method described above are implemented when the computer program is executed by the processor.
In addition, to achieve the above object, the present invention also provides a computer storage medium, in which a computer program is stored, wherein the steps of the workflow-based HDFS data management method described above are implemented when the computer program is executed by a processor.
In the workflow-based HDFS data management method, apparatus, device and medium proposed by the embodiments of the present invention, when the server receives a data update request, it obtains the user identifier corresponding to the data update request and the target data to be updated; obtains the update task of the target data and the base workflow corresponding to the update task; and obtains the workflow node in the base workflow that matches the user identifier. Base workflows are preset in the present invention, and the data in HDFS is maintained and updated by means of workflows. Workflow nodes are set in the base workflow; when a data update demand arises, the corresponding update process is selected according to the data type, and the data passes through nodes such as data entry, review, verification and testing before finally being updated into HDFS. At the same time, the base workflow in the present invention is configurable, allowing different types of data to go through different creation, verification and approval processes. This guarantees data update efficiency while making the data update process traceable, finally realizing standardized management of HDFS data.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of the first embodiment of the workflow-based HDFS data management method of the present invention;
Fig. 3 is a functional block diagram of an embodiment of the workflow-based HDFS data management apparatus of the present invention.
The realization of the object, functional characteristics and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
In the prior art, the online update of data in some HDFS deployments is often managed through manual operation or a generic process that carries the entire data update flow, generally lacking customized, systematic approval and verification, and leaving no systematic record of the data changed during review and revision. That is, current HDFS data management schemes have the following shortcomings: 1. All HDFS data go-live processes are identical, so precise permission control and data control cannot be achieved; 2. Data in circulation lacks systematic records, making subsequent tracing of records relatively difficult; 3. The monitoring granularity of data changes within the process is insufficient, so erroneous data is often discovered only at the final stage and is difficult to intervene on in advance.
The present invention provides a solution that manages HDFS data by means of workflows and avoids the above shortcomings. Specifically, as shown in Fig. 1, Fig. 1 is a schematic structural diagram of the server in the hardware operating environment involved in the embodiments of the present invention (also called the workflow-based HDFS data management device, where the workflow-based HDFS data management device may consist of an independent workflow-based HDFS data management apparatus, or may be formed by combining other devices with the workflow-based HDFS data management apparatus).
In the embodiments of the present invention, a server refers to a computer that manages resources and provides services for users, generally divided into file servers, database servers and application servers. A computer or computer system running the above software is also called a server. Compared with a common personal computer (PC), a server has higher requirements in terms of stability, security, performance and the like. As shown in Fig. 1, the server may include: a processor 1001, such as a central processing unit (CPU); a network interface 1004; a user interface 1003; a memory 1005; a communication bus 1002; and hardware such as a chipset, disk system and network. The communication bus 1002 is used to realize connection and communication among these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (WiFi) interface). The memory 1005 may be a high-speed random access memory (RAM), or a stable non-volatile memory, such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the server may also include a camera, a radio frequency (RF) circuit, sensors, an audio circuit and a WiFi module; input units such as a display screen or a touch screen; and, besides WiFi, the wireless network interface may be Bluetooth, a probe or the like. Those skilled in the art will understand that the server structure shown in Fig. 1 does not constitute a limitation on the server, and the server may include more or fewer components than illustrated, combine certain components, or have a different component arrangement.
As shown in Fig. 1, the computer software product is stored in a storage medium (also called a computer storage medium, computer medium, readable medium, readable storage medium or computer readable storage medium; the storage medium may be a non-volatile readable storage medium, such as a RAM, magnetic disk or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device or the like) to execute the methods described in the embodiments of the present invention. For example, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module and a computer program.
In the server shown in Fig. 1, the network interface 1004 is mainly used to connect to a background database and perform data communication with it; the user interface 1003 is mainly used to connect to a client (the client is also called a user terminal or terminal; in the embodiments of the present invention, the terminal may be a fixed terminal or a mobile terminal with networking capability, e.g., a smart air conditioner, smart lamp, smart power supply, smart speaker, autonomous vehicle, PC, smartphone, tablet computer, e-book reader or portable computer; the terminal contains sensors such as a light sensor, a motion sensor and other sensors, which are not described in detail here) and perform data communication with the client; and the processor 1001 may be used to call the computer program stored in the memory 1005 and execute the steps of the workflow-based HDFS data management method provided by the following embodiments of the present invention.
Further, this embodiment proposes a workflow-based HDFS data management method applied to the server shown in Fig. 1. The workflow-based HDFS data management method in this embodiment comprises:
When a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated;
Obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier;
Sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
When the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
Before the workflow-based HDFS data management method of this embodiment executes the above steps, base workflows need to be preset in the server. The step of presetting a base workflow comprises:
Step S01: predefining the first node elements of the Hadoop workflow model and the second node elements of the BPEL process model, and establishing mapping rules between the first node elements and the second node elements.
A model conversion framework is pre-established in the server, implemented as follows: 1. Define the first node elements of the workflow model of Hadoop (a distributed system infrastructure developed by the Apache foundation) and the second node elements of the BPEL process model; 2. Establish mapping rules between the first node elements and the second node elements; for example, a branching fork element in the Hadoop workflow model (called a first node element) corresponds to a flow element in the BPEL process model (called a second node element).
Step S02: processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain the BPEL process model.
According to the mapping rules established between the first node elements and the second node elements, the server establishes a model conversion algorithm from the Hadoop workflow model to the BPEL process model, implemented as follows: 1. The conversion strategy used is top-down: each Hadoop workflow model is expressed as a directed acyclic graph and, after conversion, is output as a BPEL process model; 2. Each Hadoop workflow model includes the start and end nodes of the workflow; 3. All input elements are counted and Variables elements are added to the BPEL model; then the node objects in the Hadoop workflow model are extracted and translated one by one, judging each node's type: if it is an activity node, it is transcribed into a basic activity; if it is a control node, an assignment statement is translated first and then, according to the control node type, it is translated into a different control node object. This process repeats until all elements of the Hadoop workflow model have been processed, yielding the BPEL process model.
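The top-down translation described in items 1-3 can be sketched as a traversal over node objects. The node-type names and the emitted element tuples below are simplifications assumed for the illustration; they stand in for real Hadoop workflow and BPEL constructs.

```python
# Sketch of the top-down model conversion: walk the Hadoop workflow DAG
# from its start node and translate each node by type - activity nodes
# become basic activities, control nodes become an assignment followed by
# a BPEL control construct. Node-kind and element names are assumptions.

CONTROL_MAP = {"fork": "flow", "decision": "if", "join": "flow_end"}

def convert_to_bpel(nodes, inputs):
    """nodes: dicts with 'name', 'kind' and, for control nodes, 'ctrl'
    (a CONTROL_MAP key), given in topological order."""
    bpel = [("variables", list(inputs))]   # item 3: add Variables first
    for node in nodes:
        if node["kind"] == "activity":
            bpel.append(("basic_activity", node["name"]))
        elif node["kind"] == "control":
            bpel.append(("assign", node["name"]))          # assignment first
            bpel.append((CONTROL_MAP[node["ctrl"]], node["name"]))
        elif node["kind"] in ("start", "end"):             # item 2
            bpel.append((node["kind"], node["name"]))
    return bpel
```

Because the workflow is a directed acyclic graph in topological order, a single pass suffices; no back-edges have to be revisited.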
Step S03: converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, and deploying the deployment package; executing the deployment package to obtain the base workflow.
After the BPEL process model is obtained, the server automatically deploys and executes the generated BPEL process model, implemented as follows: 1. The server generates a service deployment file according to the description file of the BPEL process model; 2. The server packs the description file of the BPEL process model, the executable code of the task corresponding to each node in the BPEL process, and the BPEL process service deployment file together into a deployment package; 3. The server places the deployment package into a BPEL engine and, after deployment of the executable package is completed, the server executes the deployment package to obtain the base workflow.
It should be added that different base workflows can be configured for different types of data updates. For example, the base workflow for a simple data update may be set with an input node and a test node, while the base workflow for complex data may be set with an input node, a test node, a check node, an approval node and the like.
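Under the assumption that a configured workflow can be modeled as a plain node list keyed by data type, the per-type configuration idea above can be sketched as:

```python
# Sketch of configurable base workflows per data type: a simple update
# passes through fewer nodes than a complex one. The table mirrors the
# example in the text; the lookup helper is an assumed illustration.

BASE_WORKFLOWS = {
    "simple":  ["input", "test"],
    "complex": ["input", "test", "check", "approval"],
}

def workflow_for(data_type):
    """Return the configured node sequence for a data type."""
    try:
        return BASE_WORKFLOWS[data_type]
    except KeyError:
        raise ValueError(f"no base workflow configured for {data_type!r}")
```

Keeping the mapping in configuration rather than code is what allows different data types to correspond to different creation, verification and approval processes.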
In this embodiment, the server presets base workflows of different types, and the server can perform node transitions within the base workflow, so that data is effectively maintained and updated by means of workflows. Specifically:
Referring to Fig. 2, in the first embodiment of the workflow-based HDFS data management method of the present invention, the workflow-based HDFS data management method comprises:
Step S10: when a data update request is received, obtaining the user identifier corresponding to the data update request and the target data to be updated.
When a data update request is received, the server obtains the user identifier carried in the data update request, where the user identifier refers to identification information, such as a user account, that uniquely identifies the user's identity. After obtaining the user identifier, the server obtains the target data to be updated corresponding to the data update request.
The manner in which the data update request received by the server is triggered is not specifically limited. For example, the user inputs specific data at the terminal and manually triggers a data update request, which the terminal sends to the server; when the server receives the data update request sent by the terminal, the server takes the user account information carried in the request as the user identifier, and takes the specific data input by the user as the target data to be updated corresponding to the data update request. Alternatively, the user presets on the server that a data update request is triggered automatically when new data is obtained; when the server obtains new data, the server judges that the preset automatic trigger condition is met and automatically triggers a data update request, taking the identity information of the user who set the automatic trigger condition as the user identifier corresponding to the data update request, and taking the newly obtained data as the target data to be updated.
Step S20: obtaining the update task of the target data and the base workflow corresponding to the update task, and obtaining the workflow node in the base workflow that matches the user identifier.
After the server obtains the target data, the server needs to determine the base workflow of the target data according to the data type of the target data, so as to update the target data according to the workflow nodes in the base workflow. Specifically, this comprises:
Step a1: obtaining the update task of the target data, and obtaining the deployment package associated with the update task;
Step b1: executing the deployment package to obtain the corresponding base workflow, and obtaining the node elements contained in the base workflow and the role identifiers corresponding to the node elements;
Step c1: obtaining the target role identifier that matches the user identifier, and taking the node element corresponding to the target role identifier as the workflow node in the base workflow that matches the user identifier.
That is, the server obtains the data type of the target data and the update task corresponding to that data type; the server obtains the deployment package associated with the update task in the workflow engine; the server executes the deployment package to obtain the corresponding base workflow, and obtains the node elements contained in the base workflow. For example, the node elements contained in the base workflow are a data approval node, a data check node, a data test node and so on. Then, the server obtains the role identifier corresponding to each node element, where a role identifier refers to a preset user identifier with operation permission on the node; for example, the role identifier corresponding to the data approval node is the account of the data approver.
Then, the server compares the user identifier corresponding to the data update request with the role identifier corresponding to each node element; the server obtains the target role identifier identical to the user identifier, and takes the node element corresponding to the target role identifier as the workflow node in the base workflow that matches the user identifier. For example, the user identifier obtained by the server is the account name Wang xx; the server compares the account name Wang xx with the role identifier Wang xx of the data approval node, the role identifier Li xx of the data check node and the role identifier He xx of the data test node, and determines that the data approval node is the workflow node in the base workflow that matches the user identifier Wang xx.
In this embodiment, the server determines the node of the data update jointly according to the target data and the user identifier, making the data update operation more rigorous.
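The matching of steps a1-c1 reduces to comparing the requester's identifier with each node's role identifier. A small sketch illustrates it; the dictionary layout is an assumption, and the role names follow the example in the text.

```python
# Sketch of step c1: find the workflow node(s) whose role identifier
# equals the requester's user identifier. The node-to-role table layout
# is a hypothetical assumption for this illustration.

NODE_ROLES = {
    "data_approval": "Wang xx",
    "data_check":    "Li xx",
    "data_test":     "He xx",
}

def match_workflow_node(user_id, node_roles):
    """Return the nodes whose role identifier matches user_id."""
    return [node for node, role in node_roles.items() if role == user_id]
```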
Step S30: sending the workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
The server sends the workflow node to the terminal corresponding to the user identifier. For example, the server determines that the data approval node is the workflow node matched with Wang xx; the server sends the data approval node to the terminal corresponding to Wang xx, so that Wang xx confirms the update of the target data.
Step S40: when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
When the server receives the confirmation message sent by the terminal, the server judges, according to the received confirmation message, whether to execute the data update operation. Specifically, this comprises:
Step a2: when the confirmation message sent by the terminal is received, judging whether the confirmation message indicates approval;
Step b2: when the confirmation message indicates approval, judging whether the workflow node is the final node of the base workflow;
Step c2: when the workflow node is the final node of the base workflow, updating the target data into HDFS.
That is, when the server receives the confirmation message, the server judges whether the confirmation message indicates approval. When the confirmation message indicates rejection, the server sends prompt information to the terminal corresponding to the user identifier. When the confirmation message indicates approval, the server needs to further judge whether the current workflow node is the final node of the base workflow. When the current workflow node is not the final node of the base workflow, the server transfers the data update to the next node after the current workflow node, according to the base workflow. When the workflow node is the final node of the base workflow, the server updates the target data into HDFS.
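Steps a2-c2 amount to a small state machine over the node sequence. The sketch below assumes the workflow is a node list, a boolean marks an approving confirmation, and a dict stands in for HDFS; the return values are invented labels for the three outcomes.

```python
# Sketch of steps a2-c2: on an approved confirmation, either advance to
# the next node of the base workflow or, at the final node, commit the
# target data into HDFS (modeled here as a dict).

def handle_confirmation(workflow, index, approved, hdfs, path, data):
    """Return ('rejected', i), ('advanced', next_i) or ('committed', i)."""
    if not approved:
        return ("rejected", index)          # prompt the requester instead
    if index < len(workflow) - 1:
        return ("advanced", index + 1)      # hand off to the next node
    hdfs[path] = data                       # final node: update into HDFS
    return ("committed", index)
```

The key design point is that the write into HDFS happens exactly once, and only after the final node of the workflow has approved.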
In this embodiment, base workflows are preset, and the data in HDFS is maintained and updated by means of workflows. Workflow nodes are set in the base workflow; when a data update demand arises, the corresponding update process is selected according to the data type, and the data passes through nodes such as data entry, review, verification and testing before finally being updated into HDFS. At the same time, the base workflow in the present invention is configurable, allowing different types of data to go through different creation, verification and approval processes; this guarantees data update efficiency while making the data update process traceable, finally realizing standardized management of HDFS data.
Further, on the basis of the first embodiment of the present invention, a second embodiment of the workflow-based HDFS data management method of the present invention is proposed.
This embodiment covers the steps after step S20 in the first embodiment. This embodiment differs from the first embodiment in that it specifically illustrates the processing steps when there are multiple workflow nodes in the base workflow matched with the user identifier. Specifically, it comprises:
Step S50: judging whether the number of the workflow nodes is at least two, and when the number of the workflow nodes is at least two, determining the order between the workflow nodes.
After the server obtains the workflow nodes matching the user identifier in the basic workflow, the server determines the number of those workflow nodes. If the server determines that exactly one workflow node in the basic workflow matches the user identifier, the server executes step S30 of the first embodiment: the workflow node is sent to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
When the number of the workflow nodes is at least two, the server determines the order between the workflow nodes. When the workflow nodes are in serial order, the server sends the information of the first-ranked workflow node to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data.
Step S60: when the workflow nodes are in parallel order, obtaining the other role identifiers associated with the workflow nodes besides the user identifier, and sending prompt information to the terminals corresponding to those role identifiers, to prompt the personnel corresponding to the role identifiers to execute the update task.
When the server determines that the workflow nodes are in parallel order, the server obtains the other role identifiers associated with the workflow nodes besides the user identifier. For example, suppose the workflow node obtained by the server is a data-upload node n, and the server determines that user a, user b and user c need to upload data at this node simultaneously. If the user identifier corresponding to the data-update request is that of user a, the server obtains the identifiers of user b and user c and sends prompt information to the terminals corresponding to user b and user c, to prompt user b and user c to execute the update task.
Parallel and serial nodes are provided in this embodiment. For example, the same node can be divided into multiple child nodes so that multiple operators process it cooperatively, and different nodes can also be set for parallel processing, which improves data-update efficiency.
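The serial and parallel dispatch logic of steps S50 and S60 can be sketched as follows. The node and role names are illustrative assumptions, as is the `dispatch` function itself.

```python
def dispatch(nodes: list, order: str, requester: str) -> dict:
    """Serial order: send only the first-ranked node to the requester.
    Parallel order: prompt every associated role except the requester."""
    if order == "serial":
        return {"notify": [requester], "node": nodes[0]["name"]}
    roles = {r for n in nodes for r in n["roles"]}
    return {"notify": sorted(roles - {requester}), "node": None}
```

In the parallel case the requester is excluded from the prompt list, matching the example of users b and c being prompted while user a issued the request.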
Further, on the basis of the above embodiments, a third embodiment of the HDFS data management method based on workflow is proposed.
This embodiment refines the steps after step S10 of the first embodiment: in this embodiment, when the server receives a data-update request and determines the target data, it first needs to determine whether the upstream data of the target data are abnormal. Specifically, the HDFS data management method based on workflow comprises:
Step S70: obtaining the upstream data of the target data and the verification rules of the upstream data, and judging according to the verification rules whether the upstream data are abnormal.
After the server obtains the target data, the server obtains the upstream data of the target data and the verification rules of those upstream data, and automatically verifies the upstream data of the target data against the obtained verification rules to judge whether the upstream data are abnormal. When the upstream data of the target data are normal, the server can execute step S20 of the first embodiment.
Step S80: when the upstream data are abnormal, sending pause prompt information to the terminal corresponding to the user identifier, and sending abnormal-prompt information to the preset terminal corresponding to the upstream data.
In this embodiment, when the server determines that the upstream data are abnormal, the server sends pause prompt information to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier pauses the update of the target data; this effectively avoids batch data-update errors. At the same time as sending the pause prompt information, the server sends abnormal-prompt information to the preset terminal corresponding to the upstream data (the preset terminal corresponding to the upstream data is set to the terminal of the upstream-data administrator), to prompt the upstream-data administrator to perform a manual check. That is, in this embodiment the upstream data can also be verified while the data are being updated, so as to guarantee the accuracy of the data update.
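Steps S70 and S80 amount to a pre-update gate: verify upstream data against their rules, and on failure pause the requester and alert the upstream administrator. The sketch below assumes, purely for illustration, that verification rules are callables over individual upstream rows; the patent does not define a rule format.

```python
def check_upstream(upstream_rows: list, rules: list) -> bool:
    """Return True when every verification rule holds for every upstream row."""
    return all(rule(row) for row in upstream_rows for rule in rules)

def on_request(upstream_rows: list, rules: list):
    """Proceed to step S20 when upstream data are normal; otherwise pause
    the requester and alert the upstream-data administrator."""
    if check_upstream(upstream_rows, rules):
        return "proceed_to_S20"
    return ["pause_prompt->user_terminal", "abnormal_prompt->upstream_admin_terminal"]
```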
Further, on the basis of the above embodiments, a fourth embodiment of the HDFS data management method based on workflow is proposed. This embodiment can be combined with the other embodiments.
This embodiment covers the steps after step S40 of the first embodiment and details the data-rollback steps taken when a data update performed through the workflow goes wrong. Specifically, the HDFS data management method based on workflow comprises:
Step S90: when an abnormality of the target data is detected, querying the basic workflow corresponding to the abnormal target data, and obtaining the workflow nodes corresponding to the abnormal target data in the basic workflow.
When the server detects that the target data are abnormal, the server queries the basic workflow corresponding to the abnormal target data, obtains each workflow node corresponding to the abnormal target data in the basic workflow, and rolls back the data according to the information of each workflow node. Specifically:
Step S100: obtaining the history update records of each workflow node, determining a data-rollback point from the history update records, and restoring the data information in the HDFS to the rollback point.
The server obtains the history update records of each workflow node, determines the cause of the target-data abnormality from those records, and determines a data-rollback point; the server then restores the data information in the HDFS to the rollback point. That is, this embodiment makes the entire data-update process transparent: the data flow and update speed can be monitored, all data and update processes are traceable, and tracing and data rollback are facilitated.
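The rollback of steps S90 and S100 can be sketched as below. The history-record format (node, success flag, snapshot) and the dictionary standing in for HDFS are assumptions for the example; the patent only specifies that the rollback point is determined from the nodes' history update records.

```python
def find_rollback_point(history: list):
    """Latest successful record before the first abnormal one; None if the
    very first node already failed."""
    point = None
    for rec in history:
        if not rec["ok"]:
            break
        point = rec
    return point

def rollback(hdfs: dict, key: str, history: list) -> bool:
    """Restore the HDFS stand-in to the rollback point; drop the aborted
    update entirely when no good snapshot exists."""
    point = find_rollback_point(history)
    if point is None:
        hdfs.pop(key, None)
        return False
    hdfs[key] = point["snapshot"]
    return True
```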
In addition, an embodiment of the present invention further proposes an HDFS data management apparatus based on workflow. Referring to Fig. 3, the HDFS data management apparatus based on workflow includes:
a request module 10, configured to, when a data-update request is received, obtain the user identifier corresponding to the data-update request and the target data to be updated;
a node determining module 20, configured to obtain the update task of the target data and the basic workflow corresponding to the update task, and obtain the workflow nodes in the basic workflow that match the user identifier;
a confirmation sending module 30, configured to send the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
a data update module 40, configured to, when the confirmation message sent by the terminal is received, update the target data into HDFS according to the confirmation message.
Optionally, the HDFS data management apparatus based on workflow includes:
a predefining module, configured to predefine first node elements of a Hadoop workflow model and second node elements of a BPEL process model, and to establish mapping rules between the first node elements and the second node elements;
a process mapping module, configured to process the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model;
a workflow determining module, configured to convert the description file of the BPEL process model to generate a service deployment file, package the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, deploy the deployment package, and execute the deployment package to obtain the basic workflow.
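The mapping from Hadoop workflow node elements to BPEL node elements and the bundling of a deployment package can be sketched as below. The element names in `MAPPING_RULES` and the package layout are assumptions for illustration only; a real Oozie-to-BPEL conversion is considerably more involved than this.

```python
# Hypothetical first-node-element -> second-node-element mapping rules.
MAPPING_RULES = {
    "map-reduce": "invoke",
    "decision":   "if",
    "fork":       "flow",
}

def to_bpel(hadoop_nodes: list) -> list:
    """Translate each Hadoop node element to its BPEL counterpart."""
    return [MAPPING_RULES[n] for n in hadoop_nodes]

def build_deployment_package(hadoop_nodes: list, description: str, code_by_node: dict) -> dict:
    """Bundle the description file, per-node executable code, and the BPEL
    process into one deployment package."""
    bpel = to_bpel(hadoop_nodes)
    return {"description": description,
            "bpel_process": bpel,
            "executables": [code_by_node[n] for n in bpel]}
```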
Optionally, the node determining module 20 includes:
an acquiring unit, configured to obtain the update task of the target data and obtain the deployment package associated with the update task;
an execution unit, configured to execute the deployment package to obtain the corresponding basic workflow, and to obtain the node elements contained in the basic workflow and the role identifiers corresponding to the node elements;
a node determination unit, configured to obtain the target role identifier matching the user identifier, and to take the node elements corresponding to the target role identifier as the workflow nodes in the basic workflow that match the user identifier.
Optionally, the HDFS data management apparatus based on workflow includes:
a determination module, configured to judge whether the number of the workflow nodes is at least two;
an order determining module, configured to determine the order between the workflow nodes when the number of the workflow nodes is at least two;
a parallel prompting module, configured to, when the workflow nodes are in parallel order, obtain the other role identifiers associated with the workflow nodes besides the user identifier, and send prompt information to the terminals corresponding to those role identifiers, to prompt the personnel corresponding to the role identifiers to execute the update task.
Optionally, the data update module 40 includes:
a first judging unit, configured to judge, when the confirmation message sent by the terminal is received, whether the confirmation message indicates approval;
a second judging unit, configured to judge, when the confirmation message indicates approval, whether the workflow node is the final node of the basic workflow;
a data updating unit, configured to update the target data into HDFS when the workflow node is the final node of the basic workflow.
Optionally, the HDFS data management apparatus based on workflow includes:
a data verification module, configured to obtain the upstream data of the target data and the verification rules of the upstream data, and to judge according to the verification rules whether the upstream data are abnormal;
a prompt sending module, configured to, when the upstream data are abnormal, send pause prompt information to the terminal corresponding to the user identifier, and send abnormal-prompt information to the preset terminal corresponding to the upstream data.
Optionally, the HDFS data management apparatus based on workflow includes:
an abnormality detection module, configured to query, when an abnormality of the target data is detected, the basic workflow corresponding to the abnormal target data, and to obtain the workflow nodes corresponding to the abnormal target data in the basic workflow;
a data restoring module, configured to obtain the history update records of each workflow node, determine a data-rollback point from the history update records, and roll back the data in the HDFS to the rollback point.
For the steps implemented by each functional module of the HDFS data management apparatus based on workflow, reference may be made to the embodiments of the HDFS data management method based on workflow of the present invention; details are not repeated here.
In addition, an embodiment of the present invention further proposes a computer storage medium.
A computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the operations in the HDFS data management method based on workflow provided by the above embodiments are implemented.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity/operation/object from another entity/operation/object, without necessarily requiring or implying any actual relationship or order between these entities/operations/objects. The terms "include", "comprise", or any other variant thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements not only includes those elements, but also includes other elements not explicitly listed, or further includes elements intrinsic to such a process, method, article or system. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes that element.
As for the apparatus embodiment, since it is substantially similar to the method embodiment, its description is fairly simple; for related details, refer to the description of the method embodiment. The apparatus embodiments described above are merely exemplary, in which the units described as separate parts may or may not be physically separated. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present invention, which those of ordinary skill in the art can understand and implement without creative labor.
The serial numbers of the above embodiments of the present invention are only for description and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware alone, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, magnetic disk, or optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structural or equivalent process transformation made by using the contents of the specification and accompanying drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of the present invention.
Claims (10)
1. An HDFS data management method based on workflow, wherein the HDFS data management method based on workflow comprises the following steps:
when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated;
obtaining the update task of the target data and the basic workflow corresponding to the update task, and obtaining the workflow nodes in the basic workflow that match the user identifier;
sending the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message.
2. The HDFS data management method based on workflow according to claim 1, wherein before the step of, when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated, the method comprises:
predefining first node elements of a Hadoop workflow model and second node elements of a BPEL process model, and establishing mapping rules between the first node elements and the second node elements;
processing the first node elements in the Hadoop workflow model according to the mapping rules to obtain a BPEL process model;
converting the description file of the BPEL process model to generate a service deployment file, packaging the description file, the executable code of the task corresponding to each second node element in the BPEL process, and the service deployment file into a deployment package, and deploying the deployment package;
executing the deployment package to obtain the basic workflow.
3. The HDFS data management method based on workflow according to claim 2, wherein the step of obtaining the update task of the target data and the basic workflow corresponding to the update task, and obtaining the workflow nodes in the basic workflow that match the user identifier, comprises:
obtaining the update task of the target data, and obtaining the deployment package associated with the update task;
executing the deployment package to obtain the corresponding basic workflow, and obtaining the node elements contained in the basic workflow and the role identifiers corresponding to the node elements;
obtaining the target role identifier matching the user identifier, and taking the node elements corresponding to the target role identifier as the workflow nodes in the basic workflow that match the user identifier.
4. The HDFS data management method based on workflow according to claim 3, wherein after the step of obtaining the update task of the target data and the basic workflow corresponding to the update task, and obtaining the workflow nodes in the basic workflow that match the user identifier, the method comprises:
judging whether the number of the workflow nodes is at least two;
when the number of the workflow nodes is at least two, determining the order between the workflow nodes;
when the workflow nodes are in parallel order, obtaining the other role identifiers associated with the workflow nodes besides the user identifier, and sending prompt information to the terminals corresponding to those role identifiers, to prompt the personnel corresponding to the role identifiers to execute the update task.
5. The HDFS data management method based on workflow according to claim 4, wherein the step of, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message, comprises:
when the confirmation message sent by the terminal is received, judging whether the confirmation message indicates approval;
when the confirmation message indicates approval, judging whether the workflow node is the final node of the basic workflow;
when the workflow node is the final node of the basic workflow, updating the target data into HDFS.
6. The HDFS data management method based on workflow according to claim 1, wherein after the step of, when a data-update request is received, obtaining the user identifier corresponding to the data-update request and the target data to be updated, the method comprises:
obtaining the upstream data of the target data and the verification rules of the upstream data, and judging according to the verification rules whether the upstream data are abnormal;
when the upstream data are abnormal, sending pause prompt information to the terminal corresponding to the user identifier, and sending abnormal-prompt information to the preset terminal corresponding to the upstream data.
7. The HDFS data management method based on workflow according to claim 1, wherein after the step of, when the confirmation message sent by the terminal is received, updating the target data into HDFS according to the confirmation message, the method comprises:
when an abnormality of the target data is detected, querying the basic workflow corresponding to the abnormal target data, and obtaining the workflow nodes corresponding to the abnormal target data in the basic workflow;
obtaining the history update records of each workflow node, determining a data-rollback point from the history update records, and rolling back the data in the HDFS to the rollback point.
8. An HDFS data management apparatus based on workflow, wherein the HDFS data management apparatus based on workflow comprises:
a request module, configured to, when a data-update request is received, obtain the user identifier corresponding to the data-update request and the target data to be updated;
a node determining module, configured to obtain the update task of the target data and the basic workflow corresponding to the update task, and obtain the workflow nodes in the basic workflow that match the user identifier;
a confirmation sending module, configured to send the workflow nodes to the terminal corresponding to the user identifier, so that the user corresponding to the user identifier confirms the update of the target data;
a data update module, configured to, when the confirmation message sent by the terminal is received, update the target data into HDFS according to the confirmation message.
9. An HDFS data management device based on workflow, wherein the HDFS data management device based on workflow comprises: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
when the computer program is executed by the processor, the steps of the HDFS data management method based on workflow according to any one of claims 1 to 7 are implemented.
10. A computer storage medium, wherein a computer program is stored in the computer storage medium, and when the computer program is executed by a processor, the steps of the HDFS data management method based on workflow according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910201985.7A CN110069465A (en) | 2019-03-16 | 2019-03-16 | HDFS data managing method, device, equipment and medium based on workflow |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110069465A true CN110069465A (en) | 2019-07-30 |
Family
ID=67365278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910201985.7A Pending CN110069465A (en) | 2019-03-16 | 2019-03-16 | HDFS data managing method, device, equipment and medium based on workflow |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110069465A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100050183A1 (en) * | 2008-08-25 | 2010-02-25 | Fujitsu Limited | Workflow developing apparatus, workflow developing method, and computer product |
US20120271860A1 (en) * | 2011-04-25 | 2012-10-25 | Cbs Interactive, Inc. | User data store |
CN103581332A (en) * | 2013-11-15 | 2014-02-12 | 武汉理工大学 | HDFS framework and pressure decomposition method for NameNodes in HDFS framework |
CN103761111A (en) * | 2014-02-19 | 2014-04-30 | 中国科学院软件研究所 | Method and system for constructing data-intensive workflow engine based on BPEL language |
CN106529917A (en) * | 2016-12-15 | 2017-03-22 | 平安科技(深圳)有限公司 | Workflow processing method and device |
2019-03-16: CN201910201985.7A patent/CN110069465A/en, status: Pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310936A (en) * | 2020-04-15 | 2020-06-19 | 光际科技(上海)有限公司 | Machine learning training construction method, platform, device, equipment and storage medium |
CN111310936B (en) * | 2020-04-15 | 2023-06-20 | 光际科技(上海)有限公司 | Construction method, platform, device, equipment and storage medium for machine learning training |
CN112202899A (en) * | 2020-09-30 | 2021-01-08 | 北京百度网讯科技有限公司 | Workflow processing method and device, intelligent workstation and electronic equipment |
CN112202899B (en) * | 2020-09-30 | 2022-10-25 | 北京百度网讯科技有限公司 | Workflow processing method and device, intelligent workstation and electronic equipment |
CN112416476A (en) * | 2020-11-25 | 2021-02-26 | 武汉联影医疗科技有限公司 | Workflow execution method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110546606A (en) | Tenant upgrade analysis | |
CN102769659B (en) | The web services communication that engagement process control system is used | |
CN110069465A (en) | HDFS data managing method, device, equipment and medium based on workflow | |
EP2360871A1 (en) | Machine to machine architecture | |
US10684890B2 (en) | Network deployment for cellular, backhaul, fiber optic and other network infrastructure | |
CN109687993A (en) | A kind of Internet of Things alarm and control system and method based on block chain | |
CN108681975A (en) | A kind of household services approaches to IM, device and equipment | |
CN105635297A (en) | Terminal device control method and system | |
CN104135381A (en) | Hierarchical service management method and system | |
CN112948217B (en) | Server repair checking method and device, storage medium and electronic equipment | |
CN109976724A (en) | Development approach, device, equipment and the computer readable storage medium of leasing system | |
CN112685287B (en) | Product data testing method and device, storage medium and electronic device | |
CN109254914A (en) | Software development kit test method, system, computer installation and readable storage medium storing program for executing | |
CN104954412B (en) | The firmware management method, apparatus and generic service entity of internet-of-things terminal | |
CN106162715A (en) | Method for managing and monitoring and device | |
CN105101040A (en) | Resource creating method and device | |
CN106485338A (en) | Medical information appointment registration system and method | |
JP2003022196A (en) | Method for automatically executing test program in portable terminal | |
CN109656964A (en) | The method, apparatus and storage medium of comparing | |
CN107609843A (en) | Contract renewal method and server | |
CN109102248A (en) | Amending method, device and the computer readable storage medium of nodal information | |
CN113988819A (en) | Processing method and system of approval process, electronic device and storage medium | |
Faria et al. | A testing and certification methodology for an Ambient-Assisted Living ecosystem | |
CN105988431A (en) | Management information system and product process configuration data updating method and product process configuration data updating device thereof | |
CN105988932B (en) | A kind of test method and system of ESB |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||