CN105912588A

CN105912588A - Visualization processing method and system for big data based on memory calculations

Info

Publication number: CN105912588A
Application number: CN201610203223.7A
Authority: CN
Inventors: 赵维平; 刘龙; 王鑫毅; 钟新斌; 于雪龙
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2016-08-31

Abstract

The invention discloses a visualized big data processing method based on in-memory computing. The method comprises the following steps: acquiring nodes used for data processing; connecting the nodes according to the data processing requirements and setting node parameters to form a workflow; and triggering an execution engine to run the workflow and output a processing result. The method has the following beneficial effects: by encapsulating the conventional data analysis process, a visual workflow that can be edited by drag and drop is used for data processing and mining; and by keeping the computation process and its results in memory, a more agile and easier-to-use experience is provided to users. The invention further discloses a visualized big data processing system based on in-memory computing.

Description

Visualized big data processing method and system based on in-memory computing

Technical field

The present invention relates to the technical field of data processing, and in particular to a visualized big data processing method and system based on in-memory computing.

Background technology

Riding the tide of the Internet, the financial industry has also entered the big data era. Faced with rapidly growing data volumes, how to mine useful information from the accumulated data, and then use it to support operations, customer marketing, fraud detection, cost reduction and so on, has become a challenge confronting the financial industry.

At present, users must write code to process and analyze data. This sets a high entry threshold for users; hand-written programs are error-prone, costly in time, and hard to reuse. After a program or its parameters are adjusted, it must be re-executed on the entire original data set, and reusable intermediate results cannot be reused.

Summary of the invention

The invention provides a visualized big data processing method based on in-memory computing. It encapsulates the traditional data analysis process, uses a visual workflow edited by drag and drop to carry out data processing and mining, and keeps the computation process and results in memory, giving users a faster and easier-to-use experience.

The invention provides a visualized big data processing method based on in-memory computing, comprising:

obtaining nodes used for data processing;

connecting the nodes according to data processing requirements and setting parameters of the nodes to form a workflow;

triggering an execution engine to run the workflow and outputting a processing result.

Preferably, obtaining the nodes used for data processing is specifically:

adding and/or deleting nodes used for data processing.

Preferably, connecting the nodes according to the data processing requirements comprises:

obtaining the trajectory of an operation triggered from a first node to a second node;

generating a one-way arrow from the first node to the second node according to the trajectory.

Preferably, setting the parameters of the nodes comprises:

setting a description parameter of the node shape, a color parameter, a position parameter, a path parameter, a meta-information parameter and a data mining algorithm parameter.

Preferably, triggering the execution engine to run the workflow and outputting the processing result comprises:

parsing the workflow and defining the interdependencies between nodes according to the order in which the nodes are connected;

generating code executable by an in-memory computing framework according to the node dependencies, the node meta-information and the node parameters;

distributing the code to an in-memory computing cluster for execution and outputting the execution result.

A visualized big data processing system based on in-memory computing, comprising:

a first acquiring unit, configured to obtain nodes used for data processing;

a forming unit, configured to connect the nodes according to data processing requirements and set parameters of the nodes to form a workflow;

a trigger unit, configured to trigger an execution engine to run the workflow and output a processing result.

Preferably, the first acquiring unit is specifically configured to add and/or delete nodes used for data processing.

Preferably, the forming unit comprises:

a second acquiring unit, configured to obtain the trajectory of an operation triggered from a first node to a second node;

a first generating unit, configured to generate a one-way arrow from the first node to the second node according to the trajectory.

Preferably, the forming unit further comprises:

a setting unit, configured to set a description parameter of the node shape, a color parameter, a position parameter, a path parameter, a meta-information parameter and a data mining algorithm parameter.

Preferably, the trigger unit comprises:

a parsing unit, configured to parse the workflow and define the interdependencies between nodes according to the order in which the nodes are connected;

a second generating unit, configured to generate code executable by an in-memory computing framework according to the node dependencies, the node meta-information and the node parameters;

an execution unit, configured to distribute the code to an in-memory computing cluster for execution and output the execution result.

As can be seen from the above scheme, the visualized big data processing method based on in-memory computing provided by the present invention obtains nodes used for data processing, connects the nodes according to the data processing requirements, and sets node parameters to generate a workflow; it then triggers an execution engine to run the generated workflow and outputs the data processing result. Data processing and mining are carried out with a visual workflow edited by drag and drop, and the execution engine keeps the computation process and results in memory, giving users a faster and easier-to-use experience.

Brief description of the drawings

In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a visualized big data processing method based on in-memory computing disclosed in Embodiment 1 of the present invention;

Fig. 2 is a flow chart of a visualized big data processing method based on in-memory computing disclosed in Embodiment 2 of the present invention;

Fig. 3 is a structural diagram of a visualized big data processing system based on in-memory computing disclosed in Embodiment 1 of the present invention;

Fig. 4 is a structural diagram of a visualized big data processing system based on in-memory computing disclosed in Embodiment 2 of the present invention.

Detailed description of the invention

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

As shown in Fig. 1, Embodiment 1 of the present invention discloses a visualized big data processing method based on in-memory computing, comprising:

S101: obtaining nodes used for data processing;

When big data needs to be processed, the corresponding nodes are selected according to the data processing requirements. Each node obtained carries a data processing task, and each node stores the meta-information of its corresponding output data set. The meta-information records the structure and types of the data, including the name of each column of each record, the role of each column, and the data type of each column. Each node also keeps a flag bit indicating whether its corresponding data set is up to date and valid.
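The per-node state described above (output meta-information plus a validity flag) can be pictured with a small data structure. This is only an illustrative sketch: the class names `ColumnMeta` and `Node`, the field names, and the example role values are assumptions and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMeta:
    """Meta-information for one column of a node's output data set."""
    name: str   # column name
    role: str   # role of the column (values such as "id" or "feature" are illustrative)
    dtype: str  # data type of the column


@dataclass
class Node:
    """A workflow node that carries one data processing task."""
    node_id: str
    kind: str  # e.g. "Input", "Filter", "Join"
    output_meta: List[ColumnMeta] = field(default_factory=list)
    valid: bool = False  # flag bit: is the stored output data set up to date and valid?


# Hypothetical input node describing a two-column data set.
customers = Node(
    node_id="n1",
    kind="Input",
    output_meta=[
        ColumnMeta("customer_id", "id", "string"),
        ColumnMeta("balance", "feature", "float"),
    ],
)
```

A freshly created node starts with `valid=False`, which matches the idea that its data set has not been computed yet.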

By function, nodes can be divided into input/output nodes, basic processing nodes and data mining nodes. Input/output nodes define the source of the data and the location to which it is exported, and support multiple storage encodings, formats and modes.

Basic processing nodes include the Filter node, Union node, Map node, FlatMap node, ReduceByKey node, Join node and Sample node.

The Filter node filters according to a user-defined condition; the result is the data set that satisfies the specified condition.

The Union node performs a union operation on two data sets with the same structure; the result is the union of the two input data sets.

The Map node applies a specific function to each element of the input data set to produce a new data set. Every element of the original data set has exactly one corresponding element in the new data set.

The FlatMap node is similar to the Map node, except that one element of the input data set can produce multiple elements in the new data set after FlatMap processing. It can be used for operations such as splitting a wide table.
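The behaviour of the Filter, Union, Map and FlatMap nodes can be sketched on plain Python lists. The function names below are hypothetical, and the sketch runs in a single process rather than on an in-memory computing cluster. Whether Union removes duplicates is not stated in the text; this sketch keeps them.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def filter_node(data: Iterable[T], predicate: Callable[[T], bool]) -> List[T]:
    """Keep only the elements that satisfy the user-defined condition."""
    return [x for x in data if predicate(x)]


def union_node(left: Iterable[T], right: Iterable[T]) -> List[T]:
    """Combine two data sets with the same structure (duplicates are kept)."""
    return list(left) + list(right)


def map_node(data: Iterable[T], fn: Callable[[T], U]) -> List[U]:
    """Apply fn to each element: exactly one output element per input element."""
    return [fn(x) for x in data]


def flat_map_node(data: Iterable[T], fn: Callable[[T], Iterable[U]]) -> List[U]:
    """Apply fn to each element and flatten: zero or more outputs per input."""
    return [y for x in data for y in fn(x)]


# Wide-table splitting: one wide row (id, q1, q2) becomes two narrow rows.
wide_rows = [("c1", 10, 20), ("c2", 30, 40)]
long_rows = flat_map_node(
    wide_rows,
    lambda r: [(r[0], "q1", r[1]), (r[0], "q2", r[2])],
)
```

The difference between the last two functions is the one the text draws: `map_node` preserves the element count, while `flat_map_node` may change it.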

The ReduceByKey node corresponds to a binary operation. It operates on a data set made up of (key, value) pairs: the values of two elements with the same key are passed to the input function together to produce a new value, and the new value is then combined in the same way with the next element in the data set that has that key, until each key has only one value.
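The folding behaviour described for ReduceByKey can be sketched as follows. The function name `reduce_by_key` is hypothetical, and this single-process version only illustrates the semantics, not how a cluster would partition the work.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

V = TypeVar("V")


def reduce_by_key(
    pairs: Iterable[Tuple[Hashable, V]],
    fn: Callable[[V, V], V],
) -> List[Tuple[Hashable, V]]:
    """Fold the values of each key with the binary function fn until one value per key remains."""
    acc: Dict[Hashable, V] = {}
    for key, value in pairs:
        # First value for a key is stored as-is; later values are combined with the running result.
        acc[key] = fn(acc[key], value) if key in acc else value
    return list(acc.items())


totals = reduce_by_key([("a", 1), ("b", 5), ("a", 2), ("a", 3)], lambda x, y: x + y)
```

Here the three values for key `"a"` are combined pairwise (1 + 2, then 3 + 3), while key `"b"` keeps its single value.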

The Join node provides a natural join of two data sets.
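A natural join matches rows on all column names the two data sets share. The sketch below assumes rows are represented as dictionaries with a uniform set of keys per data set; the function name `natural_join` and the sample data are illustrative only.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join two data sets on every column name they have in common."""
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))
    # Index the right side by the values of the common columns.
    index: Dict[Tuple[Any, ...], List[Row]] = {}
    for r in right:
        index.setdefault(tuple(r[c] for c in common), []).append(r)
    joined: List[Row] = []
    for l in left:
        for r in index.get(tuple(l[c] for c in common), []):
            joined.append({**l, **r})
    return joined


accounts = [{"cid": 1, "balance": 100}, {"cid": 2, "balance": 50}]
names = [{"cid": 1, "name": "Li"}, {"cid": 3, "name": "Wang"}]
joined = natural_join(accounts, names)
```

Only the row with `cid == 1` appears in both inputs, so it is the only row in the result.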

The Sample node performs random sampling, with or without replacement, at the sampling ratio set by the user, and outputs the sampled result data set.
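The two sampling modes can be sketched with Python's standard `random` module. The function name `sample_node`, the `seed` parameter, and the choice to round the sample size to the nearest integer are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_node(
    data: Iterable[T],
    fraction: float,
    with_replacement: bool = False,
    seed: Optional[int] = None,
) -> List[T]:
    """Randomly sample about fraction * len(data) elements, with or without replacement."""
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    rng = random.Random(seed)
    items = list(data)
    k = round(len(items) * fraction)
    if with_replacement:
        # The same element may be drawn more than once.
        return [rng.choice(items) for _ in range(k)]
    if k > len(items):
        raise ValueError("fraction must not exceed 1 when sampling without replacement")
    # Each element is drawn at most once.
    return rng.sample(items, k)


without_replacement = sample_node(range(100), 0.1, seed=42)
with_replacement = sample_node(range(5), 2.0, with_replacement=True, seed=42)
```

Sampling with replacement allows a ratio above 1, since elements can repeat; without replacement the sample can never be larger than the input.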

Data mining nodes include nodes that implement data mining algorithms such as K-Means and Naive Bayes.

S102: connecting the nodes according to the data processing requirements and setting node parameters to form a workflow;

After the nodes for data processing are obtained, they are connected according to the data processing requirements; different node connections represent different data processing procedures. At the same time, the parameters of the obtained nodes are set according to the node attributes, forming a workflow. The workflow is a directed acyclic graph composed of nodes and arrowed lines, each line connecting two different nodes.
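The two structural constraints stated above (each line connects two different nodes, and the graph has no cycles) can be checked with Kahn's algorithm. This is a minimal sketch; the function name `is_valid_workflow` and the edge representation as `(source, target)` tuples are assumptions.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def is_valid_workflow(node_ids: Iterable[str], edges: Iterable[Tuple[str, str]]) -> bool:
    """Return True if every edge joins two different known nodes and the graph is acyclic."""
    nodes = set(node_ids)
    indegree: Dict[str, int] = {n: 0 for n in nodes}
    successors: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        if src == dst or src not in nodes or dst not in nodes:
            return False
        successors[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly remove nodes that have no remaining incoming edge.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        current = queue.popleft()
        visited += 1
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # If a cycle exists, its nodes are never removed and the count falls short.
    return visited == len(nodes)
```

A visual editor could run such a check each time the user draws a new arrow and reject the connection if it would close a cycle.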

S103: triggering the execution engine to run the workflow and outputting the processing result.

After the workflow is formed, the execution engine is triggered to run the workflow based on in-memory computing, and the data processing result is finally output.

In summary, in the above embodiment, nodes used for data processing are obtained, the nodes are connected according to the data processing requirements, and node parameters are set to generate a workflow; the execution engine is then triggered to run the generated workflow and output the data processing result. Data processing and mining are carried out with a visual workflow edited by drag and drop, and the execution engine keeps the computation process and results in memory, giving users a faster and easier-to-use experience.

As shown in Fig. 2, Embodiment 2 of the present invention discloses a visualized big data processing method based on in-memory computing, comprising:

S201: adding and/or deleting nodes used for data processing;

When big data needs to be processed, the corresponding nodes are selected by adding and/or deleting nodes according to the data processing requirements. Each node obtained carries a data processing task, and each node stores the meta-information of its corresponding output data set. The meta-information records the structure and types of the data, including the name of each column of each record, the role of each column, and the data type of each column. Each node also keeps a flag bit indicating whether its corresponding data set is up to date and valid.

By function, nodes can be divided into input/output nodes, basic processing nodes and data mining nodes. Input/output nodes define the source of the data and the location to which it is exported, and support multiple storage encodings, formats and modes.

The Join node provides a natural join of two data sets.

S202: obtaining the trajectory of an operation triggered from a first node to a second node;

After the nodes for data processing are obtained, the user defines the connections between nodes by dragging in the visual editing interface according to the data processing requirements; that is, the trajectory of the operation triggered from the first node to the second node is obtained.

S203: generating a one-way arrow from the first node to the second node according to the trajectory;

A one-way arrow from the first node to the second node is generated according to the trajectory of the user's drag.

S204: setting a description parameter of the node shape, a color parameter, a position parameter, a path parameter, a meta-information parameter and a data mining algorithm parameter to form a workflow;

Parameters are then set for each node's shape description, color, position, path, meta-information and data mining algorithm, forming the data processing workflow. In the implementation, the workflow is data in JSON format; each node and each arrowed line has an id, and each line records the ids of the two nodes it connects.
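A JSON workflow of the kind described above might look like the following. The patent states only that the workflow is JSON, that nodes and lines have ids, and that each line records the ids of its two nodes; every key name here (`nodes`, `edges`, `source`, `target`, `params` and so on), as well as the file paths and parameter values, is a hypothetical choice for illustration.

```python
import json

# Hypothetical workflow document: three nodes connected by two arrowed lines.
workflow_json = """
{
  "nodes": [
    {"id": "n1", "type": "Input", "shape": "rect", "color": "#4a90d9",
     "position": {"x": 40, "y": 80}, "params": {"path": "/data/transactions"}},
    {"id": "n2", "type": "Filter", "shape": "rect", "color": "#7ed321",
     "position": {"x": 220, "y": 80}, "params": {"condition": "amount > 1000"}},
    {"id": "n3", "type": "Output", "shape": "rect", "color": "#f5a623",
     "position": {"x": 400, "y": 80}, "params": {"path": "/data/large_transactions"}}
  ],
  "edges": [
    {"id": "e1", "source": "n1", "target": "n2"},
    {"id": "e2", "source": "n2", "target": "n3"}
  ]
}
"""

workflow = json.loads(workflow_json)
node_ids = {node["id"] for node in workflow["nodes"]}

# Every line must reference two existing node ids.
dangling = [
    edge["id"]
    for edge in workflow["edges"]
    if edge["source"] not in node_ids or edge["target"] not in node_ids
]
```

Keeping the connections as id pairs, separate from the node definitions, lets the editor move or restyle a node without touching the lines attached to it.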

S205: parsing the workflow and defining the interdependencies between nodes according to the order in which the nodes are connected;

After the workflow is formed, the first step of the execution engine is to convert the logical model of the workflow into executable code. The workflow is first parsed, and the interdependencies between nodes are defined according to the order in which the nodes are connected.
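One way to derive the dependencies from the connections is to treat each arrow as "target depends on source" and then compute a topological order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function names and the sample node names are assumptions, and the patent does not say which algorithm its engine uses.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple


def build_dependencies(
    node_ids: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Map each node to the set of upstream nodes it depends on."""
    deps: Dict[str, Set[str]] = {n: set() for n in node_ids}
    for src, dst in edges:
        deps[dst].add(src)  # an arrow src -> dst means dst needs the output of src
    return deps


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every node comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


deps = build_dependencies(
    ["load", "filter", "lookup", "join"],
    [("load", "filter"), ("filter", "join"), ("lookup", "join")],
)
order = execution_order(deps)
```

`TopologicalSorter` raises `graphlib.CycleError` if the connections contain a cycle, which is consistent with the workflow being a directed acyclic graph.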

S206: generating code executable by the in-memory computing framework according to the node dependencies, the node meta-information and the node parameters;

Then, according to the node dependencies, the node meta-information and the user-defined node parameters, code executable by the in-memory computing framework is generated.
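Code generation from an ordered workflow can be sketched with per-node-type templates. The patent does not name the in-memory computing framework or its API, so the target calls below (`ctx.load`, `.filter`, `.join`, `.save`) are purely hypothetical placeholders, as are the function name `generate_code` and the sample paths.

```python
from typing import Dict, List

# One source-code template per node type; the emitted API is hypothetical.
TEMPLATES: Dict[str, str] = {
    "Input": "{out} = ctx.load({path!r})",
    "Filter": "{out} = {in0}.filter({expr})",
    "Join": "{out} = {in0}.join({in1})",
    "Output": "{in0}.save({path!r})",
}


def generate_code(
    order: List[str],
    inputs: Dict[str, List[str]],
    specs: Dict[str, dict],
) -> str:
    """Emit one line of source per node, visiting nodes in dependency order."""
    lines: List[str] = []
    for node_id in order:
        spec = specs[node_id]
        fields = {"out": node_id, **spec.get("params", {})}
        # Upstream node ids become the variables in0, in1, ... of the template.
        for i, upstream in enumerate(inputs.get(node_id, [])):
            fields[f"in{i}"] = upstream
        lines.append(TEMPLATES[spec["type"]].format(**fields))
    return "\n".join(lines)


code = generate_code(
    order=["n1", "n2", "n3"],
    inputs={"n2": ["n1"], "n3": ["n2"]},
    specs={
        "n1": {"type": "Input", "params": {"path": "/data/tx"}},
        "n2": {"type": "Filter", "params": {"expr": "lambda r: r['amount'] > 1000"}},
        "n3": {"type": "Output", "params": {"path": "/data/out"}},
    },
)
```

Each node id doubles as the variable name holding that node's result, so the dependency between nodes shows up directly as data flow in the generated text.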

S207: distributing the code to the in-memory computing cluster for execution and outputting the execution result.

In its second step, the execution engine distributes the previously generated code to the in-memory computing cluster for actual execution. The interim result of each node, as an intermediate result of the whole execution flow, can be stored in memory as needed or as configured by the user, in order to speed up the next run.

In actual data processing, the user can quickly adjust the data processing and analysis flow by modifying the workflow. Specifically, the user can change the topology of the workflow by adding nodes, deleting nodes or changing node connections, or can modify node parameters or the input data; the flag bits that mark node validity change accordingly. When the execution engine runs, it selectively executes and updates intermediate results according to the validity of the intermediate results held in memory. For nodes not affected by the workflow modification, the execution engine reuses the previous results directly without executing them again.
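The selective re-execution described above can be sketched as a small engine that caches each node's result together with a validity flag. The class name `Engine`, its methods, and the `runs` trace are assumptions for illustration; the sketch is single-process and says nothing about how a real cluster would store the cached data.

```python
from typing import Any, Callable, Dict, List


class Engine:
    """Caches per-node results and re-runs only nodes whose flag is invalid."""

    def __init__(self, deps: Dict[str, List[str]], ops: Dict[str, Callable[..., Any]]):
        self.deps = deps                       # node -> ordered list of upstream nodes
        self.ops = ops                         # node -> function of the upstream results
        self.cache: Dict[str, Any] = {}        # intermediate results kept in memory
        self.valid = {n: False for n in deps}  # validity flag per node
        self.runs: List[str] = []              # trace of nodes actually executed

    def invalidate(self, node: str) -> None:
        """Mark a modified node and everything downstream of it as stale."""
        self.valid[node] = False
        for other, upstream in self.deps.items():
            if node in upstream:
                self.invalidate(other)

    def evaluate(self, node: str) -> Any:
        """Return the node's result, reusing the cached value when it is still valid."""
        if self.valid[node]:
            return self.cache[node]
        args = [self.evaluate(u) for u in self.deps[node]]
        self.cache[node] = self.ops[node](*args)
        self.valid[node] = True
        self.runs.append(node)
        return self.cache[node]


params = {"threshold": 10}
engine = Engine(
    deps={"load": [], "filter": ["load"], "total": ["filter"]},
    ops={
        "load": lambda: [5, 12, 20],
        "filter": lambda rows: [r for r in rows if r > params["threshold"]],
        "total": lambda rows: sum(rows),
    },
)
first = engine.evaluate("total")

# The user changes the Filter node's parameter: only filter and total are re-run.
params["threshold"] = 15
engine.invalidate("filter")
engine.runs.clear()
second = engine.evaluate("total")
rerun = list(engine.runs)
```

After the parameter change, the `load` node keeps its valid flag, so its cached output is reused and only the two affected nodes execute again.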

In summary, in the above embodiment, the browser serves as the carrier for user operations, so no additional client software needs to be installed and the system is easy to roll out among users. The code-writing approach to data processing is abstracted and elevated into a visual approach in which workflows are formed by drag and drop, which greatly improves ease of use and lowers the threshold for data analysis. Because computation is done in memory and the intermediate result of each node is cached in memory, results are available quickly when the analysis flow is adjusted. Workflow nodes are extensible to support more data operations and algorithms, in line with the trends of big data processing and Internet finance.

As shown in Fig. 3, Embodiment 1 of the present invention discloses a visualized big data processing system based on in-memory computing, comprising:

a first acquiring unit 301, configured to obtain nodes used for data processing;

By function, nodes can be divided into input/output nodes, basic processing nodes and data mining nodes. Input/output nodes define the source of the data and the location to which it is exported, and support multiple storage encodings, formats and modes.

The Join node provides a natural join of two data sets.

a forming unit 302, configured to connect the nodes according to the data processing requirements and set node parameters to form a workflow;

a trigger unit 303, configured to trigger the execution engine to run the workflow and output the processing result.

As shown in Fig. 4, Embodiment 2 of the present invention discloses a visualized big data processing system based on in-memory computing, comprising:

a first acquiring unit 401, configured to add and/or delete nodes used for data processing;

The Join node provides a natural join of two data sets.

a second acquiring unit 402, configured to obtain the trajectory of an operation triggered from a first node to a second node;

a first generating unit 403, configured to generate a one-way arrow from the first node to the second node according to the trajectory;

a setting unit 404, configured to set a description parameter of the node shape, a color parameter, a position parameter, a path parameter, a meta-information parameter and a data mining algorithm parameter to form a workflow;

a parsing unit 405, configured to parse the workflow and define the interdependencies between nodes according to the order in which the nodes are connected;

a second generating unit 406, configured to generate code executable by the in-memory computing framework according to the node dependencies, the node meta-information and the node parameters;

an execution unit 407, configured to distribute the code to the in-memory computing cluster for execution and output the execution result.

If the functions described in the method of this embodiment are implemented in the form of software functional units and sold or used as independent products, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present invention that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A visualized big data processing method based on in-memory computing, characterized by comprising:

obtaining nodes used for data processing;

triggering an execution engine to run the workflow and outputting a processing result.

2. The method according to claim 1, characterized in that obtaining the nodes used for data processing is specifically:

adding and/or deleting nodes used for data processing.

3. The method according to claim 2, characterized in that connecting the nodes according to the data processing requirements comprises:

4. The method according to claim 3, characterized in that setting the parameters of the nodes comprises:

5. The method according to claim 4, characterized in that triggering the execution engine to run the workflow and outputting the processing result comprises:

6. A visualized big data processing system based on in-memory computing, characterized by comprising:

a first acquiring unit, configured to obtain nodes used for data processing;

7. The system according to claim 6, characterized in that the first acquiring unit is specifically configured to add and/or delete nodes used for data processing.

8. The system according to claim 7, characterized in that the forming unit comprises:

a first generating unit, configured to generate a one-way arrow from the first node to the second node according to the trajectory.

9. The system according to claim 8, characterized in that the forming unit further comprises:

10. The system according to claim 9, characterized in that the trigger unit comprises: