CN107688663A - Method for forming an acyclic data analysis queue, and big data support platform comprising the same
- Publication number: CN107688663A
- Application number: CN201710847663.0A
- Authority: CN (China)
- Prior art keywords: operator, task, flow, invalid, row
- Legal status: Granted (the listed status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
Abstract
The invention provides a method for forming an acyclic data analysis queue and a big data support platform comprising it. The method comprises the following steps: S21, a user builds a task flow through a user terminal and fills in the parameters of each operator used by the task flow; S22, a server receives the task flow and the operator parameters from the user terminal; S23, the server builds an adjacency matrix M: S231, the task flow is treated as a directed graph G in which each operator of the task flow is a node of G; with N operators in the task flow, an N*N adjacency matrix is built; S24, loop judgment. The method and the platform solve the prior-art problem that a task flow built by a user may contain a closed loop, which causes an infinite loop when the underlying calls are made.
Description
Technical field
The present invention relates to data analysis platforms, and in particular to a method for forming an acyclic data analysis queue and a big data support platform comprising it.
Background art
Chinese patent publication No. CN 105550268 B discloses a big data process modeling and analysis engine comprising a platform layer, a task scheduling layer and an interface layer. The platform layer handles resource scheduling and allocation. The task scheduling layer comprises a verification module, a parsing module, a task scheduling module and an algorithm package. The verification module checks whether a data analysis flow satisfies the flow design rules; the parts that satisfy the verification rules can enter the parsing module. The parsing module converts the data analysis flow generated by the interface layer into an executable data analysis flow task. Based on the complete data analysis flow generated by the parsing module, the task scheduling module calls the data analysis algorithm interfaces in the algorithm package to form a fully runnable analysis flow task program, and schedules the underlying resources to execute the data analysis program. The interface layer provides the platform interface for data analysis modeling: each data analysis algorithm package appears on the interface as a uniquely identified draggable component; the user manipulates the algorithm components through the interface and connects them with directed lines and curves, which represent the direction and steps of the data analysis flow, to assemble a complete business data analysis algorithm model; the start function of the interface then runs the background task scheduling module and the algorithm packages, which schedule resources to analyze and process the data quickly. Although this engine can process massive data efficiently and quickly to some extent, it has the following shortcoming:
With this engine, the user creates a data analysis flowchart, the platform automatically generates the operator queue corresponding to the flowchart, the user then inputs the data to be analyzed, and the data is processed by each operator in the operator queue in turn to obtain the analysis result. However, while creating the data analysis flowchart the user may well introduce a closed loop. When the data being processed reaches the closed loop, the computation loops endlessly and occupies a large amount of CPU and computing resources, which slows down other computations.
Summary of the invention
The invention provides a method for forming an acyclic data analysis queue and a big data support platform comprising it, solving the prior-art problem that a task flow built by a user may contain a closed loop, which causes an infinite loop when the underlying calls are made.
To achieve the above object, the invention adopts the following technical solution:
A method for forming an acyclic data analysis queue comprises the following steps:
S21, a user builds a task flow through a user terminal and fills in the parameters of each operator used by the task flow;
S22, a server receives the task flow and the operator parameters from the user terminal;
S23, the server builds an adjacency matrix M:
S231, the task flow is treated as a directed graph G in which each operator of the task flow is a node of G; with N operators in the task flow, an N*N adjacency matrix is built;
S232, every ordered pair of operators connected by a directed edge is handled as follows: if there is a directed edge from operator i1 to operator j1, the value in row i1, column j1 of adjacency matrix M is set to 1; every ordered pair of operators not connected by a directed edge is handled as follows: if there is no directed edge from operator i2 to operator j2, the value in row i2, column j2 of adjacency matrix M is set to 0;
S24, loop judgment:
S241, find a column i3 of M whose values are all 0, and find the operator opi3 corresponding to column i3;
S242, operator opi3 corresponds to row i3; find a column index j3 whose value on row i3 is not 0, and find the operator opj3 corresponding to j3;
S246, delete row i3 and column i3 of M, and repeat steps S241 to S245 until M contains no column whose values are all 0;
S247, the operators corresponding to the columns remaining in M are judged invalid.
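Step S23 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_adjacency_matrix`, the operator names and the edge-list input format are hypothetical and do not come from the patent.

```python
def build_adjacency_matrix(operators, edges):
    """Sketch of S231-S232.

    operators: list of operator names (hypothetical input format).
    edges: list of (source, target) pairs, one per directed edge.
    """
    index = {name: k for k, name in enumerate(operators)}
    n = len(operators)
    # S231: N*N matrix; every entry starts at 0 (no directed edge).
    matrix = [[0] * n for _ in range(n)]
    # S232: a directed edge from operator i to operator j sets row i, column j to 1.
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


# Hypothetical three-operator chain: op_a -> op_b -> op_c
M = build_adjacency_matrix(
    ["op_a", "op_b", "op_c"],
    [("op_a", "op_b"), ("op_b", "op_c")],
)
```

With this layout, a column that contains only zeros belongs to an operator with no incoming edge, which is what step S241 searches for.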
Preferably,
after step S23 and before step S24, an operator input count judgment and an operator parameter judgment are carried out;
the operator input count judgment comprises the following steps: first, count the number of inputs of each operator according to the task flow built by the user; then compare the counted number of inputs of each operator with the number of input ports specified in the design of that operator; if they are equal, the operator is currently valid; otherwise, the operator is judged invalid;
the operator parameter judgment comprises the following step: a count check is made on the parameters of each operator; if any parameter value of the operator is empty, the operator is judged invalid.
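The two pre-checks can be sketched as follows. The dictionary layout (`ports`, `params`), the operator names and the treatment of `None` and the empty string as "empty" are assumptions made for this illustration only.

```python
def check_inputs_and_params(operators, edges):
    """Sketch of the input count and parameter judgments.

    operators: name -> {"ports": declared number of input ports,
                        "params": dict of parameter values}  (hypothetical layout)
    edges: list of (source, target) pairs of the task flow.
    Returns name -> True (currently valid) or False (judged invalid).
    """
    # Count how many inputs each operator actually receives in the task flow.
    incoming = {name: 0 for name in operators}
    for _source, target in edges:
        incoming[target] += 1

    valid = {}
    for name, spec in operators.items():
        inputs_ok = incoming[name] == spec["ports"]
        # Assumption: a parameter counts as empty when it is None or "".
        params_ok = all(v is not None and v != "" for v in spec["params"].values())
        valid[name] = inputs_ok and params_ok
    return valid


result = check_inputs_and_params(
    {
        "read": {"ports": 0, "params": {"path": "data.csv"}},
        "join": {"ports": 2, "params": {"key": "id"}},       # only one input connected
        "filter": {"ports": 1, "params": {"condition": ""}},  # empty parameter
    },
    [("read", "join"), ("read", "filter")],
)
```

In this example `join` fails the input count judgment and `filter` fails the parameter judgment, so only `read` stays valid.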
Preferably, step S24 further comprises the following steps, carried out between step S242 and step S246:
S243, judge whether operator opj3 is valid: if opj3 is currently valid, the validity of opj3 is set equal to the validity of opi3; if opj3 is currently invalid, the original invalid state is kept;
S244, repeat steps S242 to S243 until every non-zero column on row i3 has been processed;
S245, judge whether operator opi3 is valid: if opi3 is valid, add opi3 to the executable task queue intQ and carry out step S246; if opi3 is invalid, it is not added to the executable task queue intQ and remains invalid;
Preferably, while step S243 is carried out, the execution category of operator opi3 is also marked, according to the following principle:
the execution categories are: normal, merge start, merge end and in-merge; the operator categories include the merge processing start operator and the merge processing end operator; the execution category of a merge processing start operator is merge start; the execution category of a merge processing end operator is merge end; the execution category of an operator located between a merge processing start operator and a merge processing end operator in the task flow is in-merge; and the execution category of an operator located outside the span from a merge processing start operator to a merge processing end operator is normal;
After step S247 has been carried out, an operator execution category validity judgment is performed, which comprises the following steps:
S251, the operator q at the head of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252, the following validity judgment is made on operator q according to typeq and dtq:
a, when the execution category of operator q is merge start, judge whether an input of operator q is the output of an in-merge operator or of another merge start operator; if so, operator q is invalid; if not, operator q keeps its existing valid or invalid state;
b, when the execution category of operator q is in-merge and operator q is a multi-input operator, judge whether one input of operator q is a double-dataset data flow and the other input is a single-dataset data flow; if so, operator q keeps its existing valid or invalid state; if not, operator q is invalid;
c, when the execution category of operator q is in-merge, judge whether the output of operator q goes to a write-data-source operator; if so, operator q is invalid; if not, operator q keeps its existing valid or invalid state;
d, for an operator q in any case other than a, b and c, no judgment is needed, and operator q keeps its existing valid or invalid state;
S253, check the state of operator q: if operator q is valid, it is added to the final execution queue taskQ; if operator q is invalid, it is discarded;
S254, repeat steps S251 to S253 until the executable task queue intQ is empty, which yields the final execution queue taskQ.
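Steps S251 to S254 can be sketched as below. The `Op` record, the category strings and the rule that a double-dataset flow is whatever a merge start or in-merge operator produces are assumptions chosen for this illustration; for operators with more than two inputs, the sketch reads rule b as "exactly one input is a double-dataset flow".

```python
from collections import deque
from dataclasses import dataclass, field

NORMAL, MERGE_START, MERGE_END, IN_MERGE = "normal", "merge_start", "merge_end", "in_merge"


@dataclass
class Op:  # hypothetical record for one entry of intQ
    name: str
    op_type: str                                        # operator category (typeq)
    exec_cat: str                                       # execution category (dtq)
    input_cats: list = field(default_factory=list)      # execution categories of the producers
    output_types: list = field(default_factory=list)    # operator categories of the consumers
    valid: bool = True


def is_double(producer_cat):
    # Assumption: merge start and in-merge operators emit double-dataset flows.
    return producer_cat in (MERGE_START, IN_MERGE)


def build_final_queue(int_q):
    task_q = []
    queue = deque(int_q)
    while queue:                                         # S254: until intQ is empty
        q = queue.popleft()                              # S251: dequeue the head operator
        if q.exec_cat == MERGE_START and any(is_double(c) for c in q.input_cats):
            q.valid = False                              # rule a: stacked merges
        if q.exec_cat == IN_MERGE and len(q.input_cats) > 1:
            if sum(is_double(c) for c in q.input_cats) != 1:
                q.valid = False                          # rule b: needs one double + one single
        if q.exec_cat == IN_MERGE and "write_data_source" in q.output_types:
            q.valid = False                              # rule c: in-merge result written out
        if q.valid:                                      # S253 (rule d: nothing else to check)
            task_q.append(q.name)
    return task_q


task_q = build_final_queue([
    Op("read_a", "read_data_source", NORMAL),
    Op("merge", "merge_start", MERGE_START, input_cats=[NORMAL, NORMAL]),
    Op("encode", "string_encoding", IN_MERGE, input_cats=[MERGE_START]),
    Op("bad_join", "join", IN_MERGE, input_cats=[IN_MERGE, IN_MERGE]),
    Op("bad_feed", "filter", IN_MERGE, input_cats=[IN_MERGE], output_types=["write_data_source"]),
    Op("nested", "merge_start", MERGE_START, input_cats=[IN_MERGE, NORMAL]),
])
```

Here `bad_join`, `bad_feed` and `nested` are rejected by rules b, c and a respectively, and the remaining operators form the final execution queue.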
The present invention also provides a big data support platform, comprising a user terminal and a server that use the above method for forming an acyclic data analysis queue.
Compared with the prior art, the present invention has the following beneficial effects:
Steps S23 and S24 solve the prior-art problem that a task flow built by a user may contain a closed loop, which causes an infinite loop when the underlying calls are made. Endless loops during computation are avoided and the CPU load is reduced, so that the CPU has more capacity for computing correct task flows and unnecessary computation is avoided.
Brief description of the drawings
Fig. 1 shows example 1 of judging whether a closed loop exists in step S24 of Embodiment 1;
Fig. 2 shows example 2 of judging whether a closed loop exists in step S24 of Embodiment 1;
Fig. 3 shows example 3, in which an operator is judged invalid under case a of step S252 in Embodiment 1;
Fig. 4 shows example 4, in which an operator is judged invalid under case b of step S252 in Embodiment 1;
Fig. 5 shows example 5, in which an operator is judged invalid under case c of step S252 in Embodiment 1.
Embodiment
Embodiment 1:
A method for forming an acyclic data analysis queue comprises the following steps:
S21, a user builds a task flow through a user terminal and fills in the parameters of each operator used by the task flow;
S22, a server receives the task flow and the operator parameters from the user terminal;
S23, the server builds an adjacency matrix M:
S231, the task flow is treated as a directed graph G in which each operator of the task flow is a node of G; with N operators in the task flow, an N*N adjacency matrix is built;
S232, every ordered pair of operators connected by a directed edge is handled as follows: if there is a directed edge from operator i1 to operator j1, the value in row i1, column j1 of adjacency matrix M is set to 1; every ordered pair of operators not connected by a directed edge is handled as follows: if there is no directed edge from operator i2 to operator j2, the value in row i2, column j2 of adjacency matrix M is set to 0;
S24, loop judgment:
S241, find a column i3 of M whose values are all 0, and find the operator opi3 corresponding to column i3;
S242, operator opi3 corresponds to row i3; find a column index j3 whose value on row i3 is not 0, and find the operator opj3 corresponding to j3;
S246, delete row i3 and column i3 of M, and repeat steps S241 to S245 until M contains no column whose values are all 0;
S247, the operators corresponding to the columns remaining in M are judged invalid.
If the task flow built by the user is as shown in Fig. 1, a 3*3 adjacency matrix M is built. Operator 11 has a directed edge to operator 12, so $M_{12} = 1$; operator 12 has a directed edge to operator 13, so $M_{23} = 1$; all other entries are 0. The adjacency matrix of example 1 is therefore $M = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}$. Step S24 then proceeds as follows:
The first time step S241 is carried out, it finds column 1. The operator corresponding to column 1 is the filter operator in the figure, which corresponds to row 1. The column whose value on row 1 is not 0 is column 2, which corresponds to the feature transformation operator. If the filter operator is valid, the feature transformation operator is valid as well (a valid predecessor leaves its successor valid). Row 1 and column 1 are then deleted, giving $M = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$. Column 1 now corresponds to the feature transformation operator and column 2 to the clustering operator; likewise, row 1 corresponds to the feature transformation operator and row 2 to the clustering operator;
The second time step S241 is carried out, it again finds column 1. The operator corresponding to column 1 is now the feature transformation operator, which corresponds to row 1. The column whose value on row 1 is not 0 is column 2, which corresponds to the clustering operator. The feature transformation operator is valid (because the filter operator was), so the clustering operator is valid as well. Row 1 and column 1 are then deleted, giving $M = (0)$, in which column 1 and row 1 both correspond to the clustering operator;
The third time step S241 is carried out, row 1 and column 1 are deleted and nothing remains of M. This shows that the task flow contains no closed loop and complies with the rules, so no endless-loop computation will occur;
If the task flow built by the user is as shown in Fig. 2, a 3*3 adjacency matrix M is built. Operator 11 has a directed edge to operator 12, so $M_{12} = 1$; operator 12 has a directed edge to operator 13, so $M_{23} = 1$; operator 13 has a directed edge to operator 11, so $M_{31} = 1$; all other entries are 0. The adjacency matrix of example 2 is therefore $M = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$. Step S24 then proceeds as follows:
When step S241 is carried out, no column whose values are all 0 can be found, so all operators are set invalid: the filter operator, the feature transformation operator and the clustering operator form a closed loop, which would cause an endless loop.
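The two worked examples can be reproduced with a short sketch of steps S241, S246 and S247 (validity propagation is left out here). The function name and the list-of-lists matrix representation are assumptions of this sketch; the operator names follow the examples above.

```python
def find_leftover_operators(matrix, names):
    """Repeatedly remove an operator whose column is all 0; return what is left."""
    m = [row[:] for row in matrix]   # work on copies so the inputs stay untouched
    names = list(names)
    while m:
        # S241: look for a column whose values are all 0 (no incoming edge).
        col = next((c for c in range(len(m)) if all(row[c] == 0 for row in m)), None)
        if col is None:
            # No such column: every remaining operator lies on a closed loop
            # or downstream of one.
            break
        # S246: delete row `col` and column `col`.
        m = [[v for c, v in enumerate(row) if c != col]
             for r, row in enumerate(m) if r != col]
        del names[col]
    return names                     # S247: these operators are judged invalid


names = ["filter", "feature_transform", "cluster"]
fig1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # example 1: a chain
fig2 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # example 2: a closed loop

leftover_fig1 = find_leftover_operators(fig1, names)
leftover_fig2 = find_leftover_operators(fig2, names)
```

For example 1 the matrix shrinks from 3*3 to 2*2 to 1*1 to empty and nothing is left over; for example 2 the first search already fails and all three operators remain.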
For ease of understanding what follows, the computation package stored on the server is described here. The computation package stores a large number of operators; by operator category, the operators used for data analysis fall roughly into the following five classes:
Class 1: data source reading and writing
This class contains two operators, read data source and write data source. A workspace may contain multiple read-data-source and write-data-source operators.
The read-data-source operator is the starting point of a task flow. It obtains the list of all data files of the current user, from which one file is selected and passed into the flow as the data source. The read-data-source operator has no actual implementation at the bottom layer: its function is to pass a parameter, so it only has to pass the path of the selected file to its successor operator to complete the task of reading the data source.
The write-data-source operator is the end point of a task flow. It saves the intermediate result it receives as the final result under the file name set by the user; the write-data-source operator does have a concrete implementation at the bottom layer.
Class 2: data preprocessing
This class contains multiple operators for data preprocessing, covering the processing of strings, of data sets and of tables.
String processing includes splitting, concatenating and taking substrings. To keep operators fine-grained, the invention provides a single string processing operator that can handle only one field of a DataFrame. If several strings need to be processed, this can be done by chaining several operator steps. Each string processing operator can generate multiple new fields as required; for example, when taking substrings, the design allows several substrings to be taken from one string column, generating several new fields.
Although a data set entering an operator is held in memory in the form of a DataFrame, that is, as a table, this description distinguishes the processing of data sets from the processing of tables: table processing involves handling specific fields and SQL operations on the table, whereas data set processing operates only on the data set as a whole and involves no specific field.
Data set processing mainly concerns the size of the data set, and includes proportional sampling, splitting a data set by proportion, merging data sets and so on.
Table processing includes joins, deduplication, filtering, deleting fields, changing field types, computing mathematical formulas on fields and so on. Table processing benefits from Spark SQL and from the DataFrame optimizations introduced in Spark 2.0 and later: because Spark provides an SQL interface that supports a large number of core SQL operations, table processing can be based entirely on SQL. SQL is a skill that most data analysts must master, so processing tables with SQL fits analysts' habits well and also completes data processing tasks well.
Class 3: feature engineering
Feature engineering operators are mainly based on the feature package in Spark MLlib, and cover text processing, feature transformation, feature encoding and so on.
In feature engineering, all processing of features is a transformation of features, but this description distinguishes further. In the present invention, feature transformation refers to converting a feature into a feature of the same type, whereas feature encoding refers to encoding features of other types into numerical features.
Text processing involves natural language processing (NLP) and is mainly used to extract key information from text; it includes tokenization, stop-word removal, NGram, TF-IDF and so on.
Feature transformation mainly includes principal component analysis (PCA), polynomial expansion, various kinds of scaling, binarization and so on. It is mainly used to standardize features and ensure their independence, which can improve the accuracy of models trained afterwards.
Feature encoding mainly includes one-hot encoding, string encoding, vector encoding and so on. Because most models require features to be represented as numbers, features of various types must be converted to numerical form in the feature engineering stage.
Class 4: models
Model operators mainly include models for classification, regression, clustering, collaborative filtering and similar tasks. Spark is a distributed parallel computing framework, and most models were not originally designed for a distributed environment, so Spark uses model algorithms that have been modified to suit a distributed environment; compared with single-machine model algorithms, their results differ slightly on specific problems.
Classification models mainly include random forest classification, GBDT classification, logistic regression and so on; regression models mainly include random forest regression, GBDT regression, linear regression and so on; clustering models mainly include K-means clustering and so on; collaborative filtering mainly includes alternating least squares (ALS) and so on.
Models are the core of machine learning and likewise a core component of a data analysis platform. The invention provides a rich set of models that can meet analysts' needs to a certain extent.
Class 5: merge processing
In an analysis task, the same operation may need to be applied to different data sets, for example evaluating the same mathematical expression on identically named fields of two data sets. In such cases, the parts that require the same operation can be merged and processed together. Each operator runs as a single-function Spark task whose actual workload is small and whose run time is short; by comparison, the time spent starting the program and initializing the context is long and accounts for a large share of the total duration. Merge processing therefore both reduces the size of the task and cuts the start-up and context initialization time of what would have been two operators to roughly half.
Machine learning requires handling several data sets: typically a training set, a data set to be classified or clustered, and possibly a test set. Feature engineering operations must be applied to these data sets, and some of them have to be applied to several data sets at once: their transformation relation must be applied to all of the data sets, and it has a large effect on later results. Take the most common case, string encoding on a training set and a test set. The strings are sorted by the number of times they occur in the training set and then given numeric codes. For the encoded result to be equally valid for the test set, the test set must be converted using the mapping obtained in this step. If string encoding is performed separately on the training set and the test set, the occurrence counts of the same string in the two data sets are likely to differ, so its codes may differ too, which ultimately leads to model errors. To prevent this, the invention abstracts the situation as merge processing: during the merge, the feature engineering information obtained from processing the training set is used to operate on the other data sets, which ensures consistent encoding results.
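The string encoding problem described above can be illustrated with a small plain-Python sketch. It does not use Spark; the function names, the descending-frequency ordering with alphabetical tie-breaking, and the `-1` code for unseen strings are assumptions made for this illustration.

```python
from collections import Counter


def fit_string_index(values):
    """Build a string -> code mapping ordered by occurrence count."""
    counts = Counter(values)
    # Assumption: most frequent string gets code 0; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: code for code, s in enumerate(ordered)}


def transform(values, mapping, unknown=-1):
    """Encode values with an existing mapping; unseen strings get `unknown`."""
    return [mapping.get(v, unknown) for v in values]


train = ["red", "blue", "red", "green", "red", "blue"]
test = ["blue", "blue", "green", "red"]

mapping = fit_string_index(train)                    # fitted on the training set only
shared = transform(test, mapping)                    # test set encoded with the training mapping
separate = transform(test, fit_string_index(test))   # test set encoded on its own
```

Because "blue" is the most frequent string in the test set but only the second most frequent in the training set, `shared` and `separate` assign different codes to the same strings, which is the inconsistency that merge processing is meant to prevent.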
To satisfy the input port counts defined for operators of other categories, the two data sets are operated on as one data set during merge processing, forming a double-dataset data flow. Operators of other categories can thus be used inside merge processing without implementing the same operator for two different input counts; instead, the bottom layer decides how to handle the data correctly according to each operator's position.
The design of the merge class operators optimizes resource usage without introducing errors, solves the problem of passing feature engineering information in machine learning, improves overall efficiency, and also resolves latent problems.
Because merge processing exists and operators are not developed in two versions, the execution category of each Spark operator must be determined, to tell the operator how many data sets it has to operate on.
According to an operator's position in the task flow and the category of the operator itself, operators are divided into four execution categories: normal, merge processing start, merge processing end, and in merge processing. An operator's execution category provides part of the basis for the flow control of its successor operators.
Operators from the merge processing start operator to the merge processing end operator are in-merge operators; all others are normal operators.
In the normal case, every input port of an operator receives a single-dataset data flow. Its input operators are normal operators or merge processing end operators.
A merge processing start operator has two input ports and one output port; its function is to merge two data sets into one data flow for processing. Its input operators must be normal operators or merge processing end operators, and each input port receives a single-dataset data flow.
A merge processing end operator has one input port and two output ports. Its input operator must be a merge processing start operator or one of its successors; the input port receives a double-dataset data flow, and the outputs are single-dataset data flows.
The output port of an in-merge operator always outputs two data sets. When it is a single-input operator, its input port receives a double-dataset data flow and its output port also outputs a double-dataset data flow. When it is a dual-input operator, one input port receives a double-dataset data flow, namely the output of an in-merge operator, and the other input port receives a single-dataset data flow, namely the output of a normal operator.
The filtering, read data source, clustering and similar names used above are operator categories.
To ensure that every input of a multi-input operator is connected, avoiding computation with missing inputs, and that every parameter of an operator requiring several parameters is set, an operator input count judgment and an operator parameter judgment are carried out after step S23 and before step S24;
the operator input count judgment comprises the following steps: first, count the number of inputs of each operator according to the task flow built by the user; then compare the counted number of inputs of each operator with the number of input ports specified in the design of that operator; if they are equal, the operator is currently valid; otherwise, the operator is judged invalid;
the operator parameter judgment comprises the following step: a count check is made on the parameters of each operator; if any parameter value of the operator is empty, the operator is judged invalid.
Step S24 further comprises the following steps, carried out between step S242 and step S246:
S243, judge whether operator opj3 is valid: if opj3 is currently valid, the validity of opj3 is set equal to the validity of opi3; if opj3 is currently invalid, the original invalid state is kept; (this step ensures that, in the task flow, when an upper-level operator is invalid its lower-level operators are invalid too, which avoids still computing a lower-level operator when the upper-level operator produces no output; it implements the propagation of invalidity)
S244, repeat steps S242 to S243 until every non-zero column on row i3 has been processed;
S245, judge whether operator opi3 is valid: if opi3 is valid, add opi3 to the executable task queue intQ and carry out step S246; if opi3 is invalid, it is not added to the executable task queue intQ and remains invalid (building the executable task queue intQ simplifies the subsequent execution category validity judgment: operators found invalid by the loop judgment, by the operator input count judgment, by the operator parameter judgment, or because an upper-level operator was judged invalid are removed, so that the execution category judgment does not re-examine operators already known to be invalid);
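Steps S241 to S247, including the propagation of invalidity and the construction of intQ, can be sketched as follows. The function name, the `valid` dictionary and the operator names are hypothetical; the sketch also assumes that row and column i3 are deleted in step S246 whether or not opi3 is valid, since otherwise the search in S241 would keep returning the same column.

```python
def loop_judgment(matrix, names, valid):
    """Sketch of S241-S247 with validity propagation.

    valid: name -> bool, the state left by the input count and parameter judgments.
    Returns (intQ, final validity of every operator).
    """
    m = [row[:] for row in matrix]
    names = list(names)
    valid = dict(valid)
    int_q = []
    while m:
        # S241: column with all values 0, and its operator op_i.
        col = next((c for c in range(len(m)) if all(row[c] == 0 for row in m)), None)
        if col is None:
            break
        op_i = names[col]
        # S242 + S244: visit every non-zero column j on row i.
        for j, value in enumerate(m[col]):
            if value != 0 and valid[names[j]]:
                # S243: a currently valid successor inherits the validity of op_i;
                # an already invalid successor stays invalid.
                valid[names[j]] = valid[op_i]
        # S245: only valid operators enter the executable task queue.
        if valid[op_i]:
            int_q.append(op_i)
        # S246: delete row i and column i (assumed to happen in both cases).
        m = [[v for c, v in enumerate(row) if c != col]
             for r, row in enumerate(m) if r != col]
        del names[col]
    # S247: whatever is still in M is judged invalid.
    for leftover in names:
        valid[leftover] = False
    return int_q, valid


int_q, state = loop_judgment(
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    ["read", "filter", "cluster"],
    {"read": True, "filter": False, "cluster": True},  # filter failed an earlier check
)
```

In this example `filter` was already invalid, so `cluster`, its successor, becomes invalid as well, and only `read` reaches intQ.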
To facilitate the subsequent operator execution category validity judgment, the execution category of operator opi3 is also marked while step S243 is carried out, according to the following principle:
the execution categories are: normal, merge start, merge end and in-merge; the operator categories include the merge processing start operator and the merge processing end operator; the execution category of a merge processing start operator is merge start; the execution category of a merge processing end operator is merge end; the execution category of an operator located between a merge processing start operator and a merge processing end operator in the task flow is in-merge; and the execution category of an operator located outside the span from a merge processing start operator to a merge processing end operator is normal;
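One possible way to apply this marking principle is sketched below. The patent does not spell out the marking procedure, so the approach here is an assumption: operators are visited in an order in which predecessors come first (the order produced by the loop judgment would do), and an ordinary operator is marked in-merge when any predecessor is a merge start or in-merge operator. All names are hypothetical.

```python
def mark_execution_categories(order, op_types, predecessors):
    """Assign an execution category to each operator.

    order: operator names, predecessors before successors.
    op_types: name -> operator category (only "merge_start" / "merge_end" matter here).
    predecessors: name -> list of names feeding this operator.
    """
    categories = {}
    for name in order:
        if op_types[name] == "merge_start":
            categories[name] = "merge_start"
        elif op_types[name] == "merge_end":
            categories[name] = "merge_end"
        elif any(categories[p] in ("merge_start", "in_merge") for p in predecessors[name]):
            # Fed by a merge start or in-merge operator: still inside the merge.
            categories[name] = "in_merge"
        else:
            categories[name] = "normal"
    return categories


cats = mark_execution_categories(
    ["read_train", "read_test", "merge", "encode", "split", "model"],
    {
        "read_train": "read_data_source", "read_test": "read_data_source",
        "merge": "merge_start", "encode": "string_encoding",
        "split": "merge_end", "model": "cluster",
    },
    {
        "read_train": [], "read_test": [], "merge": ["read_train", "read_test"],
        "encode": ["merge"], "split": ["encode"], "model": ["split"],
    },
)
```

In this flow only `encode` lies between the merge start and merge end operators, so it is the only operator marked in-merge.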
After step S247 has been carried out, the operator execution category validity judgment is performed. It specifically comprises the following steps:
S251, the operator q at the head of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252, the following validity judgment is made on operator q according to typeq and dtq:
a, when the execution category of operator q is merge start, judge whether an input of operator q is the output of an in-merge operator or of another merge start operator; if so, operator q is invalid; if not, operator q keeps its previous invalid or valid state; (this step prevents two merge-processing operators from being stacked, which would make it difficult to split the output of the in-merge operators into four or three output results and would leave the results confused; this invalid case is shown in Fig. 3)
b, when the execution category of operator q is in merge and operator q is a multi-input operator, judge whether one input of operator q is a double-dataset data stream and the other input is a single-dataset data stream; if so, operator q keeps its previous invalid or valid state; if not, operator q is invalid; (this step prevents two output results within the merge processing from being fed into the same multi-input operator at once, in which case the output of that multi-input operator could not be separated when the merge processing ends and the results would be confused; this invalid case is shown in Fig. 4)
c, when the execution category of operator q is in merge, judge whether the output of operator q goes to a write-data-source operator; if so, operator q is invalid; if not, operator q keeps its previous invalid or valid state; (results obtained inside a merge do not need to be written to the server, and neither their data nor their write mode is legal, so results produced by in-merge operators must not be stored on the server by a write-data-source operator, which would corrupt the stored data; this invalid case is shown in Fig. 5)
d, an operator q that falls outside cases a, b and c needs no judgment and keeps its previous invalid or valid state;
S253, check the invalid or valid state of operator q: if operator q is valid, add it to the final execution queue taskQ; if operator q is invalid, reject it; (this step excludes the three cases a, b and c, so that no confusion arises).
S254, repeat steps S241 to S242 until the executable task queue intQ is empty, obtaining the final execution queue taskQ.
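The execution category validity judgment can be sketched as a loop that drains intQ and applies rules a to d to each dequeued operator. This is a hedged illustration, not the patented implementation: the `Op` record, the stream labels `"single"` and `"double"`, the type label `"write_source"` and the function `build_task_queue` are all hypothetical, and a multi-input operator is assumed to have exactly two inputs.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    op_type: str    # typeq, e.g. "merge_start", "write_source" (assumed labels)
    category: str   # dtq: "ordinary", "merge_start", "merge_end" or "in_merge"
    valid: bool = True
    inputs: list = field(default_factory=list)   # (upstream name, "single" | "double")
    outputs: list = field(default_factory=list)  # downstream operator names


def build_task_queue(int_q, ops):
    """Drain intQ and return (task_q, rejected) following S251 to S254."""
    task_q, rejected = [], []
    pending = deque(int_q)
    while pending:                                   # S254: until intQ is empty
        q = ops[pending.popleft()]                   # S251: dequeue the head
        upstream = [ops[name] for name, _ in q.inputs]
        kinds = sorted(kind for _, kind in q.inputs)
        if q.category == "merge_start":
            # Rule a: a merge start must not consume an in-merge
            # operator or another merge start operator.
            if any(u.category in ("in_merge", "merge_start") for u in upstream):
                q.valid = False
        elif q.category == "in_merge":
            # Rule b: a multi-input operator inside a merge needs one
            # double-dataset input and one single-dataset input.
            if len(q.inputs) > 1 and kinds != ["double", "single"]:
                q.valid = False
            # Rule c: results inside a merge must not reach a write operator.
            if any(ops[d].op_type == "write_source" for d in q.outputs):
                q.valid = False
        # Rule d: every other operator keeps its current state.
        (task_q if q.valid else rejected).append(q.name)  # S253
    return task_q, rejected


ops = {
    "read": Op("read", "read_source", "ordinary", outputs=["ms"]),
    "ms": Op("ms", "merge_start", "merge_start",
             inputs=[("read", "single")], outputs=["f"]),
    "f": Op("f", "filter", "in_merge", inputs=[("ms", "double")], outputs=["w"]),
    "w": Op("w", "write_source", "ordinary", inputs=[("f", "double")]),
}
task_q, rejected = build_task_queue(["read", "ms", "f", "w"], ops)
```

Here `f` is rejected under rule c because it sits inside a merge and feeds a write operator. Keeping a separate `rejected` list is a design choice of the sketch that makes it easy to report invalid operators back to the client.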
The final execution queue taskQ contains the valid operators in order, with the invalid operators rejected. The server can therefore execute the valid operators and present the results to the client, while also reporting the invalid operators.
Embodiment 2:
This embodiment provides a big data support platform, comprising a user terminal and a server that use the method for forming an acyclic data analysis queue described in Embodiment 1.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.
Claims (5)
1. A method for forming an acyclic data analysis queue, characterized by comprising the following steps:
S21, a user builds a task flow through a user terminal and fills in the parameters of each operator used by the task flow;
S22, a server receives the task flow from the user terminal and the parameters of each operator used by the task flow;
S23, the server establishes an adjacency matrix M:
S231, the task flow is treated as a directed graph G whose nodes are the operators of the task flow; with N operators in the task flow, an N*N adjacency matrix is established;
S232, every pair of operators joined by a directed edge is handled as follows: if there is a directed edge from an operator i1 to another operator j1, the value in row i1, column j1 of the adjacency matrix M is set to 1; every pair of operators not joined by a directed edge is handled as follows: if there is no directed edge from an operator i2 to another operator j2, the value in row i2, column j2 of the adjacency matrix M is set to 0;
S24, ring judgment:
S241, find a column i3 of M whose values are all 0, and find the operator op_i3 corresponding to column i3;
S242, operator op_i3 corresponds to row i3; find each column j3 whose value in row i3 is not 0, and find the operator op_j3 corresponding to j3;
S246, delete row i3 and column i3 from M, and repeat steps S41 to S45 until no column whose values are all 0 remains in matrix M;
S247, the operators corresponding to the rows remaining in M are judged invalid.
2. The method for forming an acyclic data analysis queue according to claim 1, characterized in that an operator input number judgment and an operator parameter judgment are carried out after step S23 and before step S24;
the operator input number judgment specifically comprises the following steps: first, the number of inputs of each operator is counted from the task flow built by the user; then, the counted number of inputs of each operator is compared with the number of input ports specified in the design of that operator; if they are equal, the operator is currently valid; if they are not equal, the operator is judged invalid;
the operator parameter judgment: a quantity check is performed on the parameters of each operator, and if any parameter value of the operator is empty, the operator is judged invalid.
3. The method for forming an acyclic data analysis queue according to claim 2, characterized in that step S24 further comprises the following steps, carried out between step S242 and step S246:
S243, judge whether operator op_j3 is valid: if op_j3 is currently valid, the validity of op_j3 is set equal to the validity of op_i3; if op_j3 is currently invalid, op_i3 continues to keep its previous invalid state;
S244, repeat steps S242 to S243 until all non-zero columns in row i3 have been processed;
S245, judge whether operator op_i3 is valid: if op_i3 is valid, add op_i3 to the executable task queue and proceed to step S246; if op_i3 is invalid, do not add it to the executable task queue intQ and keep it invalid.
4. The method for forming an acyclic data analysis queue according to claim 3, characterized in that, while step S243 is carried out, the execution category of operator op_i3 is also marked, the marking principle being as follows:
execution categories are divided into: ordinary, merge start, merge end, and in merge; among the operators there are two operator categories: merge-processing start point operators and merge-processing end point operators; the execution category of a merge-processing start point operator is merge start, and that of a merge-processing end point operator is merge end; an operator located between a merge-processing start point operator and a merge-processing end point operator in the task flow has the execution category in merge, and an operator located outside a merge-processing start point operator and a merge-processing end point operator has the execution category ordinary;
after step S247 has been carried out, an operator execution category validity judgment is performed, which specifically comprises the following steps:
S251, the operator q at the head of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252, the following validity judgment is made on operator q according to typeq and dtq:
a, when the execution category of operator q is merge start, judge whether an input of operator q is the output of an in-merge operator or of another merge start operator; if so, operator q is invalid; if not, operator q keeps its previous invalid or valid state;
b, when the execution category of operator q is in merge and operator q is a multi-input operator, judge whether one input of operator q is a double-dataset data stream and the other input is a single-dataset data stream; if so, operator q keeps its previous invalid or valid state; if not, operator q is invalid;
c, when the execution category of operator q is in merge, judge whether the output of operator q goes to a write-data-source operator; if so, operator q is invalid; if not, operator q keeps its previous invalid or valid state;
d, an operator q that falls outside cases a, b and c needs no judgment and keeps its previous invalid or valid state;
S253, check the invalid or valid state of operator q: if operator q is valid, add it to the final execution queue taskQ; if operator q is invalid, reject it;
S254, repeat steps S241 to S242 until the executable task queue intQ is empty, obtaining the final execution queue taskQ.
5. A big data support platform, characterized by comprising: a user terminal and a server that use the method for forming an acyclic data analysis queue according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710847663.0A CN107688663B (en) | 2017-09-19 | 2017-09-19 | Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710847663.0A CN107688663B (en) | 2017-09-19 | 2017-09-19 | Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107688663A true CN107688663A (en) | 2018-02-13 |
CN107688663B CN107688663B (en) | 2020-06-05 |
Family
ID=61156311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710847663.0A Active CN107688663B (en) | 2017-09-19 | 2017-09-19 | Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107688663B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115167352A (en) * | 2022-07-05 | 2022-10-11 | 南方电网科学研究院有限责任公司 | Algebraic ring identification method and device for power simulation secondary control system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080162453A1 (en) * | 2006-12-29 | 2008-07-03 | Microsoft Corporation | Supervised ranking of vertices of a directed graph |
CN102270204A (en) * | 2010-06-02 | 2011-12-07 | 上海佳艾商务信息咨询有限公司 | Method for calculating influence of online bulletin board system users based on matrix decomposition |
CN102420701A (en) * | 2011-11-28 | 2012-04-18 | 北京邮电大学 | Method for extracting internet service flow characteristics |
CN106682343A (en) * | 2016-08-31 | 2017-05-17 | 电子科技大学 | Method for formally verifying adjacent matrixes on basis of diagrams |
Non-Patent Citations (1)
Title |
---|
陈侨安等: "基于运行数据分析的spark任务参数优化", 《计算机工程与科学》 * |
Also Published As
Publication number | Publication date |
---|---|
CN107688663B (en) | 2020-06-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||