CN107688663A - Method for forming an acyclic data analysis queue and big data support platform comprising it

Info

Publication number
CN107688663A
Authority
CN
China
Prior art keywords
operator
task
flow
invalid
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710847663.0A
Other languages
Chinese (zh)
Other versions
CN107688663B (en)
Inventor
高英
成昱霖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201710847663.0A
Publication of CN107688663A
Application granted
Publication of CN107688663B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking


Abstract

The invention provides a method for forming an acyclic data analysis queue and a big data support platform comprising it. The method comprises the following steps: S21, a user builds a task flow at a user terminal and fills in the parameters of each operator used by the task flow; S22, a server receives the task flow from the user terminal and the parameters of each operator used by the task flow; S23, the server builds an adjacency matrix M: S231, the task flow is treated as a directed graph G whose nodes are the operators of the task flow, the number of operators in the task flow is N, and an N*N adjacency matrix is built; S24, ring judgment. The method and the platform solve the prior-art problem that a task flow built by a user may contain a closed ring, which causes an infinite loop when the underlying calls are made.

Description

Method for forming an acyclic data analysis queue and big data support platform comprising it
Technical field
The present invention relates to data analysis platforms, and in particular to a method for forming an acyclic data analysis queue and a big data support platform comprising it.
Background
Chinese patent publication No. CN 105550268 B discloses a big data process modeling analysis engine comprising a platform layer, a task scheduling layer and an interface layer. The platform layer handles resource scheduling and allocation. The task scheduling layer comprises a verification module, a parsing module, a task scheduling module and an algorithm package. The verification module checks whether a data analysis flow complies with the flow design rules; the parts that pass verification can enter the parsing module. The parsing module converts the data analysis flow generated by the interface layer into an executable data analysis flow task. The task scheduling module, according to the complete data analysis flow generated by the parsing module, calls the various data analysis algorithm interfaces in the algorithm package to form a fully runnable analysis flow task program, and schedules the underlying resources to execute the data analysis program. The interface layer provides the platform interface for data analysis modeling: each data analysis algorithm package exists on the interface as a draggable component with a unique identifier; the user operates the algorithm components through the interface and connects them with directed lines and curves that represent the direction and steps of the data analysis flow, combining them into a complete business data analysis algorithm model; the start function of the interface runs the background task scheduling module and the algorithm package and schedules resources to complete fast analysis and processing of the data. Although this engine can process massive data efficiently and quickly to a certain extent, it has the following shortcoming:
With this engine, the user creates a data analysis flowchart, the platform automatically generates the operator queue corresponding to the flowchart, the user then inputs the data to be analyzed, and the data is processed by each operator in the queue in turn to yield the analysis result. However, the flowchart created by the user may well contain a closed ring. When the processing of the data to be analyzed reaches the closed ring, the computation loops endlessly, occupying a large amount of CPU and computing capacity and slowing down other computations.
Summary of the invention
The invention provides a method for forming an acyclic data analysis queue and a big data support platform comprising it, which solve the prior-art problem that a task flow built by a user may contain a closed ring, causing an infinite loop when the underlying calls are made.
To achieve the above object, the invention adopts the following technical solution:
A method for forming an acyclic data analysis queue, comprising the following steps:
S21, a user builds a task flow at a user terminal and fills in the parameters of each operator used by the task flow;
S22, a server receives the task flow from the user terminal and the parameters of each operator used by the task flow;
S23, the server builds an adjacency matrix M:
S231, the task flow is treated as a directed graph G, each operator of the task flow is a node of G, the number of operators in the task flow is N, and an N*N adjacency matrix is built;
S232, every ordered pair of operators joined by a directed edge is handled as follows: if there is a directed edge from operator $i_1$ to operator $j_1$, the element $M_{i_1 j_1}$ in row $i_1$, column $j_1$ of M is set to 1; every ordered pair of operators not joined by a directed edge is handled as follows: if there is no directed edge from operator $i_2$ to operator $j_2$, the element $M_{i_2 j_2}$ in row $i_2$, column $j_2$ of M is set to 0;
S24, ring judgment:
S241, find a column $i_3$ of M whose values are all 0, and find the operator $op_{i_3}$ corresponding to column $i_3$;
S242, operator $op_{i_3}$ corresponds to row $i_3$; find a column index $j_3$ whose value in row $i_3$ is not 0, and find the operator $op_{j_3}$ corresponding to $j_3$;
S246, delete row $i_3$ and column $i_3$ of M, and repeat steps S241~S245 until M contains no column whose values are all 0;
S247, the operators corresponding to the columns remaining in M are judged invalid.
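Steps S23 and S24 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `build_adjacency` and `ring_judgment`, the 0-based operator indices and the edge-list input format are all hypothetical. Instead of physically deleting rows and columns, the sketch tracks which indices remain, which has the same effect.

```python
from typing import List, Tuple


def build_adjacency(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """S23: N*N matrix with M[i][j] = 1 when a directed edge i -> j exists."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = 1
    return m


def ring_judgment(m: List[List[int]]) -> Tuple[List[int], List[int]]:
    """S24: repeatedly remove an operator whose column is all 0.

    Returns (removal_order, remaining). Operators left in `remaining`
    lie on or downstream of a closed ring and are judged invalid (S247).
    """
    remaining = list(range(len(m)))
    order: List[int] = []
    while True:
        # S241: look for a column that is all 0 among the remaining rows.
        zero_col = next(
            (c for c in remaining if all(m[r][c] == 0 for r in remaining)),
            None,
        )
        if zero_col is None:
            break
        order.append(zero_col)
        # S246: "delete" row and column zero_col by dropping the index.
        remaining.remove(zero_col)
    return order, remaining
```

An empty `remaining` list means the task flow contains no closed ring, and `order` is then one valid execution order of the operators.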
Preferably,
after step S23 and before step S24, an operator input count judgment and an operator parameter judgment are carried out;
The operator input count judgment comprises the following steps: first, the number of inputs of each operator is counted from the task flow built by the user; then, the counted number of inputs of each operator is compared with the number of input ports specified in the design of that operator; if they are equal, the operator is currently valid; if they are not equal, the operator is judged invalid;
The operator parameter judgment: the parameters of each operator are checked, and if any parameter value of the operator is empty, the operator is judged invalid.
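The two checks above could look like the following minimal sketch. The function name `check_inputs_and_params`, the use of operator names as keys, and the choice to treat `None` and the empty string as "empty" are assumptions made for illustration; the source does not specify how parameters are stored.

```python
from typing import Dict, List, Optional, Tuple


def check_inputs_and_params(
    edges: List[Tuple[str, str]],
    expected_inputs: Dict[str, int],
    params: Dict[str, Dict[str, Optional[str]]],
) -> Dict[str, bool]:
    """Return operator name -> valid flag after both judgments."""
    # Count the inputs each operator actually receives in the task flow.
    actual = {name: 0 for name in expected_inputs}
    for _src, dst in edges:
        actual[dst] += 1

    valid: Dict[str, bool] = {}
    for name, expected in expected_inputs.items():
        # Input count judgment: actual inputs must equal the designed port count.
        inputs_ok = actual[name] == expected
        # Parameter judgment: assumed here that None or "" means "empty".
        params_ok = all(v not in (None, "") for v in params.get(name, {}).values())
        valid[name] = inputs_ok and params_ok
    return valid
```

The resulting flags can serve as the initial validity state of each operator before the ring judgment starts.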
Preferably, step S24 further comprises the following steps, carried out between step S242 and step S246:
S243, judge whether operator $op_{j_3}$ is valid: if $op_{j_3}$ is currently valid, the validity of $op_{j_3}$ is set equal to the validity of $op_{i_3}$; if $op_{j_3}$ is currently invalid, $op_{i_3}$ continues to keep its original invalid state;
S244, repeat steps S242~S243 until all columns whose value in row $i_3$ is not 0 have been processed;
S245, judge whether operator $op_{i_3}$ is valid: if $op_{i_3}$ is valid, add $op_{i_3}$ to the executable task queue intQ and carry out step S246; if $op_{i_3}$ is invalid, it is not added to the executable task queue intQ and remains invalid;
Preferably, while step S243 is carried out, the execution category of operator $op_{i_3}$ is also marked, according to the following principle;
The execution categories are: normal, merge start, merge end and in merge. Among the operators there are two operator categories named merge-processing start-point operator and merge-processing end-point operator. The execution category of a merge-processing start-point operator is merge start, and that of a merge-processing end-point operator is merge end; operators located between a merge-processing start-point operator and a merge-processing end-point operator in the task flow have the execution category in merge; operators located outside the merge-processing start-point and end-point operators in the task flow have the execution category normal;
After step S247 has been carried out, an operator execution category validity judgment is carried out, comprising the following steps:
S251, the head operator q of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252, the following validity judgment is made on operator q according to typeq and dtq:
a, when the execution category of operator q is merge start, judge whether an input of q is the output of an in-merge operator or of another merge-start operator; if so, q is invalid; if not, q keeps its original invalid or valid state;
b, when the execution category of operator q is in merge and q is a multi-input operator, judge whether one input of q is a double-dataset data stream and the other input is a single-dataset data stream; if so, q keeps its original invalid or valid state; if not, q is invalid;
c, when the execution category of operator q is in merge, judge whether the output of q goes to a write-data-source operator; if so, q is invalid; if not, q keeps its original invalid or valid state;
d, an operator q falling under none of cases a, b and c needs no judgment and keeps its original invalid or valid state;
S253, check the invalid or valid state of operator q: if q is valid, it is added to the final execution queue taskQ; if q is invalid, it is rejected;
S254, repeat steps S251~S253 until the executable task queue intQ is empty, which yields the final execution queue taskQ.
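The category validity judgment can be illustrated with the sketch below. Everything in it is an assumption for illustration: the `Op` record, the string labels for categories (such as `"write_source"`), and in particular the rule that an input counts as a double-dataset data stream exactly when its producer has the execution category merge start or in merge. The sketch also does not propagate invalidity to downstream operators.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List

NORMAL, MERGE_START, IN_MERGE, MERGE_END = "normal", "merge_start", "in_merge", "merge_end"
# Execution categories whose output is assumed to be a double-dataset data stream.
DOUBLE_STREAM = {MERGE_START, IN_MERGE}


@dataclass
class Op:
    name: str
    type_q: str                      # operator category (hypothetical labels)
    dt_q: str                        # execution category
    valid: bool = True
    inputs: List[str] = field(default_factory=list)   # names of upstream operators
    outputs: List[str] = field(default_factory=list)  # names of downstream operators


def category_validity(int_q: Deque[Op], ops: Dict[str, Op]) -> List[str]:
    """Drain intQ (S251 to S254) and return the operator names placed in taskQ."""
    task_q: List[str] = []
    while int_q:                                  # S254: until intQ is empty
        q = int_q.popleft()                       # S251: dequeue the head operator
        doubles = sum(ops[n].dt_q in DOUBLE_STREAM for n in q.inputs)
        # Case a: a merge-start operator must not consume a merged stream.
        if q.dt_q == MERGE_START and doubles > 0:
            q.valid = False
        if q.dt_q == IN_MERGE:
            # Case b: a multi-input in-merge operator needs exactly one double stream.
            if len(q.inputs) > 1 and doubles != 1:
                q.valid = False
            # Case c: an in-merge operator must not feed a write-data-source operator.
            if any(ops[n].type_q == "write_source" for n in q.outputs):
                q.valid = False
        if q.valid:                               # S253: keep only valid operators
            task_q.append(q.name)
    return task_q
```

Operators are referenced by name rather than by object so that the records stay acyclic and easy to print or compare.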
The invention also provides a big data support platform, comprising a user terminal and a server that use the above method for forming an acyclic data analysis queue.
Compared with the prior art, the invention has the following beneficial effect:
Steps S23 and S24 solve the prior-art problem that a task flow built by a user may contain a closed ring, causing an infinite loop when the underlying calls are made. Endless loops during computation are avoided and the CPU load is reduced, so that the CPU has more capacity for computing correct task flows and unnecessary computation is avoided.
Brief description of the drawings
Fig. 1 is example 1 of embodiment 1, judged in step S24 to contain no closed ring;
Fig. 2 is example 2 of embodiment 1, judged in step S24 to contain a closed ring;
Fig. 3 is example 3 of embodiment 1, in which an operator is judged invalid under case a of step S252;
Fig. 4 is example 4 of embodiment 1, in which an operator is judged invalid under case b of step S252;
Fig. 5 is example 5 of embodiment 1, in which an operator is judged invalid under case c of step S252.
Detailed description
Embodiment 1:
A method for forming an acyclic data analysis queue, comprising the following steps:
S21, a user builds a task flow at a user terminal and fills in the parameters of each operator used by the task flow;
S22, a server receives the task flow from the user terminal and the parameters of each operator used by the task flow;
S23, the server builds an adjacency matrix M:
S231, the task flow is treated as a directed graph G, each operator of the task flow is a node of G, the number of operators in the task flow is N, and an N*N adjacency matrix is built;
S232, every ordered pair of operators joined by a directed edge is handled as follows: if there is a directed edge from operator $i_1$ to operator $j_1$, the element $M_{i_1 j_1}$ in row $i_1$, column $j_1$ of M is set to 1; every ordered pair of operators not joined by a directed edge is handled as follows: if there is no directed edge from operator $i_2$ to operator $j_2$, the element $M_{i_2 j_2}$ in row $i_2$, column $j_2$ of M is set to 0;
S24, ring judgment:
S241, find a column $i_3$ of M whose values are all 0, and find the operator $op_{i_3}$ corresponding to column $i_3$;
S242, operator $op_{i_3}$ corresponds to row $i_3$; find a column index $j_3$ whose value in row $i_3$ is not 0, and find the operator $op_{j_3}$ corresponding to $j_3$;
S246, delete row $i_3$ and column $i_3$ of M, and repeat steps S241~S245 until M contains no column whose values are all 0;
S247, the operators corresponding to the columns remaining in M are judged invalid.
If the task flow built by the user is as shown in Fig. 1, a 3*3 adjacency matrix M is built. Since there is a directed edge from operator 11 to operator 12, $M_{12}=1$; since there is a directed edge from operator 12 to operator 13, $M_{23}=1$; all other elements are 0. The adjacency matrix of example 1 is therefore $M=\begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix}$. When step S24 is carried out:
The first time step S241 is carried out, the column found is column 1. The operator corresponding to column 1 is the filter operator in the figure, which corresponds to row 1. The column whose value in row 1 is not 0 is column 2, which corresponds to the feature transformation operator. If the filter operator is valid, the feature transformation operator is valid (when the preceding operator is valid, its successor is valid). Row 1 and column 1 are then deleted, giving $M=\begin{pmatrix}0&1\\0&0\end{pmatrix}$. Now column 1 corresponds to the feature transformation operator and column 2 to the clustering operator; likewise, row 1 corresponds to the feature transformation operator and row 2 to the clustering operator;
The second time step S241 is carried out, the column found is column 1. The operator corresponding to column 1 is the feature transformation operator in the figure, which corresponds to row 1. The column whose value in row 1 is not 0 is column 2, which corresponds to the clustering operator. The filter operator is valid, so the feature transformation operator is valid, and when the preceding operator is valid, its successor is valid. Row 1 and column 1 are then deleted, giving $M=(0)$. Now column 1 corresponds to the clustering operator and row 1 corresponds to the clustering operator;
The third time step S241 is carried out, row 1 and column 1 are deleted and nothing of M remains. This shows that the task flow contains no closed ring and complies with the rules, so no endless loop will occur during computation;
If the task flow built by the user is as shown in Fig. 2, a 3*3 adjacency matrix M is built. Since there is a directed edge from operator 11 to operator 12, $M_{12}=1$; since there is a directed edge from operator 12 to operator 13, $M_{23}=1$; since there is a directed edge from operator 13 to operator 11, $M_{31}=1$; all other elements are 0. The adjacency matrix of example 2 is therefore $M=\begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}$. When step S24 is carried out:
When step S241 is carried out, no column whose values are all 0 can be found, so all operators are set to invalid. The filter operator, the feature transformation operator and the clustering operator form a closed ring, which would cause an endless loop.
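The two worked examples can be replayed with a short Python trace. This is an illustrative sketch only: the function name `reduction_trace` is hypothetical, indices are 0-based, and the two matrices are written out from the edges stated above ($M_{12}=M_{23}=1$ for Fig. 1, plus $M_{31}=1$ for Fig. 2).

```python
from typing import List

Matrix = List[List[int]]


def reduction_trace(m: Matrix) -> List[Matrix]:
    """Return the matrix obtained after each row/column deletion of step S246."""
    m = [row[:] for row in m]  # work on a copy
    trace: List[Matrix] = []
    while m:
        n = len(m)
        # S241: index of the first all-zero column, if any.
        col = next((c for c in range(n) if all(m[r][c] == 0 for r in range(n))), None)
        if col is None:
            break  # only operators bound to a closed ring remain
        # S246: physically delete row `col` and column `col`.
        m = [[v for c, v in enumerate(row) if c != col]
             for r, row in enumerate(m) if r != col]
        trace.append([row[:] for row in m])
    return trace


fig1 = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
fig2 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
print(reduction_trace(fig1))  # [[[0, 1], [0, 0]], [[0]], []]
print(reduction_trace(fig2))  # []
```

For Fig. 1 the trace ends with an empty matrix, so no closed ring exists; for Fig. 2 no deletion is ever possible, so all three operators remain and are judged invalid.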
To make the following easier to understand, the algorithm package stored on the server is explained here. The algorithm package stores a large number of operators; by operator category, the operators used for data analysis fall roughly into the following 5 classes:
Class 1: data source reading and writing
The data source read/write class comprises two operators, read data source and write data source. A workspace may contain multiple read-data-source and write-data-source operators.
The read-data-source operator is the starting point of a task flow. It obtains the list of all data files of the current user, from which one file is selected and passed into the flow as the data source. The read-data-source operator has no actual implementation at the bottom layer; its function is to pass parameters: it only needs to pass the path of the selected file to its successor operator to complete the task of reading the data source.
The write-data-source operator is the end point of a task flow. It saves the intermediate result it receives as the final result under the file name set by the user. The write-data-source operator has a concrete implementation at the bottom layer.
Class 2: data preprocessing
The data preprocessing class comprises multiple operators for data preprocessing, including processing of strings, processing of data sets, processing of tables, etc.
String processing includes splitting, concatenating, taking substrings and similar operations. To keep operators fine-grained, the invention provides single string processing operators, each of which can only process one field of a DataFrame. If multiple strings need to be processed, this can be done by chaining several operators. Each string processing operator can generate multiple new fields as the case requires; for example, when taking substrings, the design allows several substrings to be taken from one string column, generating multiple new fields.
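The idea of one string operator acting on a single field while producing several new fields can be sketched as follows. This is a plain-Python stand-in, not Spark code: rows are modelled as dictionaries instead of a DataFrame, and the function name `substring_operator` and its `slices` argument are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def substring_operator(
    rows: List[Row], column: str, slices: Dict[str, Tuple[int, int]]
) -> List[Row]:
    """Add one new field per (start, end) slice of `column`; other fields are untouched."""
    out: List[Row] = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        for new_field, (start, end) in slices.items():
            new_row[new_field] = row[column][start:end]
        out.append(new_row)
    return out
```

For example, slicing a `date` field such as `"2017-09-19"` with `{"year": (0, 4), "month": (5, 7)}` adds the fields `year` and `month` in a single operator step.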
Although a data set, once it enters an operator, is held in memory as a DataFrame, i.e. in the form of a table, this description distinguishes between processing of data sets and processing of tables: processing of a table involves handling specific fields and SQL operations on the table, whereas processing of a data set operates only on the data set as a whole and involves no specific field.
Processing of data sets mainly concerns the size of the data set, including sampling by proportion, splitting a data set by proportion, merging data sets, etc.
Processing of tables includes joining, deduplication, filtering, deleting fields, changing field types, computing mathematical formulas on fields, etc. It benefits from Spark SQL and from the optimization and enhancement of DataFrame since Spark 2.0: because Spark provides an SQL interface and supports a large number of core SQL operations, processing of tables can be based entirely on SQL. SQL is a skill that most data analysts must master, so processing tables with SQL fits analysts' habits well and also accomplishes the data processing task well.
Class 3: feature engineering
Feature engineering operators are mainly based on the feature package of Spark MLlib and cover text processing, feature transformation, feature encoding, etc.
In feature engineering, all processing of features is a transformation of features, but this description distinguishes between the kinds. In the invention, feature transformation means transformation between features of the same type, while feature encoding means encoding features of other types into numeric features.
Text processing involves natural language processing (NLP) and is mainly used to extract the key information of a text, including word segmentation, stop-word removal, NGram, TF-IDF, etc.
Feature transformation mainly includes principal component analysis (PCA), polynomial expansion, various kinds of scaling, binarization, etc. It is mainly used to standardize features and ensure their independence, which can improve the accuracy of the models trained afterwards.
Feature encoding mainly includes one-hot encoding, string encoding, vector encoding, etc. Because most models require features to be expressed as numbers, features of all types need to be represented numerically in the feature engineering stage.
Class 4: models
Model operators mainly include models for classification, regression, clustering, collaborative filtering, etc. Because Spark is a distributed parallel computing framework and most models were not designed for a distributed environment, Spark uses model algorithms that have been modified to suit a distributed environment; compared with the single-machine algorithms, their results differ slightly on specific problems.
Classification models mainly include random forest classification, GBDT classification, logistic regression, etc.; regression models mainly include random forest regression, GBDT regression, linear regression, etc.; clustering models mainly include KMeans clustering, etc.; collaborative filtering mainly includes alternating least squares (ALS), etc.
Models are the core of machine learning and, for a data analysis platform, also the core components. The invention provides a rich set of models that can meet analysts' needs to a certain extent.
Class 5: merge processing
In an analysis task, the same operation may need to be applied to different data sets, for example computing the same mathematical expression on identically named fields of two data sets. For such cases, the parts that require the same operation can be merged and processed together. Because each operator is a single-function Spark task, the amount of actual computation is small and takes little time; by comparison, starting the program and initializing the context takes long and accounts for a large share of the total duration. Merge processing therefore both reduces the scale of the task and cuts the task start-up and context initialization time of what were two operators to about half.
In machine learning, several data sets often need to be processed: typically a training data set and a data set to be classified or clustered, and possibly a test set. Feature engineering must be applied to these data sets, and some feature engineering operations need to be applied to several data sets at once: their transformation must be applied to multiple data sets, and it has a considerable influence on later results. Take the most common case, string encoding on a training set and a test set: strings are ranked by their number of occurrences in the training set and then encoded numerically. For the encoded result to be equally valid for the test set, the test set must be transformed with the mapping obtained in this step. If the training set and the test set are string-encoded separately, the same string is likely to occur a different number of times in the two data sets, so its encodings may differ, which ultimately leads to model errors. To prevent this, the invention abstracts this situation in its design as merge processing: during merging, the feature engineering information obtained from the training set is used to operate on the other data sets, ensuring consistent encoding results.
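The string-encoding pitfall described above can be shown with a small, self-contained sketch. It does not use Spark; `fit_string_encoding` and `apply_string_encoding` are hypothetical helper names, and the alphabetical tie-break and the `-1` code for unseen strings are assumptions added to make the example deterministic.

```python
from collections import Counter
from typing import Dict, List


def fit_string_encoding(train: List[str]) -> Dict[str, int]:
    """Rank strings by descending frequency in the training set (ties: alphabetical)."""
    counts = Counter(train)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: idx for idx, s in enumerate(ordered)}


def apply_string_encoding(
    values: List[str], mapping: Dict[str, int], unseen: int = -1
) -> List[int]:
    """Encode values with an existing mapping; strings not in the mapping get `unseen`."""
    return [mapping.get(v, unseen) for v in values]


train = ["a", "a", "b"]
test = ["b", "b", "a"]

shared = fit_string_encoding(train)       # mapping learned on the training set only
separate = fit_string_encoding(test)      # what happens if the test set is encoded alone

print(apply_string_encoding(test, shared))    # [1, 1, 0]
print(apply_string_encoding(test, separate))  # [0, 0, 1]
```

Encoding the test set on its own flips the codes of `"a"` and `"b"`, which is exactly the inconsistency that reusing the training-set mapping avoids.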
To comply with the input port counts defined for operators of other categories, the two data sets are handled as one data set during merge processing, forming a double-dataset data stream. Operators of other categories can thus be used during merge processing without implementing the same operator for two different input counts; at the bottom layer, each operator determines from its position how to process the data correctly.
The design of the merge class operators optimizes the use of resources without introducing errors, solves the problem of passing feature engineering information in machine learning, improves overall efficiency, and also resolves potential problems.
Because merge processing exists and operators are not developed in two versions, the execution category of each Spark operator must be determined in order to tell the operator how many data sets it has to operate on.
According to the position of an operator in the task flow and the category of the operator itself, operators are divided roughly into four execution categories: normal, merge start, merge end, and in merge. The execution category of an operator provides part of the basis for controlling the operators that follow it.
Operators from the merge-start operator to the merge-end operator are in-merge operators; the others are normal operators.
In the normal case, the input at each input port of an operator is a single-dataset data stream. Its input operator is a normal operator or a merge-end operator.
A merge-start operator has two input ports and one output port; its function is to merge two data sets into one data stream for processing. Its input operator must be a normal operator or a merge-end operator, and each input port receives a single-dataset data stream.
A merge-end operator has one input port and two output ports. Its input operator must be a merge-start operator or one of its successors; the input port receives a double-dataset data stream, and the output is a single-dataset data stream.
The output port of an in-merge operator always outputs two data sets. When it is a single-input operator, its input port receives a double-dataset data stream and its output port also outputs a double-dataset data stream. When it is a dual-input operator, one input port receives a double-dataset data stream, namely the output of an in-merge operator, and the other input port receives a single-dataset data stream, namely the output of a normal operator.
In the above, filtering, reading a data source, clustering and the like are operator categories.
To ensure that every input of a multi-input operator actually receives an input, thereby avoiding computation errors, and to ensure that every parameter of an operator requiring multiple parameters is set, an operator input count judgment and an operator parameter judgment are carried out after step S23 and before step S24;
The operator input count judgment comprises the following steps: first, the number of inputs of each operator is counted from the task flow built by the user; then, the counted number of inputs of each operator is compared with the number of input ports specified in the design of that operator; if they are equal, the operator is currently valid; if they are not equal, the operator is judged invalid;
The operator parameter judgment: the parameters of each operator are checked, and if any parameter value of the operator is empty, the operator is judged invalid.
Step S24 further comprises the following steps, carried out between step S242 and step S246:
S243, judge whether operator $op_{j_3}$ is valid: if $op_{j_3}$ is currently valid, the validity of $op_{j_3}$ is set equal to the validity of $op_{i_3}$; if $op_{j_3}$ is currently invalid, $op_{i_3}$ continues to keep its original invalid state; (This step implements invalidity propagation: in a task flow, when an upper-level operator is invalid, its lower-level operators are invalid too, which avoids computing a lower-level operator when its upper-level operator produces no output.)
S244, repeat steps S242~S243 until all columns whose value in row $i_3$ is not 0 have been processed;
S245, judge whether operator $op_{i_3}$ is valid: if $op_{i_3}$ is valid, add $op_{i_3}$ to the executable task queue intQ and carry out step S246; if $op_{i_3}$ is invalid, it is not added to the executable task queue intQ and remains invalid (building the executable task queue intQ simplifies the subsequent execution category validity judgment: it excludes the operators found invalid by the ring judgment, the operator input count judgment and the operator parameter judgment, as well as lower-level operators that are invalid because their upper-level operator was judged invalid, so that operators that are already invalid are not judged again in the execution category judgment);
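The interplay of steps S241 to S247 with the validity flags can be sketched as below. This is a hedged illustration, not the patented code: the function name `ring_judgment_with_validity` is hypothetical, indices are 0-based, and the sketch assumes that row and column $i_3$ are removed in every round, including when $op_{i_3}$ is invalid, since otherwise the loop could not progress.

```python
from typing import List, Tuple


def ring_judgment_with_validity(
    m: List[List[int]], valid: List[bool]
) -> Tuple[List[int], List[bool]]:
    """Return (intQ, final validity flags) for adjacency matrix m."""
    valid = list(valid)               # initial flags, e.g. from input/parameter checks
    remaining = list(range(len(m)))   # indices whose row/column is still in M
    int_q: List[int] = []
    while True:
        # S241: a column that is all 0 among the remaining rows.
        i3 = next((c for c in remaining if all(m[r][c] == 0 for r in remaining)), None)
        if i3 is None:
            break
        # S242 to S244: visit every nonzero column j3 in row i3.
        for j3 in remaining:
            if m[i3][j3] != 0 and valid[j3]:
                valid[j3] = valid[i3]  # S243: a valid successor inherits op_i3's validity
        # S245: only valid operators enter the executable task queue.
        if valid[i3]:
            int_q.append(i3)
        # S246 (assumed to run in every round): drop row and column i3.
        remaining.remove(i3)
    # S247: whatever is left is bound to a closed ring and judged invalid.
    for i in remaining:
        valid[i] = False
    return int_q, valid
```

With edges 0→1, 1→2 and 0→3 and operator 1 initially invalid, the queue contains only operators 0 and 3, and operator 2 becomes invalid because its upstream operator is invalid.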
To facilitate the subsequent execution-category validity check, the execution category of operator op_i3 is also marked while step S243 is performed. The marking rules are as follows:
Execution categories are divided into: normal, merge-start, merge-end and in-merge. The operator categories include the merge-processing start-point operator and the merge-processing end-point operator. The execution category of a merge-processing start-point operator is merge-start, and the execution category of a merge-processing end-point operator is merge-end. An operator located between a merge-processing start-point operator and a merge-processing end-point operator in the task flow has the execution category in-merge; an operator located outside the span from the merge-processing start-point operator to the merge-processing end-point operator has the execution category normal.
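The marking rules can be sketched in Python as follows. This is a sketch under stated assumptions: the category strings, the function name `mark_execution_categories`, and the rule that an operator is "between" the start and end points when one of its predecessors is merge-start or in-merge are illustrative choices, since the text does not say how the position is computed.

```python
def mark_execution_categories(order, edges, op_type):
    """order: operator names in topological order; op_type: name -> operator category."""
    preds = {name: [] for name in order}
    for src, dst in edges:
        preds[dst].append(src)
    category = {}
    for name in order:
        if op_type[name] == "merge_start":
            category[name] = "merge_start"
        elif op_type[name] == "merge_end":
            category[name] = "merge_end"
        elif any(category.get(p) in ("merge_start", "in_merge") for p in preds[name]):
            category[name] = "in_merge"      # fed from inside the merge span
        else:
            category[name] = "normal"        # outside the merge span
    return category


# Hypothetical linear flow: read -> ms -> f -> me -> write
order = ["read", "ms", "f", "me", "write"]
edges = [("read", "ms"), ("ms", "f"), ("f", "me"), ("me", "write")]
op_type = {"read": "read_data_source", "ms": "merge_start", "f": "filter",
           "me": "merge_end", "write": "write_data_source"}
categories = mark_execution_categories(order, edges, op_type)
```

Because operators are visited in topological order, every predecessor already carries a category when its successor is examined.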
After step S247 has been performed, an operator execution-category validity check is carried out. It comprises the following steps:
S251: the operator q at the head of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252: the following validity checks are applied to operator q according to typeq and dtq:
a. When the execution category of operator q is merge-start, determine whether an input of operator q is the output of an in-merge operator or of another merge-start operator. If so, operator q is invalid; if not, operator q keeps its previous valid or invalid state. (This step prevents two merge-processing operators from being stacked, which would make the output of an in-merge operator hard to split into three or four output results and would lead to confused results. This invalid case is shown in Fig. 3.)
b. When the execution category of operator q is in-merge and operator q is a multi-input operator, determine whether one input of operator q is a double-dataset data stream and the other input is a single-dataset data stream. If so, operator q keeps its previous valid or invalid state; if not, operator q is invalid. (This step prevents two in-merge output results from being delivered simultaneously to another multi-input operator, in which case the output of that multi-input operator could not be separated at the end of merge processing and the results would be confused. This invalid case is shown in Fig. 4.)
c. When the execution category of operator q is in-merge, determine whether the output of operator q goes to a write-data-source operator. If so, operator q is invalid; if not, operator q keeps its previous valid or invalid state. (Results obtained during merging are not data that need to be written to the server, and the write mode would not be legitimate either. It is therefore necessary to prevent the result of an in-merge operator from being stored on the server by a write-data-source operator, which would cause confusion in the stored data. This invalid case is shown in Fig. 5.)
d. An operator q that falls outside cases a, b and c needs no check and keeps its previous valid or invalid state;
S253: check the valid or invalid state of operator q. If operator q is valid, it is added to the final execution queue taskQ; if operator q is invalid, it is rejected. (This step excludes the three cases a, b and c, so that no confusion arises.)
S254: repeat steps S251~S253 until the executable task queue intQ is empty, which yields the final execution queue taskQ.
The final execution queue taskQ contains the valid operators in execution order, with the invalid operators removed. The server can therefore execute the valid operators and present the results to the client, while also flagging the invalid operators.
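Steps S251~S254 can be sketched in Python as a single pass over intQ. The dictionary layout (`type`, `category`, `valid`, `inputs`, `outputs`, `streams`), the stream labels `"double"` and `"single"`, the helper `op`, and the sample operators are hypothetical names chosen for illustration; the text specifies only the rules a to d.

```python
from collections import deque


def op(type_, category, inputs=(), outputs=(), streams=()):
    """Build a hypothetical operator record."""
    return {"type": type_, "category": category, "valid": True,
            "inputs": list(inputs), "outputs": list(outputs), "streams": list(streams)}


def filter_by_execution_category(int_q, ops):
    """Apply rules a-d to each queued operator; return (taskQ, rejected)."""
    int_q = deque(int_q)
    task_q, rejected = [], []
    while int_q:                                         # S254: until intQ is empty
        name = int_q.popleft()                           # S251: dequeue the head operator
        q = ops[name]
        if q["category"] == "merge_start":               # case a: stacked merges
            if any(ops[p]["category"] in ("in_merge", "merge_start") for p in q["inputs"]):
                q["valid"] = False
        if q["category"] == "in_merge":
            # case b: a multi-input operator needs one double and one single stream
            if len(q["inputs"]) > 1 and sorted(q["streams"]) != ["double", "single"]:
                q["valid"] = False
            # case c: in-merge output must not feed a write-data-source operator
            if any(ops[s]["type"] == "write_data_source" for s in q["outputs"]):
                q["valid"] = False
        # case d: every other operator keeps its state
        (task_q if q["valid"] else rejected).append(name)   # S253
    return task_q, rejected


ops = {
    "read": op("read_data_source", "normal", outputs=["ms"]),
    "ms": op("merge_start", "merge_start", inputs=["read"], outputs=["f"]),
    "f": op("filter", "in_merge", inputs=["ms"], outputs=["me", "w"], streams=["double"]),
    "me": op("merge_end", "merge_end", inputs=["f"]),
    "w": op("write_data_source", "normal", inputs=["f"]),
}
task_q, rejected = filter_by_execution_category(["read", "ms", "f", "me"], ops)
```

In this example `f` is rejected under case c because its in-merge output feeds the write-data-source operator `w`, and the remaining operators form taskQ in their original order.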
Embodiment 2:
This embodiment provides a big data support platform, comprising: a user terminal and a server that use the method for forming an acyclic data analysis queue described in Embodiment 1.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its purpose and scope, and all such modifications shall be covered by the claims of the present invention.

Claims (5)

1. A method for forming an acyclic data analysis queue, characterized by comprising the following steps:
S21: a user builds a task flow through a user terminal and fills in the parameters of each operator used in the task flow;
S22: a server receives, from the user terminal, the task flow and the parameters of each operator used in the task flow;
S23: the server builds an adjacency matrix M:
S231: the task flow is treated as a directed graph G, in which each operator of the task flow is a node of G; with N denoting the number of operators in the task flow, an N*N adjacency matrix is established;
S232: each pair of operators joined by a directed edge is handled as follows: if there is a directed edge from an operator i1 to another operator j1, the value in row i1, column j1 of the adjacency matrix M is set to 1; each pair of operators not joined by a directed edge is handled as follows: if there is no directed edge from an operator i2 to another operator j2, the value in row i2, column j2 of the adjacency matrix M is set to 0;
S24: cycle check:
S241: find a column i3 of M whose values are all 0, and find the operator op_i3 corresponding to column i3;
S242: operator op_i3 corresponds to row i3; find a column j3 whose value on row i3 is not 0, and find the operator op_j3 corresponding to j3;
S246: delete row i3 and column i3 of M, and repeat steps S241~S245 until M contains no column whose values are all 0;
S247: the operators corresponding to the rows remaining in M are judged invalid.
2. The method for forming an acyclic data analysis queue according to claim 1, characterized in that
an operator input-count check and an operator parameter check are performed after step S23 and before step S24;
the operator input-count check comprises the following steps: first, the number of inputs of each operator is counted from the task flow built by the user; then, the counted number of inputs of each operator is compared with the number of input ports specified in the design of that operator; if the two are equal, the operator is currently valid; if they are not equal, the operator is judged invalid;
the operator parameter check: the parameters of each operator are examined, and if any parameter value of the operator is empty, the operator is judged invalid.
3. The method for forming an acyclic data analysis queue according to claim 2, characterized in that step S24 further comprises the following steps, performed between step S242 and step S246:
S243: judge whether operator op_j3 is valid; if op_j3 is currently valid, the validity of op_j3 is set equal to the validity of op_i3; if op_j3 is currently invalid, op_j3 keeps its invalid state;
S244: repeat steps S242~S243 until all non-zero columns on row i3 have been processed;
S245: judge whether operator op_i3 is valid; if op_i3 is valid, add op_i3 to the executable task queue intQ and proceed to step S246; if op_i3 is invalid, it is not added to the executable task queue intQ and remains invalid.
4. The method for forming an acyclic data analysis queue according to claim 3, characterized in that the execution category of operator op_i3 is also marked while step S243 is performed, the marking rules being as follows:
execution categories are divided into: normal, merge-start, merge-end and in-merge; the operator categories include the merge-processing start-point operator and the merge-processing end-point operator; the execution category of a merge-processing start-point operator is merge-start, and the execution category of a merge-processing end-point operator is merge-end; an operator located between a merge-processing start-point operator and a merge-processing end-point operator in the task flow has the execution category in-merge; an operator located outside the span from the merge-processing start-point operator to the merge-processing end-point operator has the execution category normal;
after step S247 has been performed, an operator execution-category validity check is carried out, comprising the following steps:
S251: the operator q at the head of the executable task queue intQ is dequeued, and its operator category typeq and execution category dtq are obtained;
S252: the following validity checks are applied to operator q according to typeq and dtq:
a. when the execution category of operator q is merge-start, determine whether an input of operator q is the output of an in-merge operator or of another merge-start operator; if so, operator q is invalid; if not, operator q keeps its previous valid or invalid state;
b. when the execution category of operator q is in-merge and operator q is a multi-input operator, determine whether one input of operator q is a double-dataset data stream and the other input is a single-dataset data stream; if so, operator q keeps its previous valid or invalid state; if not, operator q is invalid;
c. when the execution category of operator q is in-merge, determine whether the output of operator q goes to a write-data-source operator; if so, operator q is invalid; if not, operator q keeps its previous valid or invalid state;
d. an operator q that falls outside cases a, b and c needs no check and keeps its previous valid or invalid state;
S253: check the valid or invalid state of operator q; if operator q is valid, it is added to the final execution queue taskQ; if operator q is invalid, it is rejected;
S254: repeat steps S251~S253 until the executable task queue intQ is empty, which yields the final execution queue taskQ.
5. A big data support platform, characterized by comprising: a user terminal and a server that use the method for forming an acyclic data analysis queue according to any one of claims 1 to 4.
CN201710847663.0A 2017-09-19 2017-09-19 Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue Active CN107688663B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710847663.0A CN107688663B (en) 2017-09-19 2017-09-19 Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710847663.0A CN107688663B (en) 2017-09-19 2017-09-19 Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue

Publications (2)

Publication Number Publication Date
CN107688663A true CN107688663A (en) 2018-02-13
CN107688663B CN107688663B (en) 2020-06-05

Family

ID=61156311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710847663.0A Active CN107688663B (en) 2017-09-19 2017-09-19 Method for forming loop-free data analysis queue and big data support platform comprising loop-free data analysis queue

Country Status (1)

Country Link
CN (1) CN107688663B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115167352A (en) * 2022-07-05 2022-10-11 南方电网科学研究院有限责任公司 Algebraic ring identification method and device for power simulation secondary control system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162453A1 (en) * 2006-12-29 2008-07-03 Microsoft Corporation Supervised ranking of vertices of a directed graph
CN102270204A (en) * 2010-06-02 2011-12-07 上海佳艾商务信息咨询有限公司 Method for calculating influence of online bulletin board system users based on matrix decomposition
CN102420701A (en) * 2011-11-28 2012-04-18 北京邮电大学 Method for extracting internet service flow characteristics
CN106682343A (en) * 2016-08-31 2017-05-17 电子科技大学 Method for formally verifying adjacent matrixes on basis of diagrams

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Qiao'an et al.: "Spark task parameter optimization based on runtime data analysis", Computer Engineering & Science (计算机工程与科学) *

Also Published As

Publication number Publication date
CN107688663B (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN108446540B (en) Program code plagiarism type detection method and system based on source code multi-label graph neural network
US10198422B2 (en) Information-processing equipment based on a spreadsheet
CN107590254A (en) Big data support platform with merging treatment method
CN103226743B (en) Aircraft equipment technology maturity based on TRL assesses information processing method
WO2021128679A1 (en) Data decision-making-based test data generation method and apparatus, and computer device
CN104112026B (en) A kind of short message text sorting technique and system
CN108037973A (en) A kind of data flow modeling interacted with data processing tools and processing system
CN105469204A (en) Reassembling manufacturing enterprise integrated evaluation system based on deeply integrated big data analysis technology
Lagerström et al. Visualizing and measuring enterprise architecture: an exploratory biopharma case
CN109522742A (en) A kind of batch processing method of computer big data
CN110011990A (en) Intranet security threatens intelligent analysis method
CN114900346B (en) Network security testing method and system based on knowledge graph
CN116771576A (en) Comprehensive fault diagnosis method for hydroelectric generating set
CN105991517A (en) Vulnerability discovery method and device
CN107766943A (en) A kind of Knowledge Component automation exchange method under CPS environment
CN107688663A (en) The forming method of acyclic data analysis queue and the big data support platform for including it
CN113626285A (en) Model-based job monitoring method and device, computer equipment and storage medium
Zhong et al. An empirical study of software metrics diversity for cross-project defect prediction
Li et al. Research and application of computer aided design system for product innovation
Carbery et al. A new data analytics framework emphasising pre-processing in learning AI models for complex manufacturing systems
US20190387056A1 (en) Irc-infoid data standardization for use in a plurality of mobile applications
CN115713404A (en) Credit evaluation method for construction industry enterprises
Gokyer et al. Non-functional requirements to architectural concerns: ML and NLP at crossroads
CN113468160A (en) Data management method and device and electronic equipment
Wang et al. Interactive inconsistency fixing in feature modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant