CN108182147A

CN108182147A - Method for predicting data flow errors in BPEL processes based on complexity metrics

Info

Publication number: CN108182147A
Application number: CN201711452935.3A
Authority: CN
Inventors: 宋巍; 张成震; 常震
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2017-12-28
Filing date: 2017-12-28
Publication date: 2018-06-19

Abstract

The invention discloses a method for predicting data flow errors in BPEL processes based on complexity metrics. First, the BPEL process files in a data set are parsed; for each BPEL file, the values of the various complexity metrics are calculated and its data flow errors are detected. Then, some complexity metrics are removed so as to select candidate features. Finally, combinations of the candidate features are enumerated and used as input features, the common classification algorithms included in the WEKA data mining software are used to classify and predict data flow errors, and the finally selected features are determined from the resulting classification prediction accuracy. The invention can effectively classify and predict whether a BPEL process contains data flow errors, gives practitioners strong guidance in modeling and design, and deepens the understanding of data flow errors in service composition.

Description

Method for predicting data flow errors in BPEL processes based on complexity metrics

Technical field

The invention belongs to the field of service computing, and in particular relates to a method for predicting data flow errors in BPEL processes based on complexity metrics.

Background art

With the continuous innovation, improvement and popularization of computer and information technology, software has penetrated into people's lives and become an indispensable part of daily life. Because of the Internet, the network environment in which computer software runs has become more complex and changeable: it has moved from closed to open, from static to dynamic, and from controllable to harder to control. Against this background, a number of advanced technologies and ideas have emerged, including Software as a Service (SaaS) and Service-Oriented Architecture (SOA). A common characteristic of SaaS and SOA is that both revolve around services and take services as their foundation and core. To keep pace with the rapid development of cutting-edge technologies such as service computing and cloud computing, different types of resources are packaged as callable Web services. Web service technology (Web services) emerged before SOA and provided an important reference for the development of SOA.

A single Web service is simple in structure and very limited in function, and cannot meet the growing complexity of business requirements; Web service composition technology emerged in response. Web service composition recombines existing atomic Web services according to business requirements to obtain a new service, thereby reusing services and enhancing their potential practical value. At present, mainstream service composition in China and abroad is implemented with the Web Services Business Process Execution Language (WS-BPEL, abbreviated BPEL). As the number of Web services grows ever larger, business processes based on Web services have become an important method and approach for developing large-scale applications. In line with this trend, BPEL has become a large-scale programming language for building Web-service-based business processes, and has developed into the de facto standard, unanimously recognized by industry, for describing them. Although BPEL has been in development for more than ten years, the quality of some of the BPEL processes written by BPEL practitioners still falls short of expectations, which often leads to improper constructs and erroneous practices and can cause huge economic losses. Recurring error patterns that result from improper design in BPEL processes are called anti-patterns.

Although anti-patterns in business processes can take many forms, the two most basic aspects, and those of greatest current concern, are control flow anti-patterns and data flow anti-patterns. Because BPEL is a business process language written in a general XML format and BPEL processes are well block-structured, undesirable control-flow-related errors are less likely to occur in BPEL processes. In contrast, data-flow-related errors occur more easily, for two main reasons. First, BPEL processes support concurrent execution, and activities in different concurrent branches can express synchronization dependencies through conditional (data-related) links. Second, some data variables in a BPEL process usually come from dynamic and changeable external Web services. The three common data flow anti-patterns in BPEL processes are input missing, output redundancy and output lost.

Existing methods mostly use techniques such as model checking or rule matching. When detecting data flow errors, such methods often face the state explosion problem. They also lack empirical study of the characteristics of data flow errors, so they cannot show which characteristics the occurrence of an error is closely related to.

Summary of the invention

The object of the invention is to provide a method for predicting data flow errors in BPEL processes based on complexity metrics.

The technical solution for achieving this object is as follows: a method for predicting data flow errors in BPEL processes based on complexity metrics, used to classify and predict whether a BPEL process contains data flow errors. The input is the complexity metric data and data flow error data obtained by parsing a BPEL process data set; the output is the classification prediction accuracy and the finally selected complexity metrics. The features are complexity metrics, and the class variable is whether a data flow error exists. The specific steps of the prediction method are:

Step 1: parse the BPEL processes in the data set, calculate the value of each complexity metric applicable to BPEL processes, and detect the data flow errors contained in each process;

Step 2: analyze the relationship between the complexity metrics and the presence of data flow errors, and filter out candidate features;

Step 3: enumerate combinations of the candidate features and use each combination as input features, with the presence of a data flow error as the class variable; classify and predict data flow errors with the classification algorithms in the WEKA data mining software, and determine the finally selected features from the resulting classification prediction accuracy.

Compared with the prior art, the invention has the following significant advantages: (1) the invention is a classification prediction method for data flow errors in BPEL service compositions that uses complexity metrics as classification features; the method can not only classify and predict whether a BPEL process has data flow errors, but also analyze which complexity metrics the data flow errors are related to, that is, which structural or data flow characteristics are associated with their occurrence; (2) compared with conventional methods, the proposed method takes more account of the characteristics of service composition, does not suffer from problems such as path explosion, and can achieve good accuracy, so that it can effectively predict whether data flow errors exist in a BPEL process.

Description of the drawings

Fig. 1 is a flow chart of the method of the invention for predicting data flow errors in BPEL processes based on complexity metrics.

Fig. 2 is a flow chart of candidate feature screening in the invention.

Detailed description of the embodiments

With reference to Fig. 1, a method for predicting data flow errors in BPEL processes based on complexity metrics is used to classify and predict whether a BPEL process contains data flow errors. The input is the complexity metric data and data flow error data obtained by parsing a BPEL process data set; the output is the classification prediction accuracy and the finally selected complexity metrics. The features are complexity metrics, and the class variable is whether a data flow error exists. The specific steps are:

Step 1: parse the BPEL processes in the data set, calculate the value of each common complexity metric applicable to BPEL processes, and detect the data flow errors contained in each process. Specifically:

Step 1.1: because BPEL is an XML-based language with an inherent block structure, remove the metrics that do not apply from the collected complexity metrics;

Step 1.2: calculate the values of the remaining complexity metrics, and detect data flow errors with an existing data flow error detection method.

Step 2: analyze the relationship between the complexity metrics and the presence of data flow errors, and filter out candidate features. Specifically, this includes the following steps:

Step 2.1: perform correlation analysis between every complexity metric and the data flow error variable, using the Spearman correlation method as computed with the statistical analysis tool SPSS;

Step 2.2: remove from the complexity metrics those whose correlation with data flow errors is of medium strength or weaker;

Step 2.3: group the remaining complexity metrics by category and type: first divide them into the major classes of control flow and data flow, then group them by the type of complexity metric;

Step 2.4: following the principle of maximum correlation with data flow errors within each group, select from each group the one metric with the highest correlation and add it to the candidate metric set, finally obtaining the set of candidate metrics.

Step 3: enumerate combinations of the candidate features and use each combination as input features, with the presence of a data flow error as the class variable (a binary variable); classify and predict data flow errors with the classification algorithms in the WEKA data mining software, and determine the finally selected features from the resulting classification prediction accuracy.

Further, removing metrics of medium or weaker correlation strength means removing the complexity metrics whose correlation strength is below 0.5.

The invention classifies and predicts the three common data flow errors of BPEL processes and reveals which complexity metrics have a decisive influence on data flow errors.

The following describes the present invention in detail with reference to examples.

Embodiment

Among data flow errors in BPEL processes, the most common are input missing, output redundancy and output lost. Before introducing these three errors, the concept of a trace is explained. A trace is an execution path from the start node to the end node, i.e., a sequence of activity executions. In a trace, an activity has an input missing error if and only if one of its input variables is undefined; in other words, no activity executed before it has that variable as an output variable. In a trace, an activity has an output redundancy error if and only if one of its output variables is never used; in other words, no activity executed after it takes that variable as an input variable. In a trace, an activity has an output lost error if and only if one of its output variables is redefined by another activity before it has been used; in other words, no other activity between the two takes that variable as an input variable.
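The three definitions above can be checked mechanically on a single trace. The following is a minimal sketch, not the detection method used in the embodiment: the `Activity` record and the `detect_errors` function are hypothetical names introduced here, and a trace is assumed to be given as an ordered list of activities with known input and output variables.

```python
from typing import List, NamedTuple, Set, Tuple


class Activity(NamedTuple):
    """One executed activity in a trace (hypothetical representation)."""
    name: str
    inputs: Set[str]   # variables the activity reads
    outputs: Set[str]  # variables the activity writes


def detect_errors(trace: List[Activity]) -> List[Tuple[str, str, str]]:
    """Return (error_kind, activity_name, variable) triples for one trace."""
    errors = []
    for i, act in enumerate(trace):
        # Input missing: no earlier activity outputs the variable.
        defined: Set[str] = set()
        for earlier in trace[:i]:
            defined |= earlier.outputs
        for var in sorted(act.inputs - defined):
            errors.append(("input_missing", act.name, var))

        later = trace[i + 1:]
        for var in sorted(act.outputs):
            # Output redundancy: no later activity reads the variable.
            if not any(var in a.inputs for a in later):
                errors.append(("output_redundancy", act.name, var))
            # Output lost: another activity rewrites the variable
            # before any activity has read it.
            for a in later:
                if var in a.inputs:
                    break
                if var in a.outputs:
                    errors.append(("output_lost", act.name, var))
                    break
    return errors


# Illustrative trace with made-up activity and variable names.
trace = [
    Activity("receive", set(), {"order"}),
    Activity("check", {"order", "credit"}, {"status"}),
    Activity("recheck", {"order"}, {"status"}),
    Activity("reply", {"status"}, set()),
]
for error in detect_errors(trace):
    print(error)
```

Read literally, the definitions overlap: a variable that is overwritten and never read afterwards is reported both as output redundancy and as output lost. Whether the embodiment reports both in that case is not stated in the text.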

The following is a specific implementation of the proposed classification prediction of data flow errors in service composition. The invention is further described below with reference to the drawings.

The overall flow of the method for predicting data flow errors in BPEL processes based on complexity metrics is shown in Fig. 1. First, the values of the complexity metrics applicable to BPEL processes are calculated and data flow errors are detected. Then, the relationship between the complexity metrics and data flow errors is analyzed and candidate features are filtered out. Finally, combinations of the candidate features are enumerated and used for classification prediction, and the finally selected complexity metrics are determined from the accuracy results. Specifically:

Step one: calculate the values of the complexity metrics applicable to BPEL processes and detect data flow errors. Because BPEL is an XML-based language and BPEL processes are block-structured, some complexity metrics do not apply and must be removed. BPEL files are parsed with the DOM4J JAR package, and the value of each complexity metric of each BPEL process is calculated according to the collected definitions of the complexity metrics. Likewise, according to the definitions of the three common data flow errors, an existing detection method is used to determine whether each BPEL process has data flow errors; this embodiment uses the enumeration method.
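To illustrate the parsing step, the sketch below counts a few element types in a BPEL document. It is an assumption-laden stand-in: the embodiment parses with DOM4J in Java, whereas this sketch uses Python's standard `xml.etree.ElementTree`; the text does not list the metrics it computes, so the four counts returned by the hypothetical `size_metrics` function are illustrative size-style metrics only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Subsets of WS-BPEL 2.0 activity element names, chosen for illustration.
BASIC = {"receive", "reply", "invoke", "assign", "empty", "wait", "throw", "exit"}
STRUCTURED = {"sequence", "flow", "if", "while", "repeatUntil", "forEach", "pick", "scope"}


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def size_metrics(bpel_xml: str) -> dict:
    """Count elements by local name and derive a few size-style metrics."""
    root = ET.fromstring(bpel_xml)
    counts = Counter(local_name(el.tag) for el in root.iter())
    return {
        "basic_activities": sum(counts[n] for n in BASIC),
        "structured_activities": sum(counts[n] for n in STRUCTURED),
        "variables": counts["variable"],
        "links": counts["link"],
    }


# A made-up process, not taken from the data set of the embodiment.
SAMPLE = """<process xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable" name="demo">
  <variables><variable name="order"/><variable name="status"/></variables>
  <sequence>
    <receive variable="order"/>
    <flow>
      <links><link name="l1"/></links>
      <invoke inputVariable="order" outputVariable="status"/>
      <assign/>
    </flow>
    <reply variable="status"/>
  </sequence>
</process>"""

print(size_metrics(SAMPLE))
```

Matching on local names keeps the sketch independent of which BPEL namespace version a file declares.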

Step two: analyze the relationship between the complexity metrics and data flow errors and filter out candidate features. The flow is shown in Fig. 2:

(1) Correlation analysis

First, correlation analysis is carried out between the complexity metric values and the data flow error variable. The Spearman correlation method is used, computed with the SPSS statistical analysis software. The data flow error variable is binary: 0 indicates that no data flow error exists and 1 indicates that one exists.
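For readers without SPSS, the Spearman coefficient between one metric and the binary error label can be sketched in plain Python as the Pearson correlation of the ranks, with tied values given their average rank. The function names and the sample numbers are illustrative assumptions and are not taken from the embodiment.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


# Made-up metric values for six processes and their 0/1 error labels.
metric_values = [3, 5, 8, 13, 21, 34]
has_error = [0, 0, 0, 1, 1, 1]
print(round(spearman(metric_values, has_error), 4))
```

Because the error label takes only two values, ties are unavoidable, which is why the sketch uses average ranks instead of the shortcut formula that assumes distinct ranks.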

(2) Removing complexity metrics with weak correlation

Step (1) gives the correlation strength between each complexity metric and the presence of data flow errors. An absolute correlation strength above 0.5 is a strong correlation, 0.3 to 0.5 is a medium correlation, and 0.1 to 0.3 is a weak correlation. Complexity metrics with medium or weaker correlation are removed here, as are those that show neither a positive nor a negative correlation.

(3) Grouping

The remaining complexity metrics are grouped. They are first divided into major classes according to whether they are control-flow-related or data-flow-related; the metrics within each major class are then grouped by type, where the types include Size, Density and Partitionability.

(4) Selection

One complexity metric is selected from each group as a candidate feature. Based on the correlation strength between the complexity metrics in each group and the presence of data flow errors, the metric with the highest correlation strength is selected from each group, yielding the set of complexity metrics that serve as candidate features.
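The filtering, grouping and selection steps can be sketched together as follows. This is a minimal illustration under stated assumptions: the `Metric` record, the metric names and the correlation values are all made up, and the handling of a coefficient of exactly 0.5 (dropped here) is a choice of this sketch, since the text only says that strong means above 0.5.

```python
from typing import Dict, List, NamedTuple, Tuple


class Metric(NamedTuple):
    name: str    # hypothetical metric identifier
    flow: str    # major class: "control" or "data"
    kind: str    # type, e.g. "size", "density", "partitionability"
    rho: float   # Spearman correlation with the 0/1 error label


def select_candidates(metrics: List[Metric], threshold: float = 0.5) -> List[str]:
    """Keep strongly correlated metrics, then pick the best one per group."""
    best: Dict[Tuple[str, str], Metric] = {}
    for m in metrics:
        # Drop metrics of medium or weaker correlation strength.
        if abs(m.rho) <= threshold:
            continue
        # Group by major class first, then by metric type.
        key = (m.flow, m.kind)
        if key not in best or abs(m.rho) > abs(best[key].rho):
            best[key] = m
    return sorted(m.name for m in best.values())


# Illustrative values only.
metrics = [
    Metric("cf_size_a", "control", "size", 0.62),
    Metric("cf_size_b", "control", "size", 0.71),
    Metric("cf_density_a", "control", "density", 0.41),
    Metric("df_size_a", "data", "size", -0.66),
    Metric("df_part_a", "data", "partitionability", 0.55),
]
print(select_candidates(metrics))
```

Comparing absolute values lets a strongly negative correlation compete with positive ones within the same group, which matches the use of the absolute correlation strength in the thresholds above.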

Step three: enumerate combinations of the candidate features as feature sets, and perform classification prediction with common data mining classification algorithms (naive Bayes, support vector machine, k-nearest neighbors and decision tree) in the WEKA data mining software. The finally selected features (complexity metrics) are determined from the accuracy results: following the principle of high accuracy with few features, the final feature combination is chosen.
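The enumeration and the "high accuracy, few features" rule can be sketched as an exhaustive search over feature subsets. This sketch does not reproduce WEKA: a leave-one-out 1-nearest-neighbour classifier written in plain Python stands in for the four algorithms, the evaluation protocol is an assumption (the text does not state one), and all names and data are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def loo_1nn_accuracy(rows: List[Row], labels: List[int], feats: Sequence[str]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    hits = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((rows[j][f] - row[f]) ** 2 for f in feats),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def best_feature_set(
    rows: List[Row],
    labels: List[int],
    candidates: Sequence[str],
    evaluate: Callable[[List[Row], List[int], Sequence[str]], float] = loo_1nn_accuracy,
) -> Tuple[Tuple[str, ...], float]:
    """Try every non-empty subset; prefer higher accuracy, then fewer features."""
    best: Tuple[str, ...] = ()
    best_acc = -1.0
    # Smaller subsets are visited first, so the strict '>' keeps the
    # smallest subset among those with equal accuracy.
    for size in range(1, len(candidates) + 1):
        for feats in combinations(candidates, size):
            acc = evaluate(rows, labels, feats)
            if acc > best_acc:
                best, best_acc = feats, acc
    return best, best_acc


# Made-up data: feature "a" separates the classes, feature "b" is noise.
rows = [
    {"a": 1, "b": 5}, {"a": 2, "b": 1}, {"a": 3, "b": 9},
    {"a": 10, "b": 2}, {"a": 11, "b": 8}, {"a": 12, "b": 3},
]
labels = [0, 0, 0, 1, 1, 1]
print(best_feature_set(rows, labels, ["a", "b"]))
```

With n candidate features the search evaluates 2^n - 1 subsets, which stays manageable because the earlier screening leaves only one metric per group. Passing a different `evaluate` callable would let another classifier or validation scheme be swapped in.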

Claims

1. A method for predicting data flow errors in BPEL processes based on complexity metrics, used to classify and predict whether a BPEL process contains data flow errors, characterized in that the input is the complexity metric data and data flow error data obtained by parsing a BPEL process data set, the output is the classification prediction accuracy and the finally selected complexity metrics, the features are complexity metrics, and the class variable is whether a data flow error exists, the specific steps of the prediction method being:

Step 1: parse the BPEL processes in the data set, calculate the value of each complexity metric applicable to BPEL processes, and detect the data flow errors contained in each process;

Step 2: analyze the relationship between the complexity metrics and the presence of data flow errors, and filter out candidate features;

Step 3: enumerate combinations of the candidate features and use each combination as input features, with the presence of a data flow error as the class variable; classify and predict data flow errors with the classification algorithms in the WEKA data mining software, and determine the finally selected features from the resulting classification prediction accuracy.

2. The method for predicting data flow errors in BPEL processes based on complexity metrics according to claim 1, characterized in that step 1 specifically includes the following steps:

Step 1.1: because BPEL is an XML-based language with a block structure, remove the metrics that do not apply from the collected complexity metrics;

Step 1.2: calculate the values of the remaining complexity metrics, and detect data flow errors with an existing data flow error detection method.

3. The method for predicting data flow errors in BPEL processes based on complexity metrics according to claim 1, characterized in that step 2 specifically includes the following steps:

Step 2.1: perform correlation analysis between every complexity metric and the data flow error variable, using the Spearman correlation method as computed with the statistical analysis tool SPSS;

Step 2.2: remove from the complexity metrics those whose correlation with data flow errors is of medium strength or weaker;

Step 2.3: group the remaining complexity metrics by category and type, i.e., first divide them into the major classes of control flow and data flow, then group them by the type of complexity metric;

Step 2.4: following the principle of maximum correlation with data flow errors within each group, select from each group the one metric with the highest correlation and add it to the candidate metric set, finally obtaining the set of candidate metrics.

4. The method for predicting data flow errors in BPEL processes based on complexity metrics according to claim 3, characterized in that, in step 2.2, complexity metrics of medium or weaker correlation strength are those whose correlation strength is below 0.5.