EP1977319A2

EP1977319A2 - Method for the detection of a fault during operation of a system comprising a certain number of data records

Info

Publication number: EP1977319A2
Application number: EP07703593A
Authority: EP
Inventors: Einar Broese
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2006-01-25
Filing date: 2007-01-03
Publication date: 2008-10-08
Also published as: WO2007085504A2; RU2008134475A; WO2007085504A3; DE102006003611A1; CN101375254A

Abstract

The invention relates to a method for detecting a fault when operating a system comprising a certain number (M) of data records. Each data record (16) encompasses a number (N) of input data (18a,b) and at least one piece of target data (18c) that correlates with said input data. Each piece of input data (18a,b) takes exclusively values lying within a value range (Wn), and the input data (18a,b) thus span a certain input data range (20). The inventive method comprises the following steps: - at least one of the value ranges (Wn) is subdivided into intervals (Ik ,n) such that N-dimensional cells (Z) comprising the intervals (Ik,n) as edge lengths are formed within the input data range (20); - one cell (Z) is selected as the cell (Z) that is to be tested; - a test criterion (42, 44, 48) for the target data (18c) of the data records (16) of the selected cell (Z) is determined from the values (F) of the target data (18c) of the data records (16); - each data record (16) of the selected cell (Z) whose value (F) of the piece of target data (18c) fails to meet the test criterion (42, 44, 48) is marked as faulty.

Description

Method for finding an error during operation of a system having a set of data records

The invention relates to a method for finding an error during operation of a system having a set of data records.

In many areas of modern technology, data is processed on a large scale. As a rule, the data is organized in the form of data records. Each data record comprises a certain number of individual data items. Each individual data item can in turn take different values.

Data records are generally structured so that their individual data items are connected or correlated with one another in some way. For example, in a motor vehicle, a data record may include the output or measured values of all sensors in the vehicle that are detected at a specific point in time or within a certain time interval. In a data processing system, e.g. for an insurance company, a data record may include all known personal data or insurance data of a particular insured person.

In modern automation systems, measurement data or control data is nowadays recorded or generated on a large scale and used for the control of the system. Here too, many data or data records are generated. Without limiting the general validity, the invention will be explained in the further course of the text using the example of automation technology.

Errors can occur both when capturing or generating the data and when forwarding it. For example, a sensor generating measurement data may fail or deliver faulty data. Data records or parts of them can be interchanged during automatic processing as a result of program errors. When parameters are entered manually, incorrect entries or typing errors can occur. All this can lead to various problems, from quality defects to production downtime.

Although much effort is spent on checking the processed data during conceptual design, planning or commissioning of a given system, incorrect data is often discovered too late. By then damage has often already occurred, or it takes a very long time until the source of the erroneous data is localized.

Effective search techniques for identifying faulty data records in a set of data records are lacking today. The ever-growing set of data records can no longer be controlled or verified by hand. Automatable methods for finding faulty data records are therefore necessary.

So far, automatic checking of data or data records has usually been carried out only as a limit value check. Validity ranges, e.g. an interval between a minimum and a maximum value or a set of permitted values, are defined for each individual data item in a data record. All values that fall outside the permitted validity ranges are marked as invalid. For example, it is known that in a rolling mill only raw material with thicknesses of 2 cm to 20 cm is rolled. A data record containing a material thickness of 0.2 cm or 200 cm is therefore most likely faulty.

Thus, only coarse errors can be filtered out by the known method, since the limits for permitted values must be set so wide that all theoretically possible values are recognized as not faulty. Minor errors, such as those caused by mixing up materials, are not found. In the above example, if a worker feeds material of 20 cm thickness into the rolling mill but input data with a thickness of 2 cm is generated, this error cannot be detected by the above method.
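The limit value check described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `within_limits` and the field name `thickness_cm` are assumptions introduced here, and only the range of 2 cm to 20 cm comes from the rolling mill example.

```python
def within_limits(record, limits):
    """Return True if every listed field lies inside its [low, high] validity range."""
    return all(low <= record[name] <= high for name, (low, high) in limits.items())

# Validity range from the rolling mill example; the field name is hypothetical.
limits = {"thickness_cm": (2.0, 20.0)}

coarse_error = {"thickness_cm": 0.2}  # outside the range, so it is caught
swapped_slab = {"thickness_cm": 2.0}  # recorded as 2 cm although 20 cm was fed in

print(within_limits(coarse_error, limits))  # False
print(within_limits(swapped_slab, limits))  # True: the mix-up goes unnoticed
```

The second record passes because its value is plausible on its own, which is the weakness of a pure limit value check.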

The object of the present invention is to specify an improved method for finding a faulty data record in a data set.

The object of the invention is achieved as follows. The invention is based on the following consideration. Typically, a data record contains all the data for a complete description of a fact or operation. In automation technology, for example, it can be assumed that all process parameters of the process running in the automated system are also recorded and summarized in a data record. Otherwise, the outcome of the process would not be predictable, or the process would deliver widely scattering results. All data of a data record are thus related to one another in some way. These relationships are also reproducible, i.e. the data of approximately identical data records fluctuate only by a residual error. Such relationships exist, for example, when neural networks can map them. The relationships mentioned here need not be explicitly known, describable or purposefully reproducible; they simply have to exist.

The invention is also based on the idea of dividing the data of a data record into so-called input data and target data. The above-mentioned relationships then apply such that the same or similar input data result in similar target data, which scatter only slightly, by a residual error typical of the respective combination of input data and target data. A neural network, for example, would then be able to reproduce the target data at its output from the input data as inputs, subject to the corresponding residual error. Data records whose target data deviate from this rule by more than the residual error are classified as erroneous or potentially erroneous.

In other words, the basic idea of the invention is to recognize erroneous data records by the fact that, for given or similar input data, faulty target data take on values entirely different from other target data, or conflict with the relationships that can be derived from the input and target data.

The method according to the invention is based on a number M of data records, each of which comprises a number N of input data and at least one target datum correlated with them. The distinction between input data and target datum can be made, for example, by the type of the data itself. Since in an automation system all process-relevant data is typically acquired in order to guarantee equivalent products and constant product quality at the process output, the process-related data may, for example, be selected as input data and a measured value of the product being produced or of the process product as the target datum. Alternatively, however, the classification can also be made arbitrarily, i.e. a datum of the data record that is to be checked is selected as the target datum and the remaining data as input data.

The N input data together with the target datum of the data records may also be just a subset of larger overall data records. In other words, data of an overall data record that are not to be considered are first masked out, and the method according to the invention is then carried out only on the corresponding subset of the data of the overall data records. This leads to computing time advantages. It only has to be ensured that the target datum is defined as precisely as possible by the input data.

Each input datum can assume different values, but these are limited to its respective value range. For example, a sensor powered with 24 V can only deliver voltages from 0 V to 24 V. Since each input datum thus lies on a number line within its limited value range, combining all N input data of a data record in the manner of a vector spans an N-dimensional hypercuboid, the input data space. The input data space is thus finite or bounded. All conceivable value combinations of input data therefore lie within the hypervolume of the input data space. Each point in the N-dimensional input data space whose coordinates are given by the input data of a data record is then assigned the value of the target datum of the corresponding data record.

In the method according to the invention for finding a faulty data record in a set of M data records, at least one of the value ranges of an input datum is subdivided into a number $K_n$ of intervals. In the N-dimensional input data space, this results in a total of $\prod_{n=1}^{N} K_n$ N-dimensional cells or hypercuboids, i.e. subspaces of the input data space. These are bounded by the interval limits of the intervals, i.e. they each have the intervals as edge lengths.

At least one of the resulting cells is selected as the cell to be tested. A test criterion for the target data of the data records of the selected cell is then determined from the values of the target data of the data records. Which data records form the basis for determining the test criterion may vary. The test criterion can be determined from the values of one, several or all data records in one, several or all cells.

The test criterion is used to check the values of the target data in the cell to be tested. Thus, the target data of all data records are checked whose input data values lie in the cell to be tested, i.e. in the respective intervals of the value ranges of the input data. In other words, those data records are checked whose N-dimensional coordinates, given by the input data, lie in the corresponding N-dimensional hypercuboid of the cell to be tested.

Each data record of the selected cell whose target datum value fails to meet the test criterion is marked as faulty.

Not just the target datum but the entire data record is classified as faulty. Since input and target data can be chosen arbitrarily, the method according to the invention reveals errors not only in the target datum but in the entire data record.

Since the test criterion is determined from the values of the target data and is thus based on the processes or properties underlying the data, it is much narrower than the previously known minimum-maximum check or limit value check. The latter would correspond merely to checking whether the input data lie within the input data space.

The test criteria are usually different for each cell, so that different test criteria apply to the target data of different data records, i.e. data records with different input data. Plausibility checks that were previously not possible can be integrated into the test criteria.

For example, the data records marked as faulty can be displayed to a plant operator so that the operator can examine them more closely. The method thus provides a kind of prefiltering that marks suspect data records. The number of data records that the plant operator actually has to check manually or with particular care is thereby reduced considerably. If the suspect data records prove to be faulty, this knowledge can, on the one hand, be used to improve, for example, the automation system underlying the data records. On the other hand, these data records can be excluded from further use.

The faulty data records can provide information on further sources of error, e.g. sources of error that led to an incorrect target datum only after several process steps. An alarm or the like can also be linked to the finding of faulty data records, for example if the data records originate from a production plant. An alarm system, however, only makes sense once the test criteria have been optimized over a long period, so that as far as possible only data records that are in fact erroneous are marked as faulty and the false alarm rate does not exceed an acceptable level.

With regard to plant operation, the method according to the invention results in more efficient troubleshooting, faster commissioning, higher product quality and fewer accidents.

The test criteria also adapt, in the course of the process or over time, to newly generated data records, i.e. to new value combinations of the input and target data, since they are created from these.

The intervals for an input datum can be formed by equidistant division of its value range. Equidistant division of the value range is particularly simple and can be carried out without knowledge or empirical experience of the relationships of the input data to one another or to the target data. Here, for example, a fixed number of intervals can be specified, which is used for all input variables.
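As a minimal sketch of equidistant division, the following Python function maps the input data of one data record to the index of its cell. The names `cell_index`, `ranges` and `counts` and the example numbers are assumptions chosen for illustration; assigning the upper range limit to the last interval is likewise an assumption, since the text does not say how interval boundaries are treated.

```python
from math import prod

def cell_index(values, ranges, counts):
    """Map an input vector to the index of its cell, one interval index per input."""
    index = []
    for value, (low, high), k in zip(values, ranges, counts):
        i = int((value - low) / (high - low) * k)  # equidistant interval number
        index.append(min(max(i, 0), k - 1))        # assumed: upper limit joins the last interval
    return tuple(index)

ranges = [(0.0, 24.0), (0.0, 10.0)]  # hypothetical value ranges of two inputs
counts = [4, 5]                       # hypothetical interval counts per input

print(prod(counts))                              # 20 cells in total
print(cell_index([23.9, 0.0], ranges, counts))   # (3, 0)
print(cell_index([24.0, 10.0], ranges, counts))  # (3, 4)
```

The total number of cells is simply the product of the interval counts of all inputs.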

However, the intervals for an input datum can also be formed using prior knowledge of the input and/or target data. Prior knowledge is, e.g., theoretical considerations or empirical experience that certain subdivisions are better than others. On the one hand, important input variables can be subdivided more finely than unimportant ones. On the other hand, a non-equidistant division can be chosen so that the increase of the target variable within a cell is approximately equal for all cells.

As a test criterion, a separate local tolerance range for the value of the target datum is determined for each cell, which is significantly narrower than the globally permissible value range. This creates mathematical conditions against which data records or target data can be tested. Such a test criterion can therefore be evaluated easily and quickly, which greatly speeds up the entire method.

The local tolerance range consists of a setpoint and a permissible deviation. The setpoint and the permissible deviation in the test criterion can be determined by statistical methods from the values of the target data. As described above, the data basis to which the statistical methods are applied, i.e. which data records from which cells, can be chosen in various ways.

Statistical methods are particularly suitable for capturing in numbers processes that are not explicitly comprehensible and are random (in the present case, the residual scatter of the target value for known input data) but nevertheless have a structure (here, a relatively low residual scatter with otherwise good reproducibility of the target value).

If target values in a cell are accumulated in a certain value range, it can be assumed that individual values lying outside this value range are faulty.

The setpoint in the test criterion can therefore be determined as the mean value of all target data of the data records of the selected cell. A single erroneous value will then hardly change the mean value. Incorrect values can then be detected by their particularly large distance from the mean value. In other words, the search for erroneous data records is carried out using a local value, i.e. one that applies only to the respective cell.

The permissible deviation from the local setpoint is advantageously calculated from the local standard deviation averaged over all cells. Of course, other measures of scatter similar to the standard deviation can also be used for this purpose. The weighting, i.e. a multiplicative weighting factor, determines the actual width of the tolerance range and can be determined empirically, for example.

Using the local standard deviation of the target variable within a single cell to calculate the permissible deviation is possible in principle but less favorable, since it is distorted by the erroneous data records more strongly than the standard deviation averaged over all cells.
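The local tolerance range can be sketched as follows, assuming the setpoint is the cell mean and the permissible deviation is a weighting factor times the standard deviation averaged over all cells. The function name `flag_outliers`, the parameter name `alpha`, the cell labels and the data are hypothetical; the use of the population standard deviation (`pstdev`) is also an assumption, since the text does not specify which estimator is meant.

```python
from statistics import mean, pstdev

def flag_outliers(cells, alpha=3.0):
    """cells: dict of cell id -> list of target values. Returns cell id -> flagged values."""
    # Cells with a single record carry no scatter information and are skipped here.
    sigmas = [pstdev(values) for values in cells.values() if len(values) > 1]
    sigma_loc = mean(sigmas)  # local standard deviation averaged over all cells
    flagged = {}
    for cell_id, values in cells.items():
        setpoint = mean(values)  # local setpoint of this cell
        flagged[cell_id] = [v for v in values if abs(v - setpoint) > alpha * sigma_loc]
    return flagged

cells = {
    "A": [100, 101, 99, 100, 102, 98, 100, 101, 99, 160],
    "B": [200, 201, 199, 200, 202, 198],
    "C": [300, 299, 301, 300, 302, 298],
}
print(flag_outliers(cells))  # {'A': [160], 'B': [], 'C': []}
```

With the standard deviation of cell A alone (about 18), the value 160 would lie within three standard deviations of the cell mean and would not be flagged, which illustrates why the averaged value is described as more favorable.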

Instead of averaging the local scatter, the global scatter of the target value can also be determined over all data records, i.e. without consideration of the cell boundaries. The tolerance range can then be determined by weighting the global scatter. This variant of the method provides an alternative measure for assessing erroneous data records that takes all data records into account. The alternative measures mentioned are more or less suitable depending on the application. Which variant is best for which application can usually be determined from empirical values or preliminary tests.

Alternatively, statistical measures determined from the data of all cells or of the cell to be examined can also be used to make a prediction for the cell under examination, in order to assess whether data records of that cell are faulty. In the method, an empirical model for mapping the input data to the target data is therefore created, so that the test criterion for the selected cell can then be determined with the empirical model. By using an empirical model, one can attempt to model the relationships between target data and input data. In other words, an attempt is made to create a prediction for the value of the target datum for given input data in the cell to be examined. It is then checked whether the target values in the cell lie outside a tolerance range around the prediction, in order to mark them as faulty where appropriate.

Various types of empirical models can be used, such as a multilayer perceptron (MLP), i.e. a non-linear neural network, or just-in-time modeling (JIT) without a time term, as known from DE 102 03 787.6. In brief, the latter describes a weighted average over all data records, the weighting resulting from the N-dimensional hyperdistance of the respective input data, i.e. the input data vector, from the center of the cell under examination. Data records close to the center of the cell are therefore weighted most heavily. Normalization can here be carried out, e.g., with reference to the edge lengths of the cells, i.e. the intervals into which the value ranges were divided. Since such a procedure depends particularly on the edge length of the cell, a manual division of the value ranges is recommended in this case. The advantages of a JIT over an MLP are its higher accuracy and reduced computation time, since the learning phase required for the MLP is eliminated.
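The idea of a distance-weighted average can be sketched as follows. This is not the method of DE 102 03 787.6: the Gaussian weighting function, the function name `weighted_estimate` and the sample records are assumptions made purely for illustration. Only the normalization of the distance by the cell's edge lengths and the stronger weighting of nearby data records follow the description above.

```python
from math import exp

def weighted_estimate(records, center, edges):
    """Distance-weighted average of target values around a cell center.

    records: iterable of (inputs, target); center and edges: one entry per input.
    The Gaussian kernel is an assumption, not taken from the source.
    """
    num = den = 0.0
    for inputs, target in records:
        # squared distance to the cell center, normalized by the cell's edge lengths
        d2 = sum(((x - c) / e) ** 2 for x, c, e in zip(inputs, center, edges))
        weight = exp(-d2)
        num += weight * target
        den += weight
    return num / den

# Hypothetical records: (width in mm, carbon content in %) -> rolling force in kN
records = [((1150.0, 0.45), 1200.0), ((1250.0, 0.45), 1300.0), ((1390.0, 0.58), 1900.0)]
estimate = weighted_estimate(records, center=(1200.0, 0.45), edges=(100.0, 0.10))
print(round(estimate, 1))
```

The two records next to the cell center dominate the estimate, while the distant third record contributes only marginally.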

In determining the test criterion, the cell to be examined can be excluded, i.e. the criterion is then determined on the basis of all cells, or a subset of all cells, except for the cell to be examined. Faulty values or faulty data records in the cell to be examined are therefore not included in the determination of the test criterion. The test criterion is thus not distorted by incorrect data records of the cell to be examined.
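Excluding the cell under examination can be sketched as a simple selection of the data basis. In this sketch a plain mean over the other cells stands in for the empirical model, which is a deliberate simplification; the function name `estimate_without_cell`, the cell labels and the data are hypothetical.

```python
from statistics import mean

def estimate_without_cell(cells, tested_cell):
    """Estimate a setpoint for tested_cell from the records of all other cells only."""
    others = [t for cell_id, targets in cells.items() if cell_id != tested_cell for t in targets]
    return mean(others)  # placeholder for an empirical model trained on the other cells

cells = {"Z11": [100, 102], "Z12": [101, 500], "Z13": [98, 100]}
print(estimate_without_cell(cells, "Z12"))  # 100
```

The suspicious value 500 in the tested cell has no influence on the estimate, because that cell's data records are left out of the data basis.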

If a cell contains at least two groups (clusters) of data records whose target values lie close together within each group, but between which there is a gap that is greater than, for example, the local scatter, it is suspected that only one of these data groups is correct and that the other data group or groups are faulty. Cells with multiple data groups (clusters) are referred to below as heterogeneous cells. The more data there is in a faulty data group, the less likely it is that the local averaging method will recognize that data. In that case only the method of empirical modeling described above, with all data of the heterogeneous cell under consideration excluded, will help.

One possible method for finding "heterogeneous" cells, i.e. cells with multiple data groups, is as follows: the target values of the selected cell are sorted by size, the differences between each two adjacent target values are determined, and differences greater than a gap limit value are defined as a gap. This results in a monotonically increasing sequence of target values in the cell to be examined, which either adjoin one another without a gap or have gaps between them. Data records with target values that adjoin one another without a gap are combined into clusters. The data records of a cluster in a cell are then jointly marked as faulty or not faulty, depending on the test criterion. With such a method variant, entire accumulations or groups, namely the clusters, of erroneous data records can be identified.
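The sorting and gap search described above can be sketched as follows. The function name `split_into_clusters`, the parameter name `gap_limit` and the sample values are assumptions for illustration.

```python
def split_into_clusters(targets, gap_limit):
    """Sort target values and start a new cluster wherever neighbours differ by more than gap_limit."""
    ordered = sorted(targets)
    clusters = [[ordered[0]]] if ordered else []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > gap_limit:
            clusters.append([current])    # a gap: open a new cluster
        else:
            clusters[-1].append(current)  # no gap: extend the current cluster
    return clusters

clusters = split_into_clusters([1200, 1810, 1190, 1800, 1210], gap_limit=500)
print(clusters)  # [[1190, 1200, 1210], [1800, 1810]]
```

A cell for which more than one cluster is returned would be a heterogeneous cell in the sense used above.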

In order to check for errors, the target values are then not checked against a test criterion determined in the cell to be examined, but against a test criterion determined from the surrounding cells, the remaining cells or all cells. This criterion then comes, for example, from the empirical model described above.

After clustering, the mean value of the associated target values can then be determined for each cluster and checked against the test criterion. In other words, clustering and separate evaluation against the test criterion via the corresponding mean values can be used to subdivide cells further. This measure likewise serves to distinguish clusters that are faulty as a whole from clusters that are not faulty.

The test criterion in a cell to be tested can also be defined in such a way that all data records of a cell, except the data record or cluster closest to the test criterion, are marked as faulty. Such a method variant ensures that only a single cluster in a heterogeneous cell, or a single data record in a cell, is regarded as correct and all others are marked as faulty. In other words, the test criterion represents a "best fit" criterion.
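A "best fit" selection of this kind might look as follows, assuming each cluster is compared via its mean value with an estimate obtained elsewhere, for example from an empirical model. The function name `best_fit`, the estimate of 1250 and the cluster values are hypothetical.

```python
from statistics import mean

def best_fit(clusters, estimate):
    """Keep the cluster whose mean is closest to the estimate; all others count as faulty."""
    ranked = sorted(clusters, key=lambda cluster: abs(mean(cluster) - estimate))
    return ranked[0], ranked[1:]

kept, faulty = best_fit([[1190, 1200, 1210], [1800, 1810]], estimate=1250)
print(kept)    # [1190, 1200, 1210]
print(faulty)  # [[1800, 1810]]
```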

As already mentioned, the various method variants can identify and mark as faulty individual outliers of target values in cells to be examined, entire clusters of erroneous values within a cell, or all data records of a cell to be examined. In order to recognize all erroneous data records, it appears most suitable to proceed as described above: in a first pass, to find and eliminate the outliers, i.e. individual faulty data records; then to search for and eliminate faulty clusters in the remaining data records; and subsequently to search for and likewise eliminate entire faulty cells.

Eliminating here means that, once an erroneous data record has been found, it is excluded from further examination. The test criteria in a subsequent step are then determined on a new data basis, namely one that excludes the erroneous data found so far. The test criteria are thus continuously improved in the course of the method, as more and more erroneous values are removed from the data basis.
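The effect of recomputing the criterion after each elimination can be illustrated with a small sketch. The helper `faulty_by_mean` is a hypothetical stand-in for a test criterion (a fixed tolerance around the current mean), and the function names, the tolerance of 50 and the data are all assumptions.

```python
from statistics import mean

def faulty_by_mean(values, tolerance=50.0):
    """Hypothetical test criterion: values further than tolerance from the current mean."""
    center = mean(values)
    return [v for v in values if abs(v - center) > tolerance]

def filter_until_stable(values, find_faulty, max_passes=10):
    """Repeat the search, recomputing the criterion on the cleaned data in each pass."""
    remaining, removed = list(values), []
    for _ in range(max_passes):
        faulty = find_faulty(remaining)
        if not faulty:
            break
        removed.extend(faulty)
        remaining = [v for v in remaining if v not in faulty]
    return remaining, removed

targets = [100] * 8 + [160, 500]
remaining, removed = filter_until_stable(targets, faulty_by_mean)
print(removed)    # [500, 160]
print(remaining)  # eight values of 100
```

In the first pass the value 500 pulls the mean to 146, so 160 still passes; only after 500 has been eliminated does the recomputed mean expose 160 as well.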

It may also be advantageous to carry out each of the individual method steps several times, or to run the entire filter process a second time.

For a further description of the invention reference is made to the embodiments of the drawings. They show, in each case in a schematic outline sketch:

FIG. 1 shows a rolling mill for steel strip with system control and neural network in a highly simplified representation,

FIG. 2 shows data records generated in the rolling mill of FIG. 1 with two

FIG. 3 shows a diagram of the target values of the data records from FIG. 2 plotted against the input data,

FIG. 4 shows a flow chart for a method for finding faulty data records.

FIG. 1 shows a steel rolling mill 2 with an associated system control 4 in a highly simplified representation. In the steel rolling mill 2, raw material 6, namely metal slabs, is rolled by the symbolically indicated rollers to form a metal strip 8 as the output product of the rolling mill 2.

The system control 4 comprises two sensors 10a, b, the sensor 10a detecting the width b of the raw material 6 and the sensor 10b detecting the rolling force F in the steel rolling mill 2. The system control 4 also includes a keyboard 12, via which a worker (not shown), who feeds the raw material 6 to the steel rolling mill 2, enters the carbon content c of the raw material 6 as a numerical value in percent. For controlling or monitoring the rolling process in the steel rolling mill 2, a neural network 14 is integrated in the system control 4. The neural network 14 models the steel rolling process and, knowing the width b and the carbon content c of the raw material 6, calculates the expected or resulting rolling force F before the metal strip 8 is produced. The neural network 14 thus creates a prediction of the value F that is measured during the rolling process, i.e. at a later time.

For every single piece of raw material 6 that is rolled into a metal strip 8, there is thus a data record 16 that comprises three individual data items 18a-c or variables, each with a numerical value, namely the width b in mm, the carbon content c in percent and the rolling force F in kN, which contain the values for a particular individual piece of raw material 6 or the metal strip 8 produced from it.

On a given production day, a total of M metal strips 8 are produced in the steel rolling mill 2. In the system control 4, therefore, a total of M data records 16 are created, which are shown in the figure.

For the steel rolling mill 2, the following conditions are known: the width b of the raw material 6 can only vary between 1100 mm and 1400 mm, since only such raw material is purchased or processed. The concentration c of the carbon content in the raw material 6 is guaranteed by its manufacturer and lies between 0.30% and 0.60%. Other raw materials are definitely not processed in the rolling mill 2. As the output product of the steel rolling mill, i.e. as metal strip 8, only strips whose rolling force F is between 200 kN and 2000 kN are rolled, since these are the limits mechanically preset by the plant. Metal strips 8 with other values definitely cannot leave the rolling mill 2.

The M data records 16 are now to be checked for erroneous data records 16, i.e. erroneous data records 16 are to be found and marked as faulty. According to the prior art, only the interval limits of the variables b, c and F, i.e. of the individual data 18a-c, have hitherto been checked by a computing unit (not shown) in the system control 4. For example, data records 16 with values c < 0.30% or c > 0.60% were marked as faulty. Such a value can, for example, only be an input error on the keyboard 12, since it is definitely known that materials with such a carbon content c are not used. Such data records 16 have already been sorted out in the figure. The values of b, c and F of all the data records in FIG. 2 thus lie within the permissible intervals for the individual data 18a-c.

For carrying out the method according to the invention, one datum, namely the individual datum 18c or the rolling force F, is selected in all data records 16 as the target datum or target value for the method. The individual data 18a, b, i.e. width b and carbon content c, are selected as the two input data or input values for the method. Thus, N = 2.

This choice makes sense, since in the rolling process the rolling force F actually, and above all necessarily, results from the values b and c on account of the settings of the rolling mill 2, for example roller position, processing temperature, etc. Since, in this highly simplified example, all remaining parameters of the rolling mill 2 are constant, the individual data 18a, b cover all factors influencing the rolling force F; apart from manufacturing tolerances, approximately the same rolling force F will therefore always result for a given pair of values of width b and carbon content c.

In FIG. 3, all M data records 16 are displayed graphically. The input data 18a, b, i.e. the values of b and c, span within their permitted value ranges $W_1$ and $W_2$ an input space 20 which, since N = 2, is two-dimensional. After the preliminary exclusion of implausible values according to the prior art, as discussed above, all value pairs of the input data 18a, b must lie within the input space 20. In the input space 20, the data records with the numbers 1-3, M-1 and M are each marked by a labeled cross. The values of the individual data 18a, b form the Cartesian coordinates.

In the method according to FIG. 4, in a first step 38, the value range of each of the individual data 18a, b is divided into three equal intervals $I_{11}$ to $I_{13}$ and $I_{21}$ to $I_{23}$. By the division into the intervals $I_{11}$ to $I_{23}$, the input space 20 is divided into nine cells $Z_{11}$ to $Z_{33}$.

All M data records 16 thus each lie uniquely in one of the nine cells $Z_{11}$ to $Z_{33}$. For example, cell $Z_{31}$ contains all data records 16 whose width b is between 1300 mm and 1400 mm and whose carbon content c is between 0.30% and 0.40%. Each of these data records has a target value, namely a rolling force F. The distribution of the respective rolling forces F is shown separately in FIG. 4 as histograms $f_{11}$ to $f_{33}$ for each of the cells $Z_{11}$ to $Z_{33}$. Each histogram shows the frequency of the rolling forces F, plotted on the ordinate, against the respective rolling force F on the abscissa. For example, the data records 16 in cell $Z_{32}$ with the numbers 1 and M contribute to the histogram $f_{32}$ of cell $Z_{32}$. The corresponding rolling forces F of 1800 kN and 1200 kN are also marked by crosses on the F axis of the histogram $f_{32}$.

In order to find faulty data records 16, in the next step 40, a mean value $F_{11}$ to $F_{33}$ and a standard deviation $\sigma_{11}$ to $\sigma_{33}$ are calculated for each of the nine cells $Z_{11}$ to $Z_{33}$. In the example of FIG. 3, this gives for cell $Z_{31}$, for example, a mean value $F_{31}$ of 1400 kN and a standard deviation $\sigma_{31}$ of 230 N.

Subsequently, in a step 31, the mean local scatter $\sigma_{loc}$ is calculated as the mean value of all the standard deviations $\sigma_{11}$ to $\sigma_{33}$. In the example, this gives $\sigma_{loc}$ = 250 N. At the same time, the global scatter $\sigma_{glob}$ of the values F of all M data records 16 is determined as the difference between the maximum value of F and the minimum value of F.

In loop 42, each of the cells $Z_{11}$ to $Z_{33}$ is then selected once as the cell to be tested. For the selected cell, it is checked whether it contains data records 16 whose rolling force F differs from the corresponding mean value $F_{11}$ to $F_{33}$ of the cell by more than $\alpha \sigma_{loc}$. This check thus represents the test criterion according to the invention. Data records 16 for which this condition is fulfilled are marked as faulty and excluded from further consideration. Cell $Z_{31}$, for example, contains a data record 16 whose rolling force F is 300 kN. This is indicated in FIG. 3 as incorrect datum 22. The value three is selected as the parameter α in cell $Z_{31}$. The incorrect datum 22 is more than $3\sigma_{loc}$ away from the mean value $F_{31}$. This data record is thus marked as invalid and excluded from further consideration.
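Written out with the numbers of this example, and using the values and units as they are stated in the text, the check for the incorrect datum 22 in cell $Z_{31}$ reads roughly as follows. This is only a worked restatement of the condition described above, with $F_{31}$ denoting the mean value of that cell.

```latex
\left| F - F_{31} \right|
  = \left| 300\,\mathrm{kN} - 1400\,\mathrm{kN} \right|
  = 1100\,\mathrm{kN}
  \;>\; \alpha\,\sigma_{loc} = 3 \cdot 250\,\mathrm{N} = 750\,\mathrm{N}
```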

In other words, in loop 42, individual data records 16 whose individual data 18a-c differ from those of the majority of the other data records 16, so-called "outliers", are detected and marked as invalid data records 16.

Loop 42 describes the first way of identifying faulty data records 16. Loop 42 may be performed alone or in combination with the other method variants listed below. Steps 40-42 can be repeated several times, until no more faulty data records or incorrect data 22 are found.

It is expedient, however, to carry out the said method variants one after the other in the sequence shown in the figure. The faulty data records detected in loop 42 are removed from the total set of all data records 16; the following examinations are therefore already carried out on a pre-filtered subset of the original set of M data records. By eliminating erroneous data records, the statistical measures, such as the mean values F or standard deviations σ, become more accurate, i.e. they correspond more and more closely to the values for an error-free and correct process flow and the associated values of the individual data 18a-c in the data records 16.

To identify further faulty data records, a sorting step 44 follows loop 42. In the sorting step 44, for each test cell Z_11 to Z_33, the individual data 18c, i.e. the rolling forces F of all data records 16 of the corresponding cell, are sorted by size, i.e. in ascending or descending order. In FIG. 3, this is indicated in each case by a histogram f_11 to f_33, and shown again explicitly in FIG. 4. The distances 26 between each two adjacent rolling forces F are then determined. Distances 26 that are greater than a threshold value β·σ_loc are identified as a gap 28. In the example, the value β = 2 was selected for the parameter. Data records 16 with rolling forces F that are contiguous without a gap 28 are combined into clusters or groups 30a, b.
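The sorting and gap search of step 44 can be sketched in a few lines of Python. The function name and the return format (a list of groups, each a sorted list of target values) are assumptions made for illustration.

```python
def split_into_groups(forces, sigma_loc, beta=2.0):
    """Sort the target values of one cell and split them at every gap.

    A gap is a distance between two neighbouring sorted values that is
    greater than beta * sigma_loc (beta = 2 as in the example).
    """
    if not forces:
        return []
    ordered = sorted(forces)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > beta * sigma_loc:
            groups.append([current])    # gap found: start a new group
        else:
            groups[-1].append(current)  # no gap: extend the current group
    return groups
```

A cell for which this function returns more than one group corresponds to what the description calls a heterogeneous cell.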

In FIG. 3, the cell Z_12 shows a value distribution of the rolling forces F of the data records 16 lying in the cell Z_12, which are divided by the gap 28 into the two groups 30a, b. The cell Z_12 is therefore declared a "heterogeneous cell".

In the case of such a heterogeneous cell Z_12, it is suspected that one of the groups 30a or 30b, in contrast to the outliers or incorrect data 22 found above, represents a whole group of erroneous values 18c.

In a step 46 corresponding to step 40, the respective mean values F_12a and F_12b of the rolling forces F are therefore determined within the cell Z_12, but separately for each group 30a and 30b; these are likewise shown in FIG. 3. Since there are now several mean values in the cell Z_12, two in the example, none of them can be assumed a priori to be the correct mean value as above. The correct expected target value, i.e. mean value, in the cell Z_12 must therefore be estimated in another way.

Therefore, in a step 48, an empirical estimation model 32 of the process underlying the data records 16 is created. In the present example, an attempt is thus made to emulate the rolling mill 2 such that, for given values of width b and carbon content c, the model 32 supplies the expected rolling force F as the output variable. For this purpose, of the nine cells Z_11 to Z_33, all but the cell to be examined, in the example the cell Z_12, are used for the modeling. That is, only the data records 16 of the remaining eight cells are used to determine an estimated value F_s for the test cell Z_12. The model 32 is in this case, for example, an MLP or JIT, as explained above.
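The leave-one-cell-out idea of step 48 can be illustrated with a deliberately simple stand-in model. The sketch below does not implement an MLP or JIT model; it swaps in an inverse-distance-weighted average of the records from the other cells, purely to show that the records of the test cell are excluded from the estimate. The record layout (cell label, input tuple, target value) and the function name are assumptions.

```python
import math


def leave_cell_out_estimate(records, test_cell, query_point):
    """Estimate the target value at query_point without using test_cell.

    records: iterable of (cell_label, inputs, force), inputs being a tuple
    such as (width b, carbon content c). Stand-in model (assumption):
    inverse-distance weighting instead of the empirical model 32.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for cell, inputs, force in records:
        if cell == test_cell:
            continue  # records of the cell under test must not be used
        distance = math.dist(inputs, query_point)
        weight = 1.0 / (distance + 1e-9)  # small offset avoids division by zero
        weighted_sum += weight * force
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no data records outside the test cell")
    return weighted_sum / weight_total
```

In practice the inputs would need to be scaled to comparable ranges before distances are computed; that step is omitted here.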

The estimated value F_s determined in the estimation step 48 is drawn in FIG. 3. For all groups, in the example both groups 30a, b, it is now checked which of the mean values of the groups 30a, b, in the example F_12a or F_12b, comes closest to the estimated value F_s. In FIG. 3, this is the mean value F_12a of group 30a. This group is considered valid. All data records 16 of the remaining groups, in the example of group 30b, are therefore marked as faulty. The check for the smallest distance of the mean values F_12a or F_12b of the groups 30a, b from the estimated value F_s thus represents the test criterion according to the invention in step 48. Of course, the way there, i.e. the sorting of the data records 16 and the finding of the gaps 28, also belongs in a certain way to the test criterion.
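The selection rule of step 48 reduces to picking the group whose mean lies closest to the estimated value F_s. A minimal sketch, assuming the groups are given as lists of target values and the estimate is already available (the function name is hypothetical):

```python
from statistics import mean


def select_valid_group(groups, estimate):
    """Return (index of the valid group, indices of the faulty groups).

    The valid group is the one whose mean target value is closest to the
    estimate; all data records of every other group are treated as faulty.
    """
    group_means = [mean(group) for group in groups]
    valid = min(range(len(groups)), key=lambda i: abs(group_means[i] - estimate))
    faulty = [i for i in range(len(groups)) if i != valid]
    return valid, faulty
```

With two groups whose means are 11 and 51 and an estimate of 14, the first group is kept and the second is marked as faulty.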

Thus, with the method variant of steps 44 to 48, faulty data records 16, in the example the group 30b of data records 16, can likewise be identified, marked as invalid and then excluded from the further procedure.

With the third method variant, which follows the estimation step 48 in FIG. 4, a search is then made for data records that are faulty in their entirety within a cell as a closed group without a gap 28. This is shown in FIG. 3 for the cell Z_13.

In step 50, the mean values F_11 to F_33 are in turn determined for all cells Z_11 to Z_33, corresponding to step 40. However, since some data records have by now been excluded as faulty, different numerical values result for the mean values F_11 to F_33 in step 50. Then an estimation step 52 corresponding to the estimation step 48 is again carried out, likewise on the changed data basis, so that a corresponding estimated value F_11s to F_33s is determined for each of the cells Z_11 to Z_33. This is again done, optionally with the same empirical model 32 as in step 48 or with a changed one, by excluding the corresponding cell Z_11 to Z_33 for each of the estimated values F_11s to F_33s.

Subsequently, in the test step 54, it is checked for each of the cells Z_11 to Z_33 whether its mean value F_11 to F_33 deviates by more than γ_1·σ_loc from the corresponding estimated value F_11s to F_33s. Instead of the local scatter σ_loc, the global scatter can alternatively be used here. The test condition, i.e. the test criterion, is then whether the deviation is smaller than γ_2·σ_glob. In the example, the values γ_1 = 3 and γ_2 = 0.2 were chosen.

In step 54, as in step 42, a test criterion is thus used that is based on the local scatter σ_loc or the global scatter σ_glob weighted by a factor. Data records found in this way are again marked as invalid. In FIG. 3, it is detected in method steps 50 to 54 for the cell Z_13 that the only group 34 of data records 16 deviates with its calculated mean value F_13 by more than 0.2·σ_glob from the mean value F_13s estimated from the remaining eight cells Z_11 to Z_33 (without Z_13). All data records 16 of the cell Z_13 are therefore marked as invalid.
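The cell-level test of step 54 can be sketched as a comparison of each cell mean with its estimate, using either γ_1·σ_loc or γ_2·σ_glob as the tolerance. The function name, the dictionary interface and the numbers in the usage note are assumptions for illustration only.

```python
def find_faulty_cells(cell_means, cell_estimates, scatter, gamma):
    """Return the labels of cells whose mean deviates too far from the estimate.

    scatter and gamma are either (sigma_loc, gamma_1) or (sigma_glob, gamma_2).
    All data records of a returned cell would be marked as invalid.
    """
    tolerance = gamma * scatter
    return [
        cell
        for cell, cell_mean in cell_means.items()
        if abs(cell_mean - cell_estimates[cell]) > tolerance
    ]
```

For instance, with a global scatter of 200 and γ_2 = 0.2 the tolerance is 40, so a cell whose mean lies 60 away from its estimate is reported while a cell that lies 2 away is not.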

Claims

1. A method for finding a fault in the operation of an installation having a set (M) of data records, in particular an automation system, wherein each data record (16) comprises a number (N) of input data (18a, b) and at least one target datum (18c) correlated with these, and wherein each input datum (18a, b) assumes only values within a value range (W_n) and the input data (18a, b) thereby span an input data space (20), comprising the following steps:

- at least one of the value ranges (W_n) is divided into intervals (I_k,n), so that N-dimensional cells (Z) with the intervals (I_k,n) as edge lengths are formed in the input data space (20),

- at least one cell (Z) is selected as the cell (Z) to be tested,

- a test criterion (42, 44, 48) for the target data (18c) of the data records (16) of the selected cell (Z) is determined from the values (F) of the target data (18c) of the data records (16),

- each data record (16) of the selected cell (Z) whose value (F) of the target datum (18c) fails the test criterion (42, 44, 48) is marked as faulty, and a fault in the installation is found on this basis.

2. The method of claim 1, wherein the intervals (I_k,n) for an input datum (18a, b) are formed by equidistant division of the value range (W_n) into a predetermined number of intervals (I_k,n).

3. The method of claim 1, wherein the intervals (I_k,n) for an input datum (18a, b) are formed by manually configurable division of its value range (W_n) using prior knowledge about the input data (18a, b) and/or target data (18c).

4. The method according to any one of the preceding claims, wherein a tolerance range (F ± σ) for the value (F) of the target datum (18c) is determined as the test criterion (42, 44, 48).

5. The method of claim 4, wherein the test criterion (42, 44, 48) is determined by statistical methods from values (F) of the target data (18c).

6. The method according to claim 4 or 5, wherein the mean value of the target data (18c) of the data records (16) of the selected cell (Z) is determined as the desired value (F).

7. The method according to any one of claims 4 to 6, wherein the tolerance range (F ± σ) is determined by weighting (α, β, γ_1, γ_2) with the standard deviation of the corresponding target data (18c) of the selected cell (Z).

8. The method according to any one of claims 4 to 6, wherein the tolerance range (σ) is determined by weighting (α, β, γ_1) with the standard deviation of the corresponding target data (18c) averaged over all cells (Z).

9. The method according to any one of claims 4 to 8, wherein:

- the global scatter (σ_glob) of the target value is determined for all data records (16),

- the tolerance range is determined by weighting (γ_2) of the global scatter (σ_glob).

10. The method according to any one of the preceding claims, wherein:

- an empirical model (32) for mapping the input data (18a, b) onto the target data (18c) is created,

- the test criterion (48, 54) for the selected cell (Z) is determined with the empirical model (32).

11. The method according to any one of the preceding claims, wherein the test criterion (42, 44, 48) is determined on the basis of all or a subset of all cells (Z) except the cell (Z) to be examined.

12. The method according to any one of the preceding claims, wherein:

- the target values (18c) of the selected cell (Z) are sorted according to their size,

- the differences (26) between each two adjacent target values (18c) are determined,

- differences (26) which are greater than a gap limit are defined as a gap (28),

- data records (16) with target values (18c) which adjoin one another without a gap (28) are combined to form clusters (30a, b),

- data records (16) of a cluster (30a, b) are jointly marked as faulty or not faulty depending on the test criterion (42, 44, 48).

13. The method of claim 12, wherein, for each cluster (30a, b), the mean value of the corresponding target values (18c) is determined and tested against the test criterion (42, 44, 48).

14. The method according to claim 12 or 13, wherein the test criterion (42, 44, 48) is determined such that all data records (16) of a cell (Z) are marked as faulty except the data record (18) or cluster (34) closest to the test criterion (42, 44, 48).