EP2601594A1

EP2601594A1 - Method and apparatus for automatically processing data in a cell format

Info

Publication number: EP2601594A1
Application number: EP11749377.5A
Authority: EP
Inventors: Martin RÜGAMER
Original assignee: SOLYP Informatik GmbH
Current assignee: SOLYP Informatik GmbH
Priority date: 2010-08-06
Filing date: 2011-08-04
Publication date: 2013-06-12
Also published as: WO2012017056A1

Abstract

The invention relates to a method and a system for automatically processing data, in particular soft data, in cell format, wherein a) a start cell is selected as a first data cell for a data rectangle, b) a measure of the similarity between the first data cell and at least one second data cell, in particular in the vicinity of the first data cell, is then automatically generated, and c) depending on at least one predetermined threshold for similarity, a decision is made as to whether the data rectangle is expanded in the horizontal and/or vertical direction.

Description

Method and device for automatically processing data in a cell format

The invention relates to a method for automatically processing data having the features of claim 1 and a system for automatically processing data having the features of claim 14.

In many applications, data is available in a cell format, as is known, e.g., from spreadsheets. Typically, this allows data from one category (e.g., in vertically arranged cells) to be linked to data from other categories (e.g., in horizontally arranged cells). The terms cells and data cells are used synonymously here.

Data in cell format is frequently used as an import/export format for programs; the arrangement of data in cell format has established itself as an interface between programs.

When data in a cell format is to be imported into a program, it is advantageous to automatically adapt this data to the information structure of the program before the import.

It is therefore the object to develop a method and a device in which a data set is automatically adapted to meet certain specifications.

The object is achieved by a method having the features of claim 1. Here, data, in particular soft data, is automatically processed in cell format, wherein

a) a start cell is selected as the first data cell for a data rectangle,

b) a measure of the similarity of the first data cell with at least one second data cell, in particular in the vicinity of the first data cell, is subsequently generated automatically, and

c) depending on at least one predetermined threshold for similarity, it is decided whether the data rectangle is expanded in the horizontal and/or vertical direction.

The automatic determination of a measure of the similarity of data cells enables further processing of the data. It is advantageous if steps b) and c) are repeated until a termination criterion is met.

The data rectangle is advantageously extended depending on a comparison between the calculated measure of similarity and a predetermined threshold.

In an advantageous embodiment, starting from a data cell filled with data, it is automatically determined whether a label is present. A label here is understood to be a character string that can be regarded as a caption for a number of cells. The labeling information is helpful for the subsequent processing of the pure numerical information because it places the numbers in a context.

Furthermore, it is advantageous if the measure of similarity between the data cells is determined by comparing criteria of the respective data cells, in particular the respective data type, the respective decimal place format, the respective order of magnitude of the numbers in the data cells, the respective formatting of the data cells, a formula property of the respective data cells, the protection defined for the respective data cell, the respective height of the data cell, the respective width of the data cell, absolute references between data cells, relative references between data cells and/or the structure of a formula in the data cell. In this way, a meaningful evaluation of similarity can be made. The criteria can, in particular, be applied in combination.

Since not all of these criteria are equally relevant in a specific application, it is advantageous to provide the criteria with weighting factors.

For the further evaluation of the data, it is advantageous if caption data for data cells adjacent to the data rectangle is detected automatically. This allows improved assignment of the data.

In many cases, data sheets have similar structures, e.g., sales figures over several years. It is therefore advantageous if the automatic determination of similarities is part of a learning system. Over time, the method can then identify more quickly and more reliably which data should meaningfully be included in the analysis.

Furthermore, it is advantageous if a file is automatically generated on the basis of the similarity analysis, the data cells of which can be assigned certain attributes based on the similarity analysis. It is also advantageous if the calculation of the measure and the adaptation of the size of the data rectangle are integrated into a spreadsheet program. This makes it possible to analyze soft data within a spreadsheet program.

Spreadsheet programs are widely used and provide data in cell formats, so that the method can be used advantageously here.

In a further advantageous embodiment, a determined data rectangle is automatically integrated into a database which is, in particular, linked to an input template. An input template is understood to mean, e.g., an input mask.

It is particularly advantageous if imported data and their labels are automatically compared with data already in the database and their labels.

It is particularly advantageous if the syntactic structure of a first data cell and a second data cell, in particular of adjacent data cells, is automatically compared and, if necessary, a measure of the difference is determined. In this way, the similarity of data cells is determined automatically.

Advantageously, the method can be used in conjunction with a spreadsheet application. For this purpose, the calculation of the measure and the adjustment of the size of the data rectangle are integrated into a spreadsheet program. This makes it possible, e.g., to determine which areas in a data sheet are similar to each other, so that these areas can be highlighted, the cursor can be moved to them and/or they can be saved as a separate file.

The object is also achieved by a system for automatically processing data in cell format according to claim 14, wherein a start cell is selected as the first data cell for a data rectangle, having means for automatically determining a measure of the similarity of the first data cell to at least one second data cell in the vicinity of the first data cell, wherein, depending on at least one predetermined threshold for similarity, it can be decided whether the data rectangle is expanded in the horizontal and/or vertical direction.

A particularly advantageous solution is a spreadsheet program with an integrated system according to claim 14.

Embodiments of the method and the system are described in conjunction with the figures, in which:

Fig. 1 is a flowchart of an embodiment of the method;

Fig. 2 is an illustration of a uniform XML envelope;

Fig. 3 is a schematic representation of the data exchange between a client and a server;

Fig. 4 is a screenshot of an Excel file as a data source for the method;

Fig. 5 is a detail of the table of Fig. 4;

Fig. 6 is a tabular representation of the calculation of similarities between data cells;

Fig. 7 is a tabular representation of the calculation of the similarities between further data cells;

Figs. 8-10 illustrate the characterization of adjacent data cells;

Fig. 11 is a flowchart of the basic algorithm;

Figs. 12-13 show an example of the determination of orders of magnitude of cell contents;

Fig. 14 shows an example of the detection of stripe patterns;

Fig. 15 shows an example of the capture of labels;

Fig. 16 is a flowchart for detecting similar areas;

Figs. 17-18 show an example of a similarity search;

Fig. 19 shows an example of the automatic assignment of a data rectangle to a questionnaire via its labels;

Fig. 20 shows an example of a table after processing of the data cells;

Fig. 21 shows an example of syntactic XML unification;

Fig. 22 is a view of a questionnaire.

In the following, some embodiments will be described by way of example.

The embodiments concern the provision of a technical interface that ensures automated, intelligent processing of external data. The technical challenge lies in the autonomous analysis of Internet-based data containing exogenous information, such as strategic information on markets, competitors, trends and financial data, and in its automated mapping onto questionnaire content without the user having to support this transfer process manually. The provision of technical interfaces to Excel is also part of this.

As an example of the embodiments, the processing of data in cell format is described in connection with the SOLYP software, which is described, among other places, in the book by A. Zimmermann, "Practical Business Planning with Hard and Soft Data: The Strategic Leadership System".

In principle, the embodiments described here can also be implemented with other software systems. For example, it is possible to integrate the automatic calculation of the measure of similarity and the adaptation of the data rectangle into a spreadsheet program.

In the context of soft data (e.g., data without a hard, predetermined format description and/or data whose format description is subject to exceptions), it is indispensable to view the topic of "external interfaces" in this light as well.

An example of soft data is business information that cannot be expressed by key figures. In addition to the hard system interfaces to well-known upstream IT systems such as SAP BW, there is the daily soft-data business of many individual users who, on their own responsibility, exchange strategic and soft information with a variety of other people.

In contrast to the generic Excel export from the SOLYP system for supplying external systems, there is still no satisfactory solution for importing arbitrary data from arbitrary source systems, i.e., a soft interface in the sense that no hard, technical format description is required.

First, today's hard Excel import has to be developed anew and individually for each questionnaire (i.e., a query template for data input), not to mention the effort of delivering data in exactly this form. In other words, a questionnaire is a structured template into which data from a data source that has not been specially adapted to this template can be imported. The algorithm described here analyzes the information in the data source in order to determine, among other things, similarities. The information calculated in this way is then imported into the template, the template having only general presets that allow the parsed data from the data source to be mapped. Such presets can be, e.g., the metadata (table name, foreign keys, column names, etc.) of a relational database linked to the template.

This makes it possible to compare the data imported into the database with data already existing in the database.

Thus, the template does not need extensive presets to enable the mapping; the "intelligence" for assigning the data lies in the method, not in the database, the template or the data source.

Second, transferring data into SOLYP via the clipboard (cut/copy/paste) involves a great deal of manual effort. The aim of the embodiment described here is to close precisely this gap: to make it possible to accept data from previously unknown sources with minimal effort, to automatically analyze its structure on the basis of given patterns, and then to store it in the appropriate SOLYP data format, i.e., in a questionnaire.

One embodiment of the overall method is divided into three phases, the second and most important phase in turn comprising three stages.

Fig. 1 shows a flowchart of these phases.

The phase of the syntactic unification (FIG. 1, steps 1.1 to 1.5) is already known in principle.

The phase of automatic analysis (Fig. 1, steps 2.1 to 2.3) relates to the automatic processing of the data in cell format, which is newly described here.

The third phase covers various possibilities for further processing (Fig. 1, steps 3.1 to 3.2).

1. Syntactic Unification

Data from any source and in any data format is to be transferred into software, in particular SOLYP. For this purpose, a data source is selected on a client (e.g., a browser) by clicking or dragging it (Fig. 1: step 1.1; Fig. 3: step 1) and is transmitted to a server (Fig. 1: step 1.2; Fig. 3: step 2). This is also called a "binary upload".

On the server, read routines for various file formats (i.e., file formats, not to be confused with the free format of the data within the file) are installed to open the file and convert it into a uniform file format (e.g., XML) (Fig. 1: step 1.3; Fig. 3: step 3).

During this conversion, in the example from an Excel format, all content-related aspects (e.g., defined by criteria explained in more detail below) are retained, including layout and the like; only the technical usability for the original program (here Excel) is lost (Fig. 1: step 1.4). It is therefore theoretically possible to produce an "original copy" again from this XML representation.

In this way, arbitrary file formats, in particular proprietary ones, are converted into XML data, which then allow further processing of the data. Such files may be generated, e.g., by word processing programs such as Word or OpenOffice, or by presentation programs such as PowerPoint. PDF files and HTML documents can also serve as a starting point for the conversion.

The method and system according to the present description thus include a kind of transformer from proprietary file formats into an XML format. The uniform XML format then contains a representation of the cell format and possibly also of the relationships between the data cells (e.g., formulas).

In practice, a uniform XML envelope is defined (Fig. 2) in which, depending on the file format, adequate representations can be embedded. Two examples:

• A conventional Excel file (*.xls) is stripped of macros and converted, e.g., into a derivative of the "CALS Table Model" (OASIS Technical Memorandum TM 9502:1995, http://www.oasis-open.org/specs/a5Q2.), in which layout information and formula source text are kept together with the numbers and text values; see also the MIL-M-38784B standard.

• For an HTML source, i.e., a common website on the Internet, the HTML source code is freed of dynamic JavaScript components and transcribed into XHTML.

PowerPoint files in .ppt format (also a proprietary format) can be processed similarly.
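The conversion into a uniform XML representation can be illustrated with a minimal Python sketch. It assumes a cell grid that has already been read from the source file (the format-specific read routines are not shown) and writes it using the CALS element names table/tgroup/tbody/row/entry. The function name grid_to_cals_xml and the formula attribute are hypothetical; they are not part of the CALS Table Model or of the SOLYP software.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# A cell is (displayed value, formula source or None); None stands for an empty cell.
Cell = Optional[Tuple[str, Optional[str]]]


def grid_to_cals_xml(grid: List[List[Cell]]) -> str:
    """Serialize a cell grid into a simplified CALS-style XML table.

    The element names follow the CALS Table Model; the 'formula' attribute
    is a hypothetical extension used only in this sketch.
    """
    n_cols = max((len(row) for row in grid), default=0)
    table = ET.Element("table")
    tgroup = ET.SubElement(table, "tgroup", cols=str(n_cols))
    tbody = ET.SubElement(tgroup, "tbody")
    for row in grid:
        row_el = ET.SubElement(tbody, "row")
        for cell in row:
            entry = ET.SubElement(row_el, "entry")
            if cell is None:
                continue  # keep an empty <entry/> so column positions are preserved
            value, formula = cell
            entry.text = value
            if formula is not None:
                entry.set("formula", formula)  # formula source stays next to its value
    return ET.tostring(table, encoding="unicode")


xml_text = grid_to_cals_xml([
    [("2003", None), None],
    [("89.3", None), ("161.6", "=SUM(B3:B5)")],
])
print(xml_text)
```

Keeping empty cells as empty entries matters for the later analysis, because the similarity search relies on the positions of cells relative to each other.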

An example of what an XML download (see Fig. 3, step) may look like is shown in Figs. 2 and 21. Fig. 2 shows a visualization of the XML grammar.

The resulting XML file is then returned to the client component (Fig. 1: step 1.5; Fig. 3: step 4), which can now analyze and display the source data without restriction and without the special file conversion libraries.

Fig. 4 shows a screenshot of an Excel file that can serve as a data source for the method. The following description of the automatic analysis is based on this format.

2. Automatic analysis

The automatic analysis of the data advantageously takes place on the client side (i.e., in the browser), on the one hand to relieve the expensive central processing power of the server and on the other hand to allow arbitrary scaling.

The goal is to identify regions (i.e., data cells) in the source, here an .xlsx file or its XML representation, that have specific structural characteristics (e.g., a rectangular range of numbers in a table) or content-related characteristics (e.g., "EBIT" as a key figure and "2010" as the current year). These areas are hereafter referred to as

The term content-related feature means that the data source contains identifiers (e.g., a heading) that categorize certain data (e.g., in the adjacent data cells). "Content" here therefore does not refer to the meaning of the data itself, but to the assignment of data cells to an identifier.

This area is then automatically assigned to a part of a questionnaire by inferring the subject dimensioning (e.g., different key figures over several years) from the form of the information (e.g., first column and column headings). This makes it possible to identify the identifier in the questionnaire (e.g., a database associated with an input template) and then to convert the relevant data.

The questionnaire corresponds to a database table, the subject dimensioning corresponds to the primary key of this table, and the assignment is a search for the primary key in the metadata repository of the database.

2.1 "Data rectangle"

Based on information in the file, it is possible to characterize data cells (Fig. 1: step 2.1). For this purpose, a number of parameters provided by programs that deal with cell formats are available. Starting from a first data cell, the cells may be characterized, e.g., by the following criteria:

• data type
• decimal places
• magnitude
• bold / italic / color / font / border
• formula
• cell protection
• cell height / width
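Several of these criteria can be read directly from a cell's text. The following Python sketch characterizes a cell by data type, decimal places, a coarse integer order of magnitude and one layout flag. The names CellFeatures and characterize, and the restriction to plain decimal notation, are assumptions for illustration only; the remaining criteria (font, border, formula, protection, size) would be added analogously from the file's cell properties.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Plain decimal notation only, e.g. "89.3" or "-2003" (an assumption of this sketch).
_NUMBER = re.compile(r"[+-]?\d+(?:\.(\d+))?")


@dataclass(frozen=True)
class CellFeatures:
    data_type: str                 # "number", "text" or "empty"
    decimal_places: Optional[int]  # None for non-numbers
    magnitude: Optional[int]       # floor(log10(|value|)); None for zero and non-numbers
    bold: bool = False             # one layout criterion; italic, color, font, border are analogous


def characterize(raw: str, bold: bool = False) -> CellFeatures:
    """Derive a few of the listed criteria from the raw cell text."""
    text = raw.strip()
    if not text:
        return CellFeatures("empty", None, None, bold)
    match = _NUMBER.fullmatch(text)
    if match is None:
        return CellFeatures("text", None, None, bold)
    decimals = len(match.group(1)) if match.group(1) else 0
    value = float(text)
    magnitude = math.floor(math.log10(abs(value))) if value != 0 else None
    return CellFeatures("number", decimals, magnitude, bold)


print(characterize("89.3"), characterize("2003"), characterize("EBIT"))
```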

Another criterion is the structure of a formula in one of the data cells. Even if the numbers in the formulas of neighboring cells differ, the syntactic structure (decomposition into terms) of a formula (e.g., a sum, an exponential expression, etc.) can provide information about the similarity of the cells being compared. The syntactic structure allows the formula to be analyzed without its numbers and/or data cell references.
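Comparing formulas without their numbers and cell references can be sketched by reducing each formula to a skeleton. In this hypothetical Python example (formula_structure and same_structure are illustrative names), A1-style references become REF and numeric literals become NUM; a production implementation would more likely rely on a proper formula parser than on regular expressions.

```python
import re

# A1-style cell references such as B2 or $A$1; the lookahead skips function
# names that end in digits, e.g. LOG10( .
_REF = re.compile(r"\$?\b[A-Z]{1,3}\$?\d+\b(?!\()")
# Numeric literals that stand alone (not digits inside a name like LOG10).
_NUM = re.compile(r"\b\d+(?:\.\d+)?\b")


def formula_structure(formula: str) -> str:
    """Reduce a formula to its syntactic skeleton without numbers and references."""
    skeleton = _REF.sub("REF", formula.upper())
    skeleton = _NUM.sub("NUM", skeleton)
    return skeleton.replace(" ", "")


def same_structure(first: str, second: str) -> bool:
    """True if two formulas differ only in their numbers and cell references."""
    return formula_structure(first) == formula_structure(second)


print(formula_structure("=SUM(B2:B5)*1.19"))
print(same_structure("=SUM(B2:B5)*1.19", "=sum(C2:C5) * 1.16"))
```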

Another criterion may be the way data cells are referenced in a formula; here, absolute or relative references can be evaluated.

The semantics of a formula can also be used as a criterion, e.g., by automatically recognizing that two data cells contain two different ways of calculating a mean value whose syntax differs but whose computational goal is similar.

It is also possible to automatically recognize a missing formula by extrapolating or interpolating from existing formulas in its vicinity. For this purpose, the formula that results from the surrounding cells is written into the data cell lacking a formula. A plausibility check may then be performed, e.g., whether a numerical value found in the data cell instead of a formula matches the value of the extrapolated or interpolated formula, or is at least of the same order of magnitude. In principle, some or all of these criteria can be used for characterization.
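Extrapolating a neighbor's formula depends on whether its references are relative or absolute. The following sketch, under the simplifying assumption of a purely horizontal shift in A1 notation, copies a formula one or more columns to the right (relative column references move, $-absolute ones stay) and then applies an order-of-magnitude plausibility check. All function names are hypothetical, and interpolation between two neighbors or vertical shifts are not covered.

```python
import math
import re

# A1 reference split into: column-absolute '$', column letters, row-absolute '$', row digits.
_A1 = re.compile(r"(\$?)\b([A-Z]{1,3})(\$?)(\d+)\b(?!\()")


def col_to_num(col: str) -> int:
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27."""
    number = 0
    for ch in col:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def num_to_col(number: int) -> str:
    """Inverse of col_to_num for number >= 1."""
    letters = ""
    while number > 0:
        number, rem = divmod(number - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_columns(formula: str, offset: int) -> str:
    """Extrapolate a neighbor's formula `offset` columns to the right.

    Relative column references move with the formula; '$'-absolute ones stay.
    """
    def move(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:
            new_col = col_to_num(col) + offset
            if new_col < 1:
                raise ValueError("reference shifted off the sheet")
            col = num_to_col(new_col)
        return f"{col_abs}{col}{row_abs}{row}"

    return _A1.sub(move, formula)


def plausible(found: float, predicted: float) -> bool:
    """Accept a value that matches the predicted value or has the same order of magnitude."""
    if math.isclose(found, predicted):
        return True
    if found == 0 or predicted == 0:
        return False
    return math.floor(math.log10(abs(found))) == math.floor(math.log10(abs(predicted)))


print(shift_columns("=$B$2*B3", 1), plausible(120.0, 118.0))
```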

Fig. 5 shows a section of Fig. 4 by way of example. The cell containing the number "89.3" serves as the first data cell, from which the similarity to neighboring data cells is automatically determined. The technical evaluation of the "similarity" of two cells is of particular importance for the automatic method. The similarity between two data cells is calculated by comparing the respective criteria.

For each criterion, a percentage "single similarity" is formed. Then, to increase fault tolerance, the worst value is discarded and the remaining values are summed with a (learned) weighting.

Fig. 6 shows, in tabular form, the calculation of the similarity between the data cells "89.3" and "161.6" (the left neighbor data cell of "89.3", see Fig. 8). Both cells contain numbers, so the criterion "data type" shows a 100% match. Since this criterion is highly significant, it enters the similarity calculation with a weight of 30%. The decimal place format is included in the calculation with a relatively low weight, here 5%; here, too, the data cells match 100%.

The order of magnitude of the numbers is also included in the similarity, e.g., to identify outliers. In the present example, the orders of magnitude are determined using a logarithmic measure: the decimal logarithms of the two values (log10 161.6 ≈ 2.21 and log10 89.3 ≈ 1.95) differ by an absolute amount of 0.26. Converted into a percentage, this gives a match of 100 - 26 = 74%.
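The order-of-magnitude comparison can be written down directly. This sketch assumes that the percentage is 100 minus 100 times the absolute difference of the decimal logarithms, clamped at 0%; the text only gives the single worked example, so the clamping rule is an assumption. Zero values would need separate handling, since log10(0) is undefined.

```python
import math


def magnitude_similarity(a: float, b: float) -> float:
    """Percentage match of the orders of magnitude of two non-zero numbers.

    Assumption: one full decade of difference reduces the match by 100 points;
    the result is clamped at 0%. Zeros must be handled by the caller.
    """
    distance = abs(math.log10(abs(a)) - math.log10(abs(b)))
    return max(0.0, 100.0 * (1.0 - distance))


print(magnitude_similarity(89.3, 161.6))
print(magnitude_similarity(89.3, 2003))
```

For the values from Fig. 6 this gives about 74.2%, which the text rounds to 74%. For "89.3" and "2003" the logarithms differ by more than one decade, so the match is 0%.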

The remaining criteria in Fig. 6, i.e., formatting (bold/italic/color/font/border, etc.), formula, cell protection and cell height and width, are identical for both data cells, so there is a 100% match.

For the criterion "formula", either the calculated values can be compared, or the formulas can be compared as "text", in which case a length and/or structure comparison is made.

Once the individual matches have been determined, an outlier can be identified. In the example of Fig. 6, the criterion "order of magnitude" is defined as the outlier, since it showed the lowest match. Removing this result yields the best overall value, which, incidentally, can be understood as a definition of the outlier.

The overall similarity (last line in Fig. 6) is then calculated as the weighted sum of the matches divided by the sum of the relevant weights (i.e., excluding the outlier).

The overall similarity of the data cells with the numbers "89.3" and "161.6" is calculated as 100%.

Fig. 7 shows an analogous calculation of the similarity between the data cells with the numbers "89.3" and "2003" (see also Fig. 9). The method does not know the inherent meaning of the number "2003" as a year; this number is only categorized by the comparison with dimension values described later. In the example of Fig. 7, this circumstance is not taken into account.

Note that in the example of Fig. 7 the outlier is determined somewhat differently, since a match of 0% was found for three criteria. In this case, the criterion with the highest weight among them, here the "order of magnitude", is treated as the outlier, i.e., the divisor in the overall calculation is 1 - 0.15 = 0.85. If, for example, a year number formatted as a column heading is compared with a sales value, both tolerance thresholds of 90% and 80% yield the correct assessment "dissimilar". The method thus determines automatically (without prior knowledge) that there is considerable dissimilarity between the data cells with the numbers "89.3" and "2003". The tolerance threshold is the limit at which the percentage similarity value is converted into the yes/no decision "similar".
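The aggregation described for Figs. 6 and 7 can be sketched as a weighted mean with one outlier removed. Only the weights for data type (30%), decimal places (5%) and order of magnitude (15%, implied by the divisor 1 - 0.15) come from the text; the split of the remaining weight, the criterion keys and the Fig. 7-style match values (including a 50% cell-size match) are hypothetical placeholders, and the function names are illustrative.

```python
from typing import Dict

# Data type (30%), decimal places (5%) and order of magnitude (15%) follow the text;
# the split of the remaining 50% is a hypothetical placeholder.
WEIGHTS: Dict[str, float] = {
    "data_type": 0.30,
    "decimal_places": 0.05,
    "magnitude": 0.15,
    "formatting": 0.10,
    "formula": 0.20,
    "cell_protection": 0.10,
    "cell_size": 0.10,
}


def overall_similarity(matches: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of the per-criterion matches (in %) after dropping one outlier.

    The outlier is the worst-matching criterion; if several criteria share the
    worst value, the one with the highest weight is dropped.
    """
    worst = min(matches.values())
    tied = [name for name, value in matches.items() if value == worst]
    outlier = max(tied, key=lambda name: weights[name])
    kept = [name for name in matches if name != outlier]
    divisor = sum(weights[name] for name in kept)  # sum of the relevant weights
    return sum(weights[name] * matches[name] for name in kept) / divisor


def is_similar(score: float, tolerance_threshold: float) -> bool:
    """Turn the percentage into the yes/no decision 'similar'."""
    return score >= tolerance_threshold


# Fig. 6: only the order of magnitude deviates (74%) and is dropped as the outlier.
fig6 = dict.fromkeys(WEIGHTS, 100.0)
fig6["magnitude"] = 74.0

# Hypothetical values in the style of Fig. 7: three criteria at 0%.
fig7_like = dict.fromkeys(WEIGHTS, 100.0)
fig7_like.update(decimal_places=0.0, magnitude=0.0, formatting=0.0, cell_size=50.0)

print(overall_similarity(fig6, WEIGHTS), overall_similarity(fig7_like, WEIGHTS))
```

With these assumed values the Fig. 6 pair reaches 100%, while the Fig. 7-style pair scores about 76.5% and is rejected at both tolerance thresholds of 80% and 90%. Because the assumed weights sum to 1, dropping the order of magnitude leaves the divisor 0.85 mentioned in the text.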

Starting from the first data cell, first the immediate neighbors and then further neighbors (see Fig. 10) are characterized, both vertically (see Fig. 8) and horizontally (see Fig. 9), and compared with the initial characterization. This comparison yields the positive result "similar" if only a few aspects differ, as measured by the weighted overall similarity.

In this way, a row or column of relatively similar data cells is created first. In the next step, this initially one-dimensional strip is extended in the second dimension to the neighbors if they are "similar" enough. This procedure is repeated in both directions until the largest possible rectangular area of "similar" data cells results.

Since in practice the strategically relevant information is not necessarily complete, the constructive treatment of empty cells is an important point. By definition, a data cell is "similar" to an adjacent empty cell, so the expansion of the data area does not stop at empty data cells. At the same time, it must of course be prevented that completely unfilled areas, and in particular the remaining area of a data sheet, are interpreted as belonging to the data area.

Fig. 11 outlines the basic algorithm for searching for the data rectangle. In the embodiment shown in Fig. 11, starting from a start data cell, the similarity of adjacent data cells is first determined in the horizontal direction. If the similarity reaches a certain threshold, the data rectangle is extended horizontally.

Subsequently, the similarity in the vertical direction is determined. When a certain threshold_2 is reached, the data rectangle is extended by a vertical neighbor data cell and the method continues with the calculation in the horizontal direction. If threshold_2 is not reached, it is checked whether a horizontal extension was made in the preceding step. If so, the method continues with a renewed determination of the similarity in the horizontal direction. If not, the algorithm has identified a stable rectangle that can be expanded neither horizontally nor vertically, and the procedure is complete.
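The basic algorithm of Fig. 11, including the treatment of empty cells, can be sketched as follows. This is a simplified Python illustration, not the patented implementation: toy_similarity is a hypothetical stand-in for the weighted multi-criteria measure, new cells are compared with the start cell, and the loop alternates one horizontal and one vertical step until neither direction can be extended. The grid is assumed to be rectangular, with None marking empty cells.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple, Union

Cell = Optional[Union[str, float]]         # None marks an empty cell
Similarity = Callable[[Cell, Cell], float]  # returns a percentage 0..100


def toy_similarity(a: Cell, b: Cell) -> float:
    """Hypothetical stand-in for the weighted multi-criteria similarity."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if a == 0 or b == 0:
            return 100.0 if a == b else 0.0
        distance = abs(math.log10(abs(a)) - math.log10(abs(b)))
        return max(0.0, 100.0 * (1.0 - distance))
    if isinstance(a, str) and isinstance(b, str):
        return 100.0
    return 0.0  # a number is never similar to text


def strip_is_similar(strip: Sequence[Cell], ref: Cell, sim: Similarity, threshold: float) -> bool:
    """Empty cells count as similar, but a completely empty strip is rejected
    so that the rectangle never grows into unfilled parts of the sheet."""
    filled = [cell for cell in strip if cell is not None]
    return bool(filled) and all(sim(ref, cell) >= threshold for cell in filled)


def grow_rectangle(grid: List[List[Cell]], start: Tuple[int, int], sim: Similarity,
                   threshold_h: float, threshold_v: float) -> Tuple[int, int, int, int]:
    """Return the stable rectangle (top, left, bottom, right), 0-based and inclusive."""
    n_rows, n_cols = len(grid), len(grid[0])
    top, left = start
    bottom, right = start
    ref = grid[top][left]
    if ref is None:
        raise ValueError("the start cell must be filled")

    def column(c: int) -> List[Cell]:
        return [grid[r][c] for r in range(top, bottom + 1)]

    def row(r: int) -> List[Cell]:
        return [grid[r][c] for c in range(left, right + 1)]

    changed = True
    while changed:  # stop once the rectangle is stable in both directions
        changed = False
        # Horizontal step with threshold_h ...
        if right + 1 < n_cols and strip_is_similar(column(right + 1), ref, sim, threshold_h):
            right += 1
            changed = True
        if left > 0 and strip_is_similar(column(left - 1), ref, sim, threshold_h):
            left -= 1
            changed = True
        # ... then vertical step with threshold_v (threshold_2 in Fig. 11).
        if bottom + 1 < n_rows and strip_is_similar(row(bottom + 1), ref, sim, threshold_v):
            bottom += 1
            changed = True
        if top > 0 and strip_is_similar(row(top - 1), ref, sim, threshold_v):
            top -= 1
            changed = True
    return top, left, bottom, right


grid = [
    ["KPI", "2003", "2004", None],
    ["EBIT", 89.3, 161.6, None],
    ["Sales", 120.0, None, None],
    [None, None, None, None],
]
print(grow_rectangle(grid, (1, 1), toy_similarity, 70.0, 70.0))
```

With the same threshold of 70% in both directions, the rectangle covers both number columns and stops at the labels and at the fully empty row and column. A stricter horizontal threshold of 80% keeps only the first number column, which illustrates why the thresholds per direction matter.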

Two further advantageous additions are described below.

In the example of Fig. 4 (KPI, i.e., key performance indicator, x year), the magnitudes of the various KPIs can differ greatly and, in individual cases, even the data types do not match, e.g., for EBIT and EBIT margin (Fig. 12).

Therefore, different thresholds for "similar" values are used in the two directions. Since it is not clear at the beginning in which direction the data should be differentiated more strongly, both hypotheses are considered and the one with the better "overall similarity" is selected (Fig. 13). In Fig. 13, the different similarity percentages in the x and y directions are indicated by double arrows.

One step further is the detection of stripe patterns. For example, the percentage growth value for the following year can be listed after each year column: percentage and absolute values, possibly highlighted by different layouts, then alternate with each other. Such alternating data cells are considered together as a pair. Fig. 14 shows an example in which each year column is accompanied by a percentage change.
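Detecting such a stripe pattern amounts to finding the period with which the column characterizations repeat. This is a minimal sketch assuming that each column has already been reduced to a single characterization label (the labels "absolute" and "percent" and the function names are hypothetical); once the period is known, adjacent columns are grouped into pairs.

```python
from typing import List, Sequence, Tuple


def stripe_period(kinds: Sequence[str], max_period: int = 4) -> int:
    """Smallest period p with which the column characterizations repeat.

    Returns len(kinds) if no shorter repeating pattern divides the row evenly.
    """
    n = len(kinds)
    for p in range(1, min(max_period, n) + 1):
        if n % p == 0 and all(kinds[i] == kinds[i + p] for i in range(n - p)):
            return p
    return n


def group_stripes(columns: Sequence[str], period: int) -> List[Tuple[str, ...]]:
    """Combine each run of `period` adjacent columns, e.g. a year and its growth rate."""
    return [tuple(columns[i:i + period]) for i in range(0, len(columns), period)]


kinds = ["absolute", "percent"] * 3
print(stripe_period(kinds), group_stripes(["2003", "%", "2004", "%"], 2))
```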

2.2 Additional search for keywords

After a data area, i.e., the data rectangle in the source, has been identified (see Fig. 1, step 2.1; Fig. 11), the classification is completed in this step by adding the missing labeling information.

Column headers, row labels, comment columns, totals rows and the like can adjoin the data rectangle on all four sides. For this purpose, the previously determined data rectangle is extended both horizontally and vertically, so that up to nine areas are created: the data rectangle in the middle, four sides with headings and labels, and four corner areas which are either empty or contain, e.g., static text. In Fig. 15, the data rectangle comprises 9x4 data cells. The surrounding labels are indicated by circles. The caption above comprises 9x1 data cells, and the caption below likewise comprises 9x1 data cells. 1x4 cells are arranged on the left and on the right. In addition, there are cells in each of the corners, which are marked by crosses in Fig. 15.
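The nine areas around a detected data rectangle can be derived from its bounds. The sketch assumes a fixed border width of one cell (in the text, the actual extent is determined by the strategies described next) and uses 1-based row and column numbers. As an illustration, the cell ranges given for Fig. 20 are used (data B6:J9, years B5:J5, market participants A6:A9); nine_regions and the area names are hypothetical.

```python
from typing import Dict, Tuple

Range = Tuple[int, int, int, int]  # (top_row, left_col, bottom_row, right_col), 1-based, inclusive


def nine_regions(rect: Range, border: int = 1) -> Dict[str, Range]:
    """Split the surroundings of a data rectangle into up to nine areas.

    'middle-center' is the data rectangle itself, the four edge areas hold
    headings and labels, and the four corners may hold static text. Areas that
    would lie before row 1 or column 1 are omitted; the sheet's far edges are not checked.
    """
    top, left, bottom, right = rect
    row_bands = {
        "top": (top - border, top - 1),
        "middle": (top, bottom),
        "bottom": (bottom + 1, bottom + border),
    }
    col_bands = {
        "left": (left - border, left - 1),
        "center": (left, right),
        "right": (right + 1, right + border),
    }
    regions: Dict[str, Range] = {}
    for row_name, (r0, r1) in row_bands.items():
        for col_name, (c0, c1) in col_bands.items():
            if r0 >= 1 and c0 >= 1:
                regions[f"{row_name}-{col_name}"] = (r0, c0, r1, c1)
    return regions


# Data B6:J9 -> rows 6..9, columns 2 (B) .. 10 (J).
regions = nine_regions((6, 2, 9, 10))
print(regions["top-center"], regions["middle-left"])
```

Here "top-center" corresponds to B5:J5 (9x1 cells) and "middle-left" to A6:A9 (1x4 cells), matching the sizes described for Fig. 15.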

Two combined strategies are used to determine the extent of the border areas. The first strategy is shown in Fig. 16 in the form of a flowchart. Note that this embodiment of the method can in principle be used both on its own and in combination with the method for detecting a data rectangle (e.g., Fig. 11).

The described "search for keywords" can also be used in connection with the search for the data rectangle. For this purpose, a feature vector receives one entry for each subject dimension ("time", "market participant", etc.) that indicates, as a similarity from 0% to 100%, the membership of the cell value in the corresponding dimension. This is determined both via content patterns (regular expressions) and by accessing existing data (e.g., the name of an entity) in the dimensioning systems of the target system.
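Such a feature vector can be sketched with a regular expression per dimension plus a lookup in stored dimension values. In this Python example, the patterns, the stored values and the use of difflib's SequenceMatcher ratio as a graded 0 to 100% score are all assumptions chosen for illustration; the text does not specify how partial matches are scored.

```python
import difflib
import re
from typing import Dict, List

# Hypothetical content patterns (regular expressions) per subject dimension.
PATTERNS = {
    "time": re.compile(r"(19|20)\d{2}"),
}
# Hypothetical excerpt of dimension values already stored in the target system.
KNOWN_VALUES: Dict[str, List[str]] = {
    "market participant": ["BMW", "VW", "Audi"],
    "KPI": ["EBIT", "EBIT margin", "Sales"],
}


def dimension_features(cell_text: str) -> Dict[str, float]:
    """Membership (0-100%) of a cell value in each known subject dimension."""
    text = cell_text.strip()
    features: Dict[str, float] = {}
    for dim in sorted(set(PATTERNS) | set(KNOWN_VALUES)):
        pattern = PATTERNS.get(dim)
        if pattern is not None and pattern.fullmatch(text):
            features[dim] = 100.0  # content pattern hit
            continue
        # Otherwise: best fuzzy match against the stored dimension values.
        best = max(
            (difflib.SequenceMatcher(None, text.lower(), value.lower()).ratio()
             for value in KNOWN_VALUES.get(dim, [])),
            default=0.0,
        )
        features[dim] = round(100.0 * best, 1)
    return features


print(dimension_features("BMW"))
print(dimension_features("2003"))
```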

This provides an additional semantic feature for the automated similarity assessment in the algorithm, which already contributes to identifying the data or heading rectangle during the analysis. In practice, this makes it possible, e.g., to distinguish text cells with comments (i.e., without a dimension match) from text cells with names of market participants (search hits in a table of market participants), even though they may use the same cell layout.

In addition to the analysis with the algorithm of Fig. 16 for identifying similar areas, the specific content of the data cells is advantageously also taken into account in the similarity analysis. Since labels frequently contain the names of the values of a dimension of a data cube, each cell value is searched for in the space of all previously known dimension values, and if a value is found, this criterion is additionally included in the similarity analysis. As a result, year numbers, the names of competitors or rating symbols such as "- +" can, for example, already be assigned largely unambiguously. This creates a new comparable feature "assigned dimension" for each data cell, which is used in the "similarity search" for areas (Figs. 17 and 18).

Fig. 17 shows two different sections of the master data of a framework application using the method. In the left-hand illustration, a column with year numbers has been entered; in the right-hand illustration, a number of competitors, here automobile companies.

In one embodiment of the method, the database is searched to determine whether these data have already occurred. Dimension attributes can be assigned to these occurrences, as in Fig. 18 for the values BMW and VW; both are market participants (MT). From this, a similarity can again be calculated, here 100%.

In practice, this analysis could already be carried out for the primary data rectangle, but because of the computational load it is usually not used there.

2.3 Compare with dimensioning of data cubes

The assignment of dimensions to data cells via the search for keywords enables dimensions to be assigned to complete row and column labels, i.e., the classification of the recognized data range into a multidimensional data model (Fig. 1, step 2.3), which may also be referred to as a data cube.

A data cube can be thought of as a multidimensional matrix in which the columns and rows represent the dimensions and the data entries represent the information in the data cube. Thus, for example, an area of fixed-point numbers with year numbers as column headings and business key figures such as "EBIT" as row labels is recognized as a 2-dimensional grid in the dimensions "KPI x year".
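Mapping a labeled data rectangle onto such a 2-dimensional cube can be sketched as a dictionary keyed by (row label, column label). The function name to_cube and the sample values are illustrative only; the assignment of 89.3 and 161.6 to "EBIT" is assumed and not taken from Fig. 4.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def to_cube(row_labels: Sequence[str], col_labels: Sequence[str],
            values: List[List[Optional[float]]]) -> Dict[Tuple[str, str], float]:
    """Map a labeled data rectangle onto a 2-dimensional cube.

    Keys are (row label, column label), e.g. ("EBIT", "2003") in a KPI x year
    grid; empty cells (None) are simply left out.
    """
    if len(values) != len(row_labels) or any(len(row) != len(col_labels) for row in values):
        raise ValueError("labels do not match the size of the data rectangle")
    cube: Dict[Tuple[str, str], float] = {}
    for row_label, row in zip(row_labels, values):
        for col_label, value in zip(col_labels, row):
            if value is not None:
                cube[(row_label, col_label)] = value
    return cube


cube = to_cube(["EBIT", "Sales"], ["2003", "2004"], [[89.3, 161.6], [120.0, None]])
print(cube)
```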

The last automatic step is the assignment to a questionnaire (labeled "QUEST_PART" in Fig. 19), in which the complete catalog of all questionnaires is searched for the occurrence of this dimension combination. For example, the dimension combination "KPI x year" is assigned to the central data area of the questionnaire "Financial Objectives in the Business Plan" (labeled "BUPLA DATA" in Fig. 19).

If this assignment is not unambiguous, either a static text in the corner area of the source indicates the correct questionnaire or, as described below, the end user is offered the various results for manual selection.

In the above example (see also Fig. 19), "ZI" denotes the sole use of the dimension "year" (KPI is implicit); of the three possible questionnaires, "BUPLAN 810" was selected.
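The catalog search, including the fallback to a corner text or to manual selection, can be sketched as follows. The catalog contents and questionnaire names are hypothetical; the dictionary merely stands in for the metadata repository in which the dimension combination plays the role of the primary key.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical catalog standing in for the metadata repository:
# questionnaire name -> dimension combination (primary key) of its data area.
CATALOG: Dict[str, FrozenSet[str]] = {
    "QUEST_FINANCE": frozenset({"KPI", "year"}),
    "QUEST_PLAN": frozenset({"KPI", "year"}),
    "QUEST_MARKET": frozenset({"market participant", "year"}),
}


def assign_questionnaire(dimensions: Iterable[str], corner_text: str = "",
                         catalog: Dict[str, FrozenSet[str]] = CATALOG
                         ) -> Tuple[Optional[str], List[str]]:
    """Return (chosen questionnaire or None, all candidates for manual selection)."""
    wanted = frozenset(dimensions)
    candidates = sorted(name for name, dims in catalog.items() if dims == wanted)
    if len(candidates) == 1:
        return candidates[0], candidates
    # Ambiguous: a static text in a corner area of the source may name the questionnaire.
    hinted = [name for name in candidates if name in corner_text]
    if len(hinted) == 1:
        return hinted[0], candidates
    return None, candidates  # leave the choice to the end user


print(assign_questionnaire({"KPI", "year"}))
print(assign_questionnaire({"KPI", "year"}, corner_text="Source: QUEST_PLAN 2010"))
```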

Fig. 20 shows an end product after processing of the data cells. The similar areas, e.g., the year numbers, are marked. The method also automatically detects that data cells A6 to A9 contain market participants, data cells B5 to J5 contain years, and data cells B6 to J9 contain market shares.

Thus, based on the similarity analysis, a file is automatically generated whose data cells can be assigned certain attributes.

3. Further data processing

There are a number of further possibilities for processing the automatically calculated data rectangles.

The embodiment according to Fig. 1 can, e.g., be coupled with a learning system so that certain relationships between the data cells and the structure of a spreadsheet are stored.

Fig. 22 shows a view of a questionnaire into which the data from Fig. 20 have been read. With the methods described above, the data from an external source, in which the numbers had a very different context, could be captured and analyzed. The method thus automatically establishes relationships that ultimately allow a qualified data transfer, as shown in Fig. 22.

Claims


1. Method for automatically processing data, in particular soft data, in cell format, wherein

a) a start cell is selected as the first data cell for a data rectangle,

b) a measure of the similarity of the first data cell with at least one second data cell, in particular in the neighborhood of the first data cell, is then automatically generated,

c) depending on at least one predetermined threshold for similarity, it is decided whether the data rectangle is expanded in the horizontal and/or vertical direction.

2. The method according to claim 1, characterized in that steps b) and c) are performed until an abort criterion is met.

3. The method according to claim 1 or 2, characterized in that the extension of the data rectangle in the horizontal and/or vertical direction takes place depending on a comparison between at least one measure of similarity and a predetermined threshold.

4. The method according to at least one of the preceding claims, characterized in that, starting from a data cell filled with data, it is automatically determined whether a label is present.

5. The method according to at least one of the preceding claims, characterized in that the measure of the similarity between the data cells is determined by comparing criteria of the respective data cells, in particular the respective data type, the respective decimal place format, the respective order of magnitude of the numbers in the data cells, the respective formatting of the data cells, a formula property of the respective data cells, the protection defined for the respective data cell, the respective height of the data cell, the respective width of the data cell, absolute references between data cells, relative references between data cells and/or the structure of a formula in the data cell.

6. The method according to claim 5, characterized in that the criteria are provided with a weighting factor.

7. The method according to at least one of the preceding claims, characterized in that caption data for data cells adjacent to the data rectangle are automatically captured.

8. The method according to at least one of the preceding claims, characterized in that the automatic determination of similarities is part of an adaptive system.

9. The method according to claim 8, characterized in that, on the basis of the similarity analysis, a file is automatically generated whose data cells can be assigned certain attributes on the basis of the similarity analysis.

10. The method according to at least one of the preceding claims, characterized in that the calculation of the measure and the adaptation of the size of the data rectangle are integrated into a spreadsheet program.

11. The method according to claim 1, wherein the determined data rectangle is automatically integrated into a database which is, in particular, linked to an input template.

12. The method according to claim 11, characterized in that imported data and their captions are automatically compared with data already existing in the database and their captions.

13. The method according to at least one of the preceding claims, characterized in that the syntactic structure of a first data cell and a second data cell, in particular of adjacent data cells, is automatically compared and, if necessary, a measure of the difference is determined.

14. System for automatically processing data in cell format, wherein a start cell is selected as the first data cell for a data rectangle, having means for automatically determining a measure of the similarity of the first data cell with at least one second data cell in the neighborhood of the first data cell, wherein, depending on at least one predetermined threshold for similarity, it can be decided whether the data rectangle is extended horizontally and/or vertically.

15. Spreadsheet program with an integrated system according to claim 14.