DE102004059120A1

DE102004059120A1 - Input-document data stream converting method for high performance printing system, involves storing data, extracted based on rule set, in structured data file, where set is formed such that arbitrary data is mapped onto arbitrary data field

Info

Publication number: DE102004059120A1
Application number: DE102004059120A
Authority: DE
Inventors: Matthias Dr. Fromm; Georg Dr. Landmesser; Werner Engbrocks
Original assignee: Oce Printing Systems GmbH and Co KG
Current assignee: Canon Production Printing Germany GmbH and Co KG
Priority date: 2004-12-08
Filing date: 2004-12-08
Publication date: 2006-06-14

Abstract

The method involves extracting data from an input document data stream according to a preset rule set and storing the data in a structured data file. Field names are assigned to the data fields in the data file, and the data fields are structured in a set of data levels. The rule set is formed in such a manner that arbitrary data of the input data stream can be mapped onto an arbitrary data field of the data file. Independent claims are also included for the following: (A) a method for producing a rule of a rule set; (B) a computer program product for executing a method of converting an input document data stream into a structured data file for creating an output document data stream; (C) a system for executing a method of converting an input document data stream into a structured data file for creating an output document data stream.

Description

The invention relates to a method for converting an input document data stream containing one or more documents into a structured data file for generating an output document data stream, and to a computer program product for creating a rule set for such a method.

WO 2004/040432 A1 discloses a method and a device for processing a document data stream from an input format into an output format. The input document data stream is converted into normalized data by a translation stage module. The translation stage module is controlled by a rule file. The rule file contains mapping rules that are formed from the input document data stream and/or from a design data set, which may have to be newly created, and/or from auxiliary files specific to the input data. Both the design data set and the rule file can be freely editable. The design data set can be formed from the input document data stream and/or from auxiliary files specific to the input data, and it can additionally be used in forming a document template that controls the formatting of the normalized data. Alternatively, the rule file can also be obtained directly from the input document data stream or from other file information in auxiliary files.

The mapping rules specified in the rule file are specific to the input document data stream. They specify which element of the input document data stream is to be assigned to which elements of the design data set. The design data set contains the structure definition of the normalized data, with type declarations provided for various structural elements, for example customer numbers, names, logos, etc. Data groups that belong together can then also be formed in the normalized raw data, in particular all data belonging to one document. All associated data are thus available for each document in the normalized raw data stream. A document template serves as a structural template for the documents to be generated and describes which formatting instructions are to be added to the normalized data stream. It can contain elements from the design data set and/or freely programmed static or dynamic elements. The document template serves to control the format generation device (formatter or document composition engine). From the normalized raw data stream, the format generation device forms a resource-oriented data stream document by document. Where formatting was already contained in the raw data, it is retained; where the raw data are unformatted and the document template contains formatting information for the corresponding data fields, that formatting is added in the format generation device in a resource-oriented manner. Resources that are needed several times within a data stream are processed further, i.e. they are inserted into the resource-oriented data stream mainly by calling the resources, while the resources themselves exist only once internally or can be loaded externally from a resource file or merely referenced.
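The idea of inserting a resource by call rather than by repetition can be illustrated with a minimal Python sketch. It is not taken from WO 2004/040432; the function name `build_resource_stream`, the tuple layout and the identifiers `R1`, `R2`, ... are assumptions made purely for illustration.

```python
def build_resource_stream(documents):
    """Emit each resource definition once, then only calls that reference it.

    Hypothetical model: a document is a list of (kind, payload) pairs,
    where kind is either "resource" or "text".
    """
    resource_ids = {}  # resource content -> identifier assigned on first use
    stream = []
    for document in documents:
        for kind, payload in document:
            if kind == "resource":
                if payload not in resource_ids:
                    # First occurrence: store the resource itself exactly once.
                    resource_ids[payload] = f"R{len(resource_ids) + 1}"
                    stream.append(("define", resource_ids[payload], payload))
                # Every occurrence, including the first, is inserted by call.
                stream.append(("call", resource_ids[payload]))
            else:
                stream.append(("text", payload))
    return stream


documents = [
    [("resource", "company-logo"), ("text", "Invoice for customer 4711")],
    [("resource", "company-logo"), ("text", "Invoice for customer 4712")],
]
stream = build_resource_stream(documents)
```

In this sketch the logo is defined once and called twice, which mirrors the statement that a resource needed several times exists only once in the stream.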

With this method, creating the rule file is laborious and requires considerable software knowledge.

Adobe Systems, Inc., USA, offers a product under the name Adobe Central Pro Output Server that also makes it possible to convert an input document data stream into a data file automatically. The rules used here can be entered by a user via a graphical user interface on which a template document is displayed. Individual fields of the template document can be selected by the user and assigned any type declaration. It is also possible to define certain sections of the document that occur repeatedly. These sections are specified by a rule set that recognizes the section type in the input document data stream and then reads out the corresponding fields. Each of these sections extends across the entire page width.

When the input document data stream is automatically converted into the data file, all data that are not to be read out are removed from the input document data stream, and the data to be read out are stored in the data file in the same order as in the input document data stream, with a type declaration added to each item of data. This known method thus yields a data file in which the individual data are listed consecutively in the same order as in the input document data stream.
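The flat, order-preserving result described above can be pictured with a short Python sketch. It does not reproduce the Adobe product; `FieldSpec`, `extract_flat` and the line/column addressing are hypothetical names and assumptions chosen only to show what an input-ordered list with type declarations looks like.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    line: int        # 0-based line index within the page (assumed addressing)
    start: int       # first column, inclusive
    end: int         # last column, exclusive
    type_name: str   # type declaration attached to the extracted value


def extract_flat(page_lines, specs):
    """Return (type_name, value) pairs in input order; all other data is dropped."""
    ordered = sorted(specs, key=lambda spec: (spec.line, spec.start))
    return [
        (spec.type_name, page_lines[spec.line][spec.start:spec.end].strip())
        for spec in ordered
    ]


page = [
    "Customer: 4711      Date: 2004-12-08",
    "Item      Qty",
    "Bolt M8    12",
]
specs = [
    FieldSpec(2, 0, 10, "item"),
    FieldSpec(0, 10, 14, "customer_no"),
    FieldSpec(2, 10, 13, "quantity"),
]
flat = extract_flat(page, specs)
```

Whatever order the specifications are given in, the output follows the position of the data on the page, so a consumer has to scan the list to find a particular value.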

There is a considerable need to convert input document data streams from long-used systems, which are nevertheless to remain in use for security-related reasons, into output document data streams as flexibly as possible. Such long-established systems are used above all by banks and insurance companies and are generally referred to as legacy applications. These systems often have only very limited formatting options, and the data are frequently output as a so-called ASCII line data stream, which essentially contains only characters as well as line and page breaks. However, there is a desire to present these data to the customer in a contemporary format.

The Adobe Central Pro Output Server product creates a general data file that is suitable for different output document data streams. It has been found, however, that the data list created in this way is only of limited suitability for further processing, because locating individual data, which are arranged in the same order as in the source document, can be very difficult.

Furthermore, creating the rule sets with the above-mentioned method is very laborious, in particular when the documents of the input document data stream have complex structures such as tables.

The object of the invention is to provide a method for converting an input document data stream containing one or more documents into a data file for generating an output document data stream, the method yielding a data file that can be converted very flexibly and easily into an output document data stream of any format.

This object is achieved by a method having the features of claim 1.

A further object of the invention is to provide a method and a computer program product that allow simple entry of rules for converting an input document data stream into a structured data file.

This object is achieved by a method according to claim 2 and a computer program product having the features of claim 24.

Advantageous embodiments of the invention are specified in the respective dependent claims.

In the method according to the invention for converting an input document data stream containing one or more documents into a structured data file for generating an output document data stream, data are extracted from the input document data stream according to a predetermined rule set and stored in the structured data file. In the structured data file, field names or type declarations are assigned to the individual data fields, the data fields can be structured into several data levels, and the rule set is designed such that any data from the input document data stream can be mapped onto any data field of the structured data file.

With the method according to the invention, any data of a document in the input document data stream can be mapped onto any data fields of the structured data file. The structured data file thus contains data ordered according to any criteria specified by the user, and these data can also be structured in several data levels. This structured data file therefore represents a kind of database in which the data are arranged in a tree structure specified by the user.
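A minimal Python sketch of such a mapping follows. It is only an illustration under stated assumptions: the patent does not prescribe a rule format at this point, so the `Rule` class, the line/column addressing of source data and the use of nested dictionaries as the structured data file are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    line: int      # 0-based line of the source data in the document (assumed)
    start: int     # first column, inclusive
    end: int       # last column, exclusive
    target: tuple  # path of names leading to the data field in the tree


def apply_rules(page_lines, rules):
    """Extract each rule's source data and store it at the rule's target path."""
    tree = {}
    for rule in rules:
        value = page_lines[rule.line][rule.start:rule.end].strip()
        node = tree
        for name in rule.target[:-1]:
            node = node.setdefault(name, {})  # descend, creating levels as needed
        node[rule.target[-1]] = value
    return tree


page = [
    "Delivery note 2004-0815",
    "Miller Ltd.".ljust(20) + "Customer 4711",
]
rules = [
    Rule(1, 29, 33, ("document", "customer", "number")),
    Rule(1, 0, 20, ("document", "customer", "name")),
    Rule(0, 14, 23, ("document", "number")),
]
structured = apply_rules(page, rules)
```

Here the position of a value in the result depends only on the target path of its rule, not on where the value appeared in the input, which is the property the paragraph above describes.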

Methods for printing data from databases are well known, and any formats can be used with them.

By generating a structured data file, the input document data stream is turned into a database that can be processed further very flexibly in a printing process.

The method according to the invention is preferably designed such that individual rules of the rule set are created by displaying, on a graphical user interface, a template document in one window and data fields in a tree structure in another window. Marking data in the template document defines a source data field. When such a source data field of the template document is linked to a data field, a rule is created automatically; this rule reads a source data field out of the input document data stream and stores its content in the corresponding data field in accordance with the structured data file.
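Leaving the graphical user interface aside, the two user actions described above (marking data in the template document, then linking the marked source data field to a data field) can be sketched in Python as follows. The class and method names (`RuleSetEditor`, `mark`, `link`) and the slash-separated target path are hypothetical; they are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Selection:
    line: int   # line of the marked text in the template document
    start: int  # first column, inclusive
    end: int    # last column, exclusive


@dataclass(frozen=True)
class Rule:
    source: Selection  # where to read in each input document
    target: str        # data field in the tree, e.g. "invoice/total"


@dataclass
class RuleSetEditor:
    template: list                             # lines of the template document
    rules: list = field(default_factory=list)  # rule set built up by linking

    def mark(self, line, text):
        """Stand-in for marking text with the mouse: locate it in the given line."""
        start = self.template[line].index(text)  # raises ValueError if absent
        return Selection(line, start, start + len(text))

    def link(self, selection, target):
        """Linking a source data field to a data field creates the rule."""
        rule = Rule(selection, target)
        self.rules.append(rule)
        return rule


editor = RuleSetEditor(["Invoice 0815", "Total: 99.50 EUR"])
selection = editor.mark(1, "99.50")
rule = editor.link(selection, "invoice/total")
```

The user never writes the rule by hand in this sketch; it is derived entirely from the selection and the chosen target field.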

The computer program product according to the invention for creating a rule set for the method explained above comprises a graphical user interface with several windows. A template document corresponding to the format of the documents contained in the input document data stream can be displayed in one window, and the data fields can be arranged in a further window in a tree structure that may comprise several levels. The product further comprises means for defining source data fields and linking them to the data fields, such a link automatically creating a rule for reading a source data field out of the input document data stream and storing its content in the corresponding data field of the structured data file.

This computer program product provides the user with at least two windows on the graphical user interface: the template document is displayed in one window, and in the other window the user can display the data fields in a tree structure. The user can create the tree structure himself. Alternatively, an existing structure can be adopted, and in particular a structure can be selected from several predefined template structures. The source data fields in the template document can be linked to the structured data fields by simple means, a rule being created automatically each time.

This computer program product thus allows a rule set for converting an input document data stream into a data file structured according to the invention to be created quickly and easily.

A tree structure within the meaning of the present invention is any structure in which one or more data fields can each be subordinated to a generic term. These generic terms can in turn be subordinated to further generic terms. Such a tree structure thus comprises branches, with generic terms located at the branch points and data fields forming the end points of the branches. Such a data structure can comprise several branching levels, and data fields can be arranged in each level.
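This definition can be made concrete with a small Python sketch. The class names `GenericTerm` and `DataField`, the helper functions and the delivery-note example are illustrative assumptions, not part of the patent text.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    name: str          # end point of a branch
    value: str = ""


@dataclass
class GenericTerm:
    name: str          # branch point
    children: list = field(default_factory=list)  # data fields or generic terms


def branching_levels(node):
    """Number of branching levels below and including this node."""
    if isinstance(node, DataField):
        return 0
    return 1 + max((branching_levels(child) for child in node.children), default=0)


def field_paths(node, prefix=()):
    """List the path from the root to every data field."""
    path = prefix + (node.name,)
    if isinstance(node, DataField):
        return [path]
    paths = []
    for child in node.children:
        paths.extend(field_paths(child, path))
    return paths


tree = GenericTerm("delivery_note", [
    DataField("number"),                      # data field directly in the top level
    GenericTerm("customer", [DataField("name"), DataField("city")]),
    GenericTerm("items", [
        GenericTerm("item", [DataField("article"), DataField("quantity")]),
    ]),
])
levels = branching_levels(tree)
paths = field_paths(tree)
```

The example places data fields at three different levels, matching the statement that data fields can be arranged in each level of the structure.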

The invention is explained in more detail below by way of example with reference to the accompanying drawings. The drawings show:

Fig. 1 a high-performance printing system,

Fig. 2 schematically, the assignment of source data areas and source data fields in an input document to generic terms and data fields in the tree structure,

Fig. 3 schematically, data of an input document that are suitable for detecting a page type,

Fig. 4 schematically, data of an input document that are suitable for detecting document boundaries,

Fig. 5 data of an input document to be extracted, which can be arranged within source data areas and also outside source data areas,

Fig. 6 schematically, an input document illustrating problems that may arise with absolute addressing of source data fields,

Fig. 7 an input document in which certain source data areas are addressed by means of start position elements,

Fig. 8 a section of an output document,

Fig. 9 a section of the input document of the file "Lieferschein.txt", namely pages 1, 2 and 6 to 8.

Fig. 1 shows a document print production system 1 that comprises a mainframe architecture 2 on the one hand and a network architecture 5 on the other, in each of which document data or document print data streams are generated by means of application programs (tools). In the mainframe architecture 2, these print data are generated by a host computer, for example as a line print data stream (ASCII line data). From the host computer 3, the print data can optionally be transmitted via a so-called S/370 channel 14a directly to one or more printing devices 6a, 6b. As an alternative to this output channel, the print data can also be transmitted from the host computer 3 via a network 13 or a direct data connection 14b to a processing computer 4, in which the print data are buffered (e.g. on an associated file server) and processed for subsequent output steps. Such host computers 3 in particular generate print data streams that regularly compile list printouts, invoices, consumption overviews (for telephone bills, gas bills, bank accounts), etc. from larger data holdings (databases). Such applications have often been in use for many years and are still required in more or less unchanged form (so-called legacy applications).

Within the mainframe architecture 2, the print production process is monitored by a monitoring system 7. It comprises a monitoring computer 7a, which is coupled to a database 7b and contains various computer program modules 7c.

The monitoring system 7 is connected to the host computer 3 via a device control network 15 and a print manager module 8, and via a converter 9 with, for example, a V24 data line that couples to the two printing devices 6a, 6b. The converter 9 converts the V24 signals into DMT protocol signals of the device control network 15. SNMP protocol signals can be provided to a device manager DM converted into DMT protocol signals or passed on directly as SNMP protocol signals.

Printed matter 19 that has been produced in the printers 6a, 6b from the document print data stream and on which barcodes are printed can be scanned with a manually movable, radio-controlled barcode reader 11a. Signals are transmitted by radio to the reading station 10a and forwarded to the device control network 15 or to the monitoring system 7.

In the network architecture 5, document data are generated by means of application programs in client computers 12, 12a, which are connected to one another and to the processing computer (file server) 4 via a client network 13. The file server thus serves as the central processing and editing interface for print data of the entire print production system 1. Various control modules (software programs) run on it, by which the entire print production process or the entire document processing is optimally adapted to the respective conditions in terms of application, production and device control. It is known from WO 2004/040432 that the following functions in particular are carried out on the file server:

- converting, indexing, sorting, inserting control information
- data reduction
- extraction to generate a compressed data stream, in particular for monitoring the devices involved in real time
- reprinting

These functions are explained in more detail in WO 2004/040432. Reference is therefore made to WO 2004/040432 in its entirety, and that patent application is incorporated into the present patent application.

Print data that have been completed by the processing computer 4 are passed via the print data line 14c to a print server 16. Its task is essentially to relieve the processing computer 4. The print server 16 has a screen 16a. The print server 16 is integrated into the overall system primarily for reasons of performance (speed). In systems whose printing speed is lower, the print server 16 can be dispensed with.

On their processing path between the printing device 6 and a post-processing device 18, the printed documents are tested against various criteria by a test system 17: by an optical test system with regard to their optical print quality, by a barcode test system with regard to their presence, consistency and/or sequence, and by a MICR test system if the print was produced with magnetically readable toner (magnetic ink character recognition toner). The data supplied by the test system 17 are transmitted by a serial data acquisition module to the device control network 15 and fed to the monitoring system 7.

The method according to the invention for converting an input document data stream with one or more documents into a structured data file for generating an output document data stream can be executed on the host computer 3 on which the input document data stream is generated. It is more expedient, however, to execute the method on a computer downstream of the host computer, such as the file server 4 or the print server 16, since this avoids any intervention in the existing system, which processes a large amount of sensitive data.

With the method according to the invention, an input document data stream with one or more documents is converted into a structured data file for generating an output document data stream. A structured data file generated from an input document data stream is described in German patent application 10 2004 021.269.4, entitled "Method, device and computer program for generating a page- and/or area-structured data stream from a line data stream". Reference is made to that patent application in its entirety, and it is incorporated into the present patent application.

Fig. 2 shows a section of a template document 20 and a section of a tree structure 21 with data fields 22.

The template document 20 is a document that is formatted in the same way as the documents of an input document data stream to be processed.

This template document 20 and the corresponding input document data stream constitute a line data stream, also called a line-data-based print data stream. Such a line data stream comprises only characters that are encoded by means of one or more code pages (ASCII, EBCDIC, Unicode, DBCS, ...), together with line breaks and page breaks. It may also comprise further formatting elements. Line data streams of this kind are widespread in digital printing and take the form, in particular, of the Advanced Function Presentation line data stream (AFP line data stream), developed by International Business Machines Corporation (IBM), or of the line coded data stream (LCDS), developed by Xerox Corporation. The line and page breaks can be defined by a specific character sequence at the end of the line or page, by control characters at the beginning of the line or page, or by a fixed number of characters per line or lines per page.
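Two of the ways of delimiting lines and pages described above can be illustrated with a minimal Python sketch. The function names, the choice of form feed and newline as break characters, and the sample data are assumptions made purely for illustration; real AFP or LCDS line data use their own control conventions, and the variant with control characters at the start of a line is not shown.

```python
def split_by_delimiters(stream: str, page_break: str = "\f",
                        line_break: str = "\n") -> list[list[str]]:
    """Split a line data stream into pages of lines using delimiter strings.

    The defaults (form feed, newline) are illustrative assumptions.
    """
    return [page.split(line_break) for page in stream.split(page_break)]


def split_by_fixed_counts(stream: str, chars_per_line: int,
                          lines_per_page: int) -> list[list[str]]:
    """Split a stream that has no delimiters but fixed line and page sizes."""
    lines = [stream[i:i + chars_per_line]
             for i in range(0, len(stream), chars_per_line)]
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


# Two pages separated by a form feed, each with two lines.
pages = split_by_delimiters("Invoice\n1/2\fInvoice\n2/2")

# Twelve characters, four per line, two lines per page.
fixed = split_by_fixed_counts("AAAABBBBCCCC", chars_per_line=4, lines_per_page=2)
```

Either function yields the same page-of-lines representation, which is all that position-based extraction needs.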

For the present exemplary embodiment it is essential that the formatting, i.e. the arrangement of the individual characters in the document, is determined solely by the position of each character within a line, by line breaks and by page breaks. Such documents use a non-proportional font, such as Courier, in which the centre-to-centre spacing of two adjacent characters is always the same regardless of which characters they are.

The tree structure 21 is a user-editable file that contains, for at least one document (here: "Invoice"), all data fields 22 in a structured arrangement. The tree structure file serves as a template for generating a structured data file. This means that no data extracted from the input document data stream are stored in the tree structure file itself. Instead, the extracted data are stored in the structured data file in the same structure as in the tree structure file, and the name of the corresponding data field of the tree structure is assigned to the extracted data as a type declaration.

The tree structure of the present exemplary embodiment is first divided into two branches, labelled "Value" and "Count". The "Count" branch contains only a single data field, called "Count", in which the number of the document within an input document data stream is stored in the structured data file. This makes it possible to store the data of several documents in structured form in a single structured data file. The "Value" branch contains the data fields into which the data to be extracted from the input document data stream are written. A series of data fields 22/I are arranged in the tree structure directly under the heading "Value". These data fields 22/I serve to store an item of data of the input document data stream that occurs only once in each document. In the present example, the name in the delivery address of the template document 20 is "Music Box Ltd", which is mapped to the data field "DeliveryAddrCustomerName"; that is, this name is stored at the corresponding position in the structured data file and given the type declaration "DeliveryAddrCustomerName".

The structure level that contains the data fields 22/I also contains a further branch, labelled "Items". This branch in turn divides into a "Value" branch and a "Count" branch. These branches serve to structure groups of data fields 22/II onto which data of a single document are mapped several times. In the present example the document is an invoice that lists several items to be billed, for each of which the quantity, code number, description, unit price and value are contained in the document. For each such item, a corresponding set of data fields in which the respective values are stored must be created in the structured data file. The number of these sets of data fields is stored in the "Count" data field subordinate to the heading "Items".
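The nesting of "Value" and "Count" branches can be pictured with a small Python sketch that builds one document record as nested dictionaries. Only "Value", "Count", "Items" and "DeliveryAddrCustomerName" come from the description; the helper names and the item field names (Quantity, Code, Description, UnitPrice, Value) are hypothetical, and the actual file format of the structured data file is not specified here.

```python
def new_document(number: int) -> dict:
    """Create an empty record for one document of the input stream.

    "Count" holds the number of the document within the stream;
    "Value" holds the extracted data fields.
    """
    return {"Count": number, "Value": {"Items": {"Count": 0, "Value": []}}}


def add_item(document: dict, **fields: str) -> None:
    """Append one set of item data fields and keep the item count in step."""
    items = document["Value"]["Items"]
    items["Value"].append(dict(fields))
    items["Count"] = len(items["Value"])


invoice = new_document(1)

# A field that occurs once per document (data fields 22/I).
invoice["Value"]["DeliveryAddrCustomerName"] = "Music Box Ltd"

# Fields that occur once per billed item (data fields 22/II);
# the item field names and values are made up for illustration.
add_item(invoice, Quantity="2", Code="A-100", Description="Guitar strings",
         UnitPrice="6.00", Value="12.00")
add_item(invoice, Quantity="10", Code="B-200", Description="Drum sticks",
         UnitPrice="4.55", Value="45.50")
```

Keeping the count next to the list mirrors the "Count" field under "Items", so a consumer can check how many item sets to expect before reading them.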

With the method according to the invention, data are extracted from the input document data stream according to a predetermined rule set and stored in the structured data file, the rule set being designed such that any data from the input document data stream can be mapped onto any data field of the structured data file.

To generate such a rule set, means are provided with which source data fields 23 and source data areas 24 can be defined in the template document. To simplify the illustration, Fig. 2 shows only two source data fields 23 and one source data area 24.

The content of the source data fields 23 is mapped onto the data fields 22, and source data areas 24 may, but need not, correspond to headings in the tree structure. However, for every heading of the tree structure onto whose data fields 22/II data are mapped several times, a corresponding source data area 24 must be provided in the template document; in the actual document of an input document data stream this area is then used once or several times to map the data.

If a source data area 24 is recognised several times within a document in the input document data stream, a data record with the corresponding data fields is created in the structured data file the same number of times. The rule set defining this source data area is thus applied several times to the respective document in order to extract data and store them in the structured data file.

The source data fields 23 and the source data areas 24 are defined in the template document, for example, by marking the corresponding character string or area. This marking can be done by drawing boxes with a computer mouse, as shown in Fig. 2. The source data fields 23 and source data areas 24 can also be marked, as is familiar from word processing programs, by selecting the corresponding characters in the template document while holding down a predetermined key and pressing the appropriate arrow key on a keyboard. It is essential to the invention that a user can mark any character strings as source data fields 23 in the template document and can mark areas containing one or more source data fields 23 as source data areas 24.

When the rules are generated, the source data fields 23 are assigned to the corresponding data fields 22, for example, by clicking in succession on a source data field 23 and on the corresponding data field 22 with the computer mouse. Such an assignment can of course also be entered via the keyboard.

The method according to the invention operates page by page, i.e. a specific rule set must be used to convert each specific page. So that the respective rule set can be selected automatically, one or more conditions that assign a given rule set to a given page of a document must be specified when the rule set is generated. Fig. 3 shows two pages of a template document, each of which contains the term "Invoice" in its header, with the page number and the total page count, separated by a "/", arranged a few lines below. These elements are page type fields 25 which, like the source data fields 23, can be defined by the user in the template document. If, for example, there are three rule sets for invoices, one for the first page, one for the last page and one for further pages, the condition for the first page would read: if the page contains the data item "Invoice" in one page type field 25 and the page number "1" in a further page type field, use the rule set for the first page. The condition for the last page would be that one of the page type fields 25 must contain the data item "Invoice" and that a further page type field 25 contains the page number and the total page count; if the two are equal, the page is the last page, so the corresponding rule set must be used.
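The page type conditions for the three invoice rule sets can be sketched in Python as follows. The field positions, the rule set names and the precedence of the first-page condition over the last-page condition (relevant for a one-page invoice) are assumptions for illustration, not details taken from the description.

```python
from typing import Optional


def read_field(page: list[str], line: int, column: int, length: int) -> str:
    """Return the text of a field given a 0-based line, column and length."""
    if line >= len(page):
        return ""
    return page[line][column:column + length].strip()


# Hypothetical positions of the two page type fields in a sample layout.
TITLE_FIELD = (0, 0, 7)    # header line holding "Invoice"
PAGING_FIELD = (2, 0, 7)   # line holding "<page>/<total>"


def select_rule_set(page: list[str]) -> Optional[str]:
    """Pick a rule set name for an invoice page, or None if no condition holds."""
    if read_field(page, *TITLE_FIELD) != "Invoice":
        return None
    current, _, total = read_field(page, *PAGING_FIELD).partition("/")
    if not (current.isdigit() and total.isdigit()):
        return None
    if int(current) == 1:
        return "invoice_first_page"
    if int(current) == int(total):
        return "invoice_last_page"
    return "invoice_other_page"


first = select_rule_set(["Invoice", "", "1/3"])
middle = select_rule_set(["Invoice", "", "2/3"])
last = select_rule_set(["Invoice", "", "3/3"])
other_type = select_rule_set(["Reminder", "", "1/1"])
```

Each condition is a logical combination of page type field contents, so a page matches a rule set exactly when its combination evaluates to true.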

The method and system according to the invention provide means for entering such conditions. These means comprise a window in the graphical user interface in which the contents of page type fields 25 can be combined by logical operators. If the logical result of the combination is "true", this rule set is to be used for the respective page. The means for entering the conditions preferably also comprise typical predefined logical structures, such as the comparison of the page number with the total page count; the user then only has to assign the corresponding page type fields 25 to these structures, which can be used alone or in combination with further logical operators.

Since an input document data stream may contain several documents and a structured data file is to contain a complete set of data fields for each document, it is expedient to determine the beginning and end of each document so that they are recognised automatically during conversion.

For this purpose, document delimiter fields 26 are defined (Fig. 4) and document delimiter conditions are entered. The document delimiter fields are typically characteristic elements of a letterhead, page numbers, or closing elements of invoices or the like. The document delimiter fields 26 may relate to the same data as the page type fields 25 or the source data fields 23. They differ from these other fields in that they are used in conditions for determining the beginning or end of a document. These conditions can be entered in the same way as the conditions for determining the page type.
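How a document delimiter condition might be used to cut a stream of pages into documents is sketched below in Python. The start condition chosen here (a page whose second line reads "1/<total>") and all names are hypothetical; the description leaves the concrete condition to the user.

```python
from typing import Callable

Page = list[str]


def group_documents(pages: list[Page],
                    starts_document: Callable[[Page], bool]) -> list[list[Page]]:
    """Group consecutive pages into documents.

    A new document begins at every page for which the delimiter condition
    holds; pages before the first match are kept together as one document.
    """
    documents: list[list[Page]] = []
    for page in pages:
        if starts_document(page) or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


def page_number_is_one(page: Page) -> bool:
    """Hypothetical delimiter condition: the second line reads '1/<total>'."""
    return len(page) > 1 and page[1].strip().startswith("1/")


stream = [["Invoice", "1/2"], ["Invoice", "2/2"], ["Invoice", "1/1"]]
documents = group_documents(stream, page_number_is_one)
```

With the sample stream above, the first two pages form one document and the third page forms another, so each document can receive its own complete set of data fields.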

An input document data stream may also contain different document types, such as reminders, delivery notes, invoices, etc. The rule sets of the individual document types can be designed such that a separate structured data file is generated for each document type. The data of different document types can also be stored in a common structured data file.

In line data streams, the source data fields can in principle be addressed absolutely, e.g. by the line number, the character number within the respective line, and the length, i.e. the number of characters. Such addressing is easy to specify and is adopted automatically by the system as soon as a source data field is defined in the template document.
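Absolute addressing by line number, character number and length can be sketched in Python as follows. The class name, the 1-based counting and the sample page are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    """An absolutely addressed source data field (1-based line and character)."""
    target: str   # name of the data field in the tree structure
    line: int     # line number on the page
    char: int     # character number within the line
    length: int   # number of characters

    def extract(self, page: list[str]) -> str:
        """Cut the field out of the page; a missing line yields an empty string."""
        if self.line > len(page):
            return ""
        start = self.char - 1
        return page[self.line - 1][start:start + self.length].rstrip()


# Hypothetical page: the customer name starts at character 5 of line 2.
page = ["Invoice", "    Music Box Ltd       "]
name_field = SourceField("DeliveryAddrCustomerName", line=2, char=5, length=13)
customer = name_field.extract(page)
```

Because the font is non-proportional, the three numbers are enough to identify the field on every page that shares the layout.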

Fig. 5 shows several source data fields 23, including two source data fields 23/III that are not suitable for such absolute addressing.

Fig. 6 shows a further document of the document type from Fig. 5, in which, however, the source data fields 23/III are offset from the data that they are supposed to map onto the data fields. This is because the location of certain data depends on preceding data contained in the document. In Fig. 6, for example, the position of the subtotal ("Subtotal") has shifted relative to the document from Fig. 5, because this invoice contains fewer items than the template document.

To eliminate this problem, source data areas 24 are defined, each of which contains a position element 27 whose location is defined relatively. This position element 27 is typically, but not necessarily, a source data field 23. In the template document shown in Fig. 7, a source data area 24 is defined for each of the individual items of the invoice. Within such a source data area 24, the first entry is the quantity of the corresponding item, which is always an integer. A condition can therefore be entered according to which the source data area 24 is positioned; in the present example it searches for a character string that is an integer and is located in characters 4 to 8 of a line. If such a character string is found in the document, the source data area 24 is positioned accordingly. Within the source data area, the individual source data fields 23 are addressed absolutely. In this example the number of items is not fixed, so this source data area 24 may have to be applied a varying number of times. It is thus a repeatedly applicable source data area 24, which must be specified accordingly in the condition.
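The repeatedly applicable source data area can be sketched in Python: every line whose characters 4 to 8 hold an integer anchors one instance of the area, and the fields inside it are then read at fixed offsets. Treating the area as a single line, the field offsets other than the quantity, and the sample page are assumptions for illustration.

```python
import re

# Hypothetical layout inside one item line: name -> (start index, length).
ITEM_FIELDS = {
    "Quantity": (3, 5),       # characters 4 to 8 (1-based), as in the text
    "Description": (9, 20),
    "Value": (30, 10),
}


def is_item_line(line: str) -> bool:
    """True if characters 4 to 8 of the line hold nothing but an integer."""
    return re.fullmatch(r"\s*\d+\s*", line[3:8]) is not None


def extract_items(page: list[str]) -> dict:
    """Apply the repeatable source data area to every matching line."""
    items = [
        {name: line[start:start + length].strip()
         for name, (start, length) in ITEM_FIELDS.items()}
        for line in page
        if is_item_line(line)
    ]
    return {"Count": len(items), "Value": items}


# Fixed-width sample page built with format specifications.
sample_page = [
    f"{'':3}{'Qty':>5} {'Description':<20} {'Value':>10}",
    f"{'':3}{2:>5} {'Guitar strings':<20} {'12.00':>10}",
    f"{'':3}{10:>5} {'Drum sticks':<20} {'45.50':>10}",
    f"{'Subtotal':<29} {'57.50':>10}",
]
items = extract_items(sample_page)
```

The header and subtotal lines are skipped because characters 4 to 8 do not form an integer there, so the number of extracted item sets follows the document rather than the template.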

In this example, two further source data areas 24/II and 24/III are shown, which are addressed relatively. The condition for locating the source data area 24/II reads:
If the character string "Subtotal" is found anywhere on the page currently being processed, it constitutes a position element of the source data area 24/II, which comprises the line containing this character string and all further lines up to the fiftieth line.

The condition for the source data area 24/III reads: if, within the source data area 24/II, a character string is found in the range from the sixty-first to the sixty-seventh character of a line, the source data area 24/III comprises this line and all subsequent lines within the source data area 24/II. The further source data fields 23 are addressed within the source data areas 24. The addressing can refer to any reference point, such as the first or last line within the source data area 24.
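The two relative conditions for the source data areas 24/II and 24/III can be sketched in Python as nested searches. Interpreting "a character string is found" as "the character range is not blank", the half-open index convention and the sample page are assumptions for illustration.

```python
from typing import Optional

Area = tuple[int, int]  # 0-based, half-open range of line indices


def locate_area_ii(page: list[str], keyword: str = "Subtotal",
                   last_line: int = 50) -> Optional[Area]:
    """Area 24/II: from the line containing the keyword up to line 50."""
    for index, line in enumerate(page[:last_line]):
        if keyword in line:
            return index, min(last_line, len(page))
    return None


def locate_area_iii(page: list[str], area_ii: Area,
                    first_char: int = 61, last_char: int = 67) -> Optional[Area]:
    """Area 24/III: from the first line inside 24/II that has text in
    characters 61 to 67 (1-based) to the end of 24/II."""
    start, end = area_ii
    for index in range(start, end):
        if page[index][first_char - 1:last_char].strip():
            return index, end
    return None


# Hypothetical page: amounts sit in characters 61 to 67 below "Subtotal".
page = [
    "    2 Guitar strings",
    "Subtotal",
    f"{'':60}{'57.50':>7}",
    f"{'':60}{'9.10':>7}",
]
area_ii = locate_area_ii(page)
area_iii = locate_area_iii(page, area_ii) if area_ii else None
```

Fields inside either area would then be addressed relative to a reference line of the area, for example its first line, rather than relative to the top of the page.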

The source data areas 24/II and 24/III occur only once within a document, which can be taken into account when creating the corresponding condition for positioning the source data areas 24.

The structured data file that can be created with such a rule set contains data that is represented, for example, as in Fig. 12 of German patent application 10 2004 021.269.4 and is structured by page and by area. According to the invention, source data fields 23 at any location in the template document can be assigned to any corresponding data fields 22 in the tree structure 21.

This structured data file thus forms a database whose content can be read out easily with conventional means and entered into any layouts or forms. The output documents generated in this way can be formatted as desired and contain the data listed in the original line data stream. A section of such an output document is shown in Fig. 8.
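As a rough illustration of reading the structured data back out into a new layout, the following sketch uses a nested Python dictionary as a stand-in for one record of the structured data file. The field names, the sample values and the output layout are all hypothetical; the description does not prescribe a storage format here.

```python
# Hypothetical record with named data fields on several data levels.
record = {
    "customer_number": "4711",
    "shipper": {"name": "ACME GmbH"},
    "items": [{"quantity": "2", "description": "Widget"}],
}


def render(rec: dict) -> str:
    """Enter the named data fields into a freely chosen text layout."""
    lines = [
        f"Customer no.: {rec['customer_number']}",
        f"Shipper: {rec['shipper']['name']}",
    ]
    for item in rec["items"]:
        lines.append(f"{item['quantity']:>4} x {item['description']}")
    return "\n".join(lines)
```

Since every value is reached by field name rather than by its position in the line data, the output layout can change without touching the extraction rules.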

The rules and conditions for extracting the data of the document "delivery note" (Lieferschein), a section of which is shown in Fig. 9, are explained below by way of example. The individual rules and conditions are listed in an appendix.

At the end of the appendix, page 10 lists the tree structure of the mapping elements for extracting the data from the document "delivery note". Page 11 of the appendix shows the tree structure that serves as a template for generating the structured data file and corresponds to the tree structure shown in Fig. 2.

The tree structure of the mapping elements contains the source data fields and source data areas according to which data is extracted from the documents.

The conditions and rules are ordered according to the tree structure of the mapping elements. First (page 1 of the appendix), the structure elements and properties are defined that apply throughout the document, i.e., that relate to the mapping element "document".

The structure elements comprise repeat source data areas, page types and control elements. Control elements are all data and other information that can be logically combined in conditions. Control elements are in particular page type fields, document delimiter fields and position elements, each of which defines a datum in the document, as well as line numbers of specific lines. In the present embodiment, two page types, "delivery note first page" and "delivery note following page", are defined, and a separate rule set is specified for each. In addition, a repeat source data area "table" is defined, which can occur several times in the document. Here this is independent of the page type, since on both page types it is linked to the source data area "table area" defined there. Such a repeat source data area contains source data fields and/or source data areas. However, it contains no elements for its own positioning. It is positioned via the source data areas linked to it (here: "table area").

The properties defined for the document are the character code for the line break, the character code for the page break and the character table, as well as a condition list for recognizing page types. The page type "delivery note first page" is recognized by the condition that a page type field 1 26/1 (line 2 of the page currently being processed, characters 66–88) contains the string "L i e f e r s c h e i n" and a page type field 2 26/2 (line 87 of the page currently being processed, characters 83–84) contains the character "1". The page type fields 26/1 and 26/2 are drawn in on pages 1 and 2 of Fig. 9. For the sake of clarity, only a small selection of all control elements and source data areas is shown in Fig. 9.

The condition for recognizing the page type "delivery note following page" is that the page type field 1 26/1 contains the string "L i e f e r s c h e i n" and the page type field 2 is not equal to "1".
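The two page type conditions can be sketched as follows. The line and character positions are those given in the description; the helper names, the representation of a page as a list of text lines and the trimming of blanks in the two-character page number field are assumptions made for this illustration.

```python
from typing import List, Optional

TITLE = "L i e f e r s c h e i n"  # spaced title, exactly 23 characters


def field(page: List[str], line_no: int, first: int, last: int) -> str:
    """Read a field addressed by 1-based line and inclusive character numbers."""
    if line_no > len(page):
        return ""
    return page[line_no - 1][first - 1:last]


def page_type(page: List[str]) -> Optional[str]:
    """Classify a page with the two conditions of the condition list."""
    type_field_1 = field(page, 2, 66, 88)           # page type field 1 (26/1)
    type_field_2 = field(page, 87, 83, 84).strip()  # page type field 2 (26/2)
    if TITLE not in type_field_1:
        return None
    if type_field_2 == "1":
        return "delivery note first page"
    return "delivery note following page"
```

A page that matches neither condition yields `None`, so a caller can decide how to treat unknown page types.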

The definition of the page types again comprises structure elements and properties. The structure elements in turn comprise source data areas, source data fields and control elements. The first page contains three source data areas: "shipper" 24/1, "shipping address" 24/2 and "table area" 24/3. The latter is linked to the repeat source data area "table" contained in the "document". In addition, a number of source data fields that are not located in any of these source data areas are defined by means of absolute addressing.

Shown here by way of example are the source data fields "customer number" 23/1, "purchase order number" 23/2, "order number" 23/3 and "tel/fax number" 23/4. These source data fields are uniquely defined by specifying the line numbers and the characters they comprise within the respective line.

The properties of this page type specify conditions for positioning the source data areas and a condition for recognizing the document boundary. In this embodiment, all source data areas are positioned absolutely by the line number of the first line of the source data area, namely in lines 3, 9 and 43, respectively. Within the scope of the invention it is of course also possible to position the source data areas relatively, for example by detecting a string.

The end of a document is detected when a document delimiter field 25/1, which is arranged immediately after a page number (page type field 26/2), contains the character "–". In the present embodiment this is not the case on page 1, so the document comprises several pages.
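A minimal sketch of splitting a page sequence into documents with this end condition is given below. It assumes that the document delimiter field is character 85 of line 87, directly after the page number in characters 83–84, and that the marker is a plain hyphen. Both are assumptions for illustration, since the exact position and character code are defined only in the appendix.

```python
from typing import List

Page = List[str]


def is_last_page(page: Page, marker: str = "-") -> bool:
    """Check the assumed document delimiter field (line 87, character 85)."""
    return len(page) >= 87 and page[86][84:85] == marker


def split_documents(pages: List[Page], marker: str = "-") -> List[List[Page]]:
    """Group consecutive pages into documents at each detected document end."""
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        current.append(page)
        if is_last_page(page, marker):
            documents.append(current)
            current = []
    if current:  # trailing pages without an end marker
        documents.append(current)
    return documents
```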

The definition of the following pages is similar to that of the first page. The following pages differ in that they comprise only a single source data area, namely the "table area" 24/3.

The repeat source data areas are defined on page 4 of the appendix. In the present application example there is only one repeat source data area, "table". It is linked to the source data area "table area" and comprises three source data areas: "delivery" 24/4, "shipping instructions" 24/5 and "delivery item" 24/6. This shows a very advantageous property of the method according to the invention: several source data areas can be nested within one another, and a source data area arranged within a further source data area is positioned with respect to that further source data area, i.e., the line numbering for the source data areas arranged within the further source data area begins there with the number "1". The source data areas can thus be positioned within a "higher-level source data area" independently of the content of the document outside that higher-level source data area.

In the repeat source data area "table", the presence of the individual source data areas "delivery" 24/4, "shipping instructions" 24/5 and "delivery item" 24/6 is detected by detecting certain strings such as "Anlieferung" (delivery) and "Anzahl" (quantity), or an integer value, in position elements 1 to 3.
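The detection of the nested source data areas can be sketched as follows. The keywords are those named in the description. The location of the integer position element (the first four characters of a line) and the rule that an area runs until the next recognized line are simplifying assumptions for this illustration, and the function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def area_type(line: str) -> Optional[str]:
    """Decide which nested source data area, if any, starts on this line."""
    if "Anlieferung" in line:
        return "delivery"
    if "Anzahl" in line:
        return "shipping instructions"
    if re.fullmatch(r"\d+", line[:4].strip()):  # assumed position element
        return "delivery item"
    return None


def split_table(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Split the lines of the table into consecutive nested areas."""
    areas: List[Tuple[str, List[str]]] = []
    for line in lines:
        kind = area_type(line)
        if kind is not None:
            areas.append((kind, [line]))   # a recognized line starts an area
        elif areas:
            areas[-1][1].append(line)      # other lines continue the last area
    return areas
```

Each recognized line both starts a new area and ends the previous one, so an item of any length is captured without a fixed line count.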

The definition of the individual source data areas is briefly explained below.

The source data area "shipper" 24/1 contains four source data fields 23/5 to 23/8, which are addressed absolutely within the source data area "shipper". Furthermore, the condition for recognizing the end of the source data area is defined by the line number being equal to "4". This means that the source data area "shipper" comprises four lines. The source data area "shipping address" 24/2 is defined in a similar way, but comprises seven lines.
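Absolute addressing inside a source data area can be sketched as follows. The start line 3 and the length of four lines for "shipper" come from the description, whereas the field names and character ranges are hypothetical because the actual values are given only in the appendix.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Field:
    name: str
    line: int   # 1-based line number inside the area
    first: int  # 1-based first character
    last: int   # 1-based last character (inclusive)


def extract_area(page: List[str], start_line: int, length: int,
                 fields: List[Field]) -> Dict[str, str]:
    """Cut the area out of the page and read its fields area-relatively."""
    area = page[start_line - 1:start_line - 1 + length]
    return {f.name: area[f.line - 1][f.first - 1:f.last].strip()
            for f in fields}


# Hypothetical field layout for the four-line area "shipper".
SHIPPER_FIELDS = [Field("name", 1, 1, 40), Field("city", 3, 1, 40)]
```

Because line numbers restart at 1 inside the area, the same field definitions keep working if the area is positioned at a different line of the page.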

A source data area "table area" 24/3 is linked to the repeat source data area "table" and contains the condition for recognizing the end of the source data area.

The source data area "delivery" 24/4 comprises only a single line, here the first line of the source data area "table area" 24/3, with two source data fields, "delivery date" 23/9 and "delivery time" 23/10.

The source data area "shipping instructions" contains a number of source data fields, of which the source data field "number of packages" 23/11 and the source data field "order processing" 23/12 are drawn in by way of example on the last page of Fig. 9.

The source data area "delivery item" 24/6 comprises further source data areas, "item description" 24/8 and "sub-item" 24/9. The source data area "delivery item" contains a condition list for recognizing the contained source data areas "item description" 24/8 and "sub-item" 24/9. The source data area "item description" 24/8 begins in the second line of the higher-level source data area "delivery item" 24/6. The source data area "item description" is thus addressed absolutely. The source data area "sub-item" 24/9 is addressed relatively: the position element 27/1 is compared with the position element 27/2, and if they match it is established that the source data area "sub-item" 24/9 is present. The recognition of these source data areas also defines their beginning.

In addition, a condition for recognizing the end of the source data area "delivery item" is specified, with which the end is recognized by detecting a further delivery item or by detecting the end of the table.

The source data areas "item description" 24/8 and "sub-item" 24/9 are also defined in more detail, the source data area "sub-item" containing a further source data area "sub-item description" 24/10.

The above embodiment shows how the source data fields 23 are positioned in an input document by means of absolute and relative addressing, which can also be combined and nested as desired by means of the source data areas, in order to extract the data contained in the input document. The extracted data is automatically stored in a structured data file in accordance with the tree structure shown on page 11 of the appendix.

The embodiment shown above presents the rule sets for the two page types and the conditions for detecting the document and page boundaries. The basic structure for defining the individual elements, such as document, page type and source data area, comprises source data areas, source data fields and control elements. Only the element "document" contains the definition of repeat source data areas, page types and definitions of basic properties of the document. Within the scope of the present invention, the page types can also be regarded as source data areas, since they are defined with the same structure as the actual source data areas.

The above embodiment further shows that certain further source data areas, such as the source data areas "delivery", "shipping instructions" and "delivery item", are assigned to certain types of source data areas, such as the source data area "table", in such a way that the further source data areas occur only in the higher-level source data area (here "table").

During extraction, a source data area pointer records the source data areas from which data is currently being extracted. This pointer thus also indicates the level of the tree structure of the mapping elements (page 10 of the appendix). The largest source data area corresponds to the entire document. At the end of a page, the source data area pointer is changed so that it points to the entire document. If a source data area that is linked to a repeat source data area, and can therefore extend beyond the end of a page onto a subsequent page, actually extends beyond the end of the page onto a following page, the value of the source data area pointer with which it pointed to this source data area is stored in an additional page change pointer. When the following page is processed and this source data area is reached, i.e., the source data area pointer again assumes the same value as the page change pointer, the corresponding record in the structured data file is supplemented and no new record is started for this source data area.
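The interplay of the two pointers can be sketched as follows. To keep the example short, it assumes that every line has already been assigned to a source data area (or to `None` for lines outside any area) and that areas are identified by name. This input model and the function name are hypothetical simplifications rather than the mechanism of the appendix.

```python
from typing import Dict, List, Optional, Tuple

Line = Tuple[Optional[str], str]  # (area name or None, extracted value)


def extract(pages: List[List[Line]]) -> List[Dict]:
    """Collect one record per area occurrence, continuing across page breaks."""
    records: List[Dict] = []
    page_change_pointer: Optional[str] = None
    for page in pages:
        pointer = "document"  # at a page boundary the pointer shows the document
        for area, value in page:
            if area is None:
                pointer = "document"
                continue
            if area != pointer:  # a source data area is being entered
                if area != page_change_pointer:
                    records.append({"area": area, "values": []})
                page_change_pointer = None  # a continuation applies only once
                pointer = area
            records[-1]["values"].append(value)
        # An area still open at the end of the page is remembered.
        page_change_pointer = pointer if pointer != "document" else None
    return records
```

In this sketch a table that runs to the very end of one page and resumes on the next yields a single record, while a table that ended before the page break starts a fresh record on the next page.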

The invention has been explained above with reference to an example in which the source data areas always extend over the entire page width. Within the scope of the invention, however, it is also possible to define source data areas that extend over only part of one or more consecutive lines. These source data areas thus form columns in the respective document, and several such column-shaped source data areas can be arranged next to one another. These column-shaped source data areas are particularly suitable for reading out tables.
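A column-shaped source data area can be sketched as a line range combined with a character range. The function name and the sample table below are hypothetical and serve only to show the addressing.

```python
from typing import List


def column_area(page: List[str], first_line: int, last_line: int,
                first_char: int, last_char: int) -> List[str]:
    """Read a column: the same character range on consecutive lines (1-based)."""
    return [line[first_char - 1:last_char].strip()
            for line in page[first_line - 1:last_line]]


# Hypothetical table with three adjacent column-shaped areas.
sample = [
    "Qty  Item      Price",
    "  2  Widget     9.99",
    "  1  Gadget    19.50",
]
quantities = column_area(sample, 2, 3, 1, 3)
descriptions = column_area(sample, 2, 3, 6, 15)
prices = column_area(sample, 2, 3, 16, 20)
```

Each column is read independently, so adjacent columns can be mapped to different data fields of the structured data file.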

The invention is briefly summarized below:
With the method according to the invention, source data fields are automatically positioned in the input document data stream for reading out the data to be extracted, their positioning being carried out by means of absolute or relative addressing. In particular, the source data fields can be positioned by means of source data areas, which cover sections of the individual documents. These source data areas can be nested and can themselves be positioned absolutely or relatively.

The corresponding rules can be created simply by marking the corresponding source data areas and source data fields in a template document.

1: Document production system
2: Mainframe architecture
3: Host computer
4: Processing computer (file server)
5: Network architecture
6a, 6b: Printing device
7: Monitoring system
7a: Monitoring computer
7b: Database
7c: Computer program module
8: Print manager module
9: Converter
10a, 10b: Reading station
11a, 11b: Barcode reader
12, 12a: Client computer
13: Client network
14a, b, c, d: Print data line
15: Device control network
16: Print server
16a: Screen
17: Test system
18: Post-processing devices
19: Printed matter
20: Template document
21: Tree structure
22: Data field
23: Source data field
24: Source data area
25: Page type field
26: Document delimiter field
27: Position element

Structuring of the file "Lieferschein.txt"

• Document

• Page types

1. 'delivery note first page'

2. 'delivery note following page'

• Repeat source data areas

1. 'table'

• Source data areas

1. 'shipper'

2. 'shipping address'

3. 'table area'

4. 'delivery'

5. 'shipping instructions'

6. 'delivery item'

7. 'item description'

8. 'sub-item'

9. 'sub-item description'

• Tree structure of the mapping elements:

Tree structure (21, template for creating a structured data file):

Claims

A method of converting an input document data stream with one or more documents into a structured data file for generating an output document data stream, in which data is extracted from the input document data stream according to a predetermined rule set and stored in the structured data file, wherein in the structured data file field names are assigned to the individual data fields and the data fields can be structured in multiple data levels, and the rule set is designed such that any data of the input document data stream can be mapped onto any data field of the structured data file.

Method according to claim 1, characterized in that individual rules of the rule set are created by displaying a template document in one window and data fields in a tree structure in a further window, a source data field being defined in each case by marking data in the template document, and, when such a source data field of the template document is linked with a data field, a rule being created automatically for reading a source data field from the input document data stream and storing its content in the corresponding data field of the structured data file.

Method according to claim 1 or 2, characterized in that the input document data stream is divided into several documents, a structured record being stored in the structured data file for each document.

Method according to one of claims 1 to 3, characterized in that the documents can span multiple pages, the data being extracted page by page.

Method according to one of claims 1 to 4, characterized in that the input document data stream contains only characters that are encoded by means of one or more character tables (ASCII, EBCDIC, Unicode, DBCS, ...), including line breaks and page breaks.

Method according to one of claims 1 to 4, characterized in that the input document data stream contains only characters that are encoded by means of a single code page (ASCII, EBCDIC, Unicode, DBCS, ...), including line breaks and page breaks.

Method according to claim 5 or 6, characterized in that the line and/or page breaks are each encoded by a specific string.

Method according to one of claims 5 to 7, characterized in that the line and/or page breaks are each encoded by a specific number of characters or lines.

Method according to one of claims 1 to 8, characterized in that data is extracted from the input document data stream that is arranged in the input document data stream in specific source fields, the source fields being defined by the line number(s) in the respective page and the character numbers in the respective line or lines.

Method according to one of claims 1 to 9, characterized in that data is extracted from the input document data stream that is arranged in the input document data stream in specific source fields, the source fields being defined by the line number(s) and the character numbers in the respective lines within a specific source data area in a document.

Method according to claim 10, characterized in that at least one position element of the source data area is defined in the document or in another source data area.

Method according to claim 11, characterized in that a plurality of position elements of the source data area, in particular a start and an end position element, are defined in the respective page or in another source data area.

Method according to claim 11 or 12, characterized in that the position element(s) of the source data area are defined as an absolute location by specifying the line number and the character number within the respective line in the respective page or in the further source data area.

Method according to one of claims 11 to 13, characterized in that the position element(s) of the source data area are defined as a location relative to a specific character string in the respective page or in the further source data area.

Method according to claim 14, characterized in that the character string is either location-independent, arranged within a specific range, or arranged at a location defined by the line number and the character number within the line(s) in the respective page or in a further source data area.
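
The relative positioning described above can be sketched as follows: a start and an end position element are each located by searching for a character string, either anywhere on the page or only within a given range of lines. The function names, the line-range parameter and the sample page are assumptions for this sketch only.

```python
from typing import List, Optional, Tuple


def find_anchor(lines: List[str], text: str,
                line_range: Optional[Tuple[int, int]] = None) -> Optional[int]:
    """Return the 0-based index of the first line containing `text`.

    `line_range` is an optional 1-based inclusive (first, last) restriction;
    without it the search is location-independent.
    """
    first, last = line_range if line_range else (1, len(lines))
    for idx in range(first - 1, min(last, len(lines))):
        if text in lines[idx]:
            return idx
    return None


def area_lines(lines: List[str], start_text: str, end_text: str) -> List[str]:
    """Return the lines strictly between the start and end anchor strings.

    If the end anchor is missing, the area runs to the end of the page.
    """
    start = find_anchor(lines, start_text)
    if start is None:
        return []
    for idx in range(start + 1, len(lines)):
        if end_text in lines[idx]:
            return lines[start + 1:idx]
    return lines[start + 1:]


# Invented sample page with an item area between two anchor strings.
page = ["Header", "ITEMS", "pen 2", "ink 1", "TOTAL 3", "Footer"]
items = area_lines(page, "ITEMS", "TOTAL")
```

Source fields inside the area found this way could then be addressed by line and character numbers counted from the start of the area rather than from the top of the page.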

Method according to one of claims 10 to 15, characterized in that multiple source fields are arranged in one source data area.

Method according to one of claims 10 to 16, characterized in that multiple source data areas are arranged in a further source data area.

Method according to one of claims 11 to 17, characterized in that a first source data area can be defined which is assigned to another, second source data area such that the first source data area occurs only within the second source data area.

Method according to one of claims 12 to 18, characterized in that during extraction a source data area pointer records which source data area is currently being extracted from, the largest source data area corresponding to the entire document; at the end of a page the source data area pointer is set such that it points to the entire document; and, if an area with an end condition has not been fully processed at the end of the page, a value pointing to this source data area is stored in a page break pointer, so that when a subsequent page is processed, processing of this source data area is continued, after the page-typical lines have been processed, until the end condition is reached.
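
The interplay of the source data area pointer and the page break pointer can be sketched as a small state machine. This is a simplified illustration under several assumptions not found in the claims: there is only one area type besides the whole document, the area starts and ends at lines containing given strings, and the page-typical lines are a fixed number of header lines that are skipped on a continuation page.

```python
from typing import List, Optional

DOCUMENT = "document"  # the largest source data area: the entire document
ITEMS = "items"        # hypothetical area with an end condition


def extract_items(pages: List[List[str]], start_text: str = "ITEMS",
                  end_text: str = "TOTAL", header_lines: int = 1) -> List[str]:
    """Collect the lines of an area that may continue across a page break."""
    collected: List[str] = []
    area_pointer = DOCUMENT
    page_break_pointer: Optional[str] = None

    for page in pages:
        body = page
        if page_break_pointer is not None:
            # Skip the page-typical lines, then resume the unfinished area.
            body = page[header_lines:]
            area_pointer = page_break_pointer
            page_break_pointer = None

        for line in body:
            if area_pointer == DOCUMENT:
                if start_text in line:
                    area_pointer = ITEMS
            elif end_text in line:
                area_pointer = DOCUMENT  # end condition reached
            else:
                collected.append(line.strip())

        # End of page: remember an unfinished area, then reset the pointer
        # so that it points to the entire document again.
        if area_pointer != DOCUMENT:
            page_break_pointer = area_pointer
        area_pointer = DOCUMENT

    return collected


# Invented two-page sample in which the item area spans the page break.
pages = [
    ["Page 1", "ITEMS", "pen 2"],
    ["Page 2", "ink 1", "TOTAL 3", "Thanks"],
]
result = extract_items(pages)
```

In this sample the item area opens on the first page and is closed only on the second, so the page header "Page 2" is skipped and "ink 1" is still attributed to the same area.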

Method according to one of claims 11 to 19, characterized in that a particular source data area is recognized multiple times within an input document, and the rule set defining this source data area is applied a corresponding number of times for extracting data and storing the data in the structured data file.
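
As a hedged illustration of storing repeatedly recognized areas in a structured data file with several data levels, the sketch below writes one child element per occurrence. XML as the file format, the element names and the function `build_document` are assumptions for this example; the claims above only require a structured data file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_document(header: Dict[str, str],
                   items: List[Dict[str, str]]) -> ET.Element:
    """Build a two-level structure: document-level fields plus one
    'item' element for every occurrence of the repeated source data area."""
    doc = ET.Element("document")
    for name, value in header.items():
        ET.SubElement(doc, name).text = value
    for item in items:
        node = ET.SubElement(doc, "item")
        for name, value in item.items():
            ET.SubElement(node, name).text = value
    return doc


# Invented sample data: the same area was recognized twice in one document.
doc = build_document(
    {"invoice_no": "4711"},
    [{"article": "pen", "qty": "2"}, {"article": "ink", "qty": "1"}],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

Because the same rule is applied once per recognized occurrence, the number of `item` elements follows the input document rather than being fixed in the rule set.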

Method according to one of claims 1 to 20, characterized in that a rule set is defined by means of source data fields that are positioned in the input document data stream at the data to be extracted, the positioning being carried out by means of absolute or relative addressing.

Method according to claim 21, characterized in that the source data fields are positioned by means of source data areas in which one or more source data fields or further source data areas are arranged.

Method according to claim 22, characterized in that the source data areas contain, as structural elements, further source data areas, source data fields and/or control elements, wherein conditions for detecting the document boundaries and/or page boundaries and/or conditions for positioning source data areas are defined by means of logically linked control elements.

Computer program product for creating a rule set for the method according to one of claims 1 to 22, comprising a graphical user interface with several windows, wherein a template document is displayed in one window, the template document corresponding to the format of the documents contained in the input document data stream, and in a further window the data fields can be arranged in a tree structure, which may comprise multiple levels, and means for defining source data fields and linking them with the data fields, wherein such a link automatically creates a rule for reading a source data field from the input document data stream and storing its content in the corresponding data field of the structured data file.

Computer program product according to claim 24, characterized in that the means for defining source data fields is a means for marking corresponding data.

Computer program product according to claim 24 or 25, characterized in that the means for linking source data fields with data fields is a means for drawing a connecting line between the respective source data field and the corresponding data field.

Computer program product according to one of claims 24 to 26, characterized in that means are provided for marking source data areas in the original document.

System for carrying out the method according to any one of claims 1 to 23, comprising a computer, a display device and input means, wherein a computer program product according to claims 23 to 26 is stored on and executable by the computer.