DE202024100090U1

DE202024100090U1 - Online system for layout analysis of handwritten documents

Info

Publication number: DE202024100090U1
Application number: DE202024100090.0U
Authority: DE
Original assignee: Sunia Pte Ltd
Current assignee: Sunia Pte Ltd
Priority date: 2024-01-09
Filing date: 2024-01-09
Publication date: 2024-02-08
Anticipated expiration: 2034-01-10

Abstract

An online system for analyzing the layout of handwritten documents, wherein the system is configured to segment a document into a plurality of coarse-grained objects and to determine whether the type of each coarse-grained object is a paragraph, a list, a table, a diagram or an annotation, and wherein the system is configured to divide each coarse-grained object into a plurality of fine-grained objects and to determine whether the type of each fine-grained object is a text line, a formula, a basic shape, a graffiti or a set of miswritten strokes; wherein the coarse-grained objects and the fine-grained objects form a tree structure; wherein the system comprises:
a mainframe comprising a processor and a memory; wherein the processor executes the operations required by the system and the memory stores data, programs and associated operating results of the system;
a preprocessing unit configured to receive the document and to preprocess it, wherein the document is an online handwritten document consisting of a plurality of strokes that are close to one another in time or space;
wherein the preprocessing unit is configured to generate, during preprocessing, an undirected graph containing a plurality of nodes and a plurality of edges to represent relationships between the strokes of the document; wherein each node corresponds to a respective stroke, which is a directed sequence of the points of the stroke in writing order; wherein each edge corresponds to a pair of strokes that are close to each other in time or space, and each such pair corresponds to one edge; wherein each stroke additionally forms a pair with itself to give an edge in the form of a loop, a loop being an edge that connects a stroke to itself;
wherein the system is configured to assume, in the undirected graph, that each stroke is close in time to at most N_T strokes written after it, and to calculate, for each stroke, a nearest point distance between that stroke and the other strokes in order to determine the at most N_S strokes spatially closest to it, N_T and N_S each being a predetermined value;
a bidirectional recurrent neural network unit connected to the preprocessing unit and configured to initialize a feature vector of each node and a feature vector of each edge in the undirected graph using recurrent neural networks (RNN); wherein the system is configured to initialize the feature vector of each edge with zero values in the bidirectional recurrent neural network unit;
a graph neural network unit connected to the bidirectional recurrent neural network unit, wherein the graph neural network unit is configured to update the feature vector of each node and the feature vector of each edge, using a graph neural network (GNN) based on message passing, to obtain an updated feature vector of each node and an updated feature vector of each edge;
a fully connected neural network unit connected to the graph neural network unit, wherein the fully connected neural network unit is configured to predict the coarse-grained object type and the fine-grained object type of the stroke corresponding to each node, and further to predict whether the pair of strokes corresponding to each edge belongs to the same coarse-grained object or the same fine-grained object; wherein the system is configured, for this prediction, to perform a coarse-grained object classification and a fine-grained object classification for each node and each edge using fully connected neural networks, based on the updated feature vectors of the nodes and edges from the graph neural network unit;
a document recovery unit connected to the fully connected neural network unit for recovering the tree structure of the document, wherein the document recovery unit is configured to group all strokes into the corresponding fine-grained objects by connected component analysis on the predictions, made by the fully connected neural network unit, of which strokes belong to the same fine-grained object, and wherein the document recovery unit is configured to determine the type of each fine-grained object from the sum of the confidences with which the fully connected neural network unit predicts the strokes in that fine-grained object to belong to a particular fine-grained object type; and
wherein the document recovery unit is further configured to group the fine-grained objects into the corresponding coarse-grained objects by connected component analysis on the predictions, made by the fully connected neural network unit, of which strokes belong to the same coarse-grained object, and to determine the type of each coarse-grained object from the sum of the confidences with which the fully connected neural network unit predicts the strokes in that coarse-grained object to belong to a particular coarse-grained object type.
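
The grouping and type-voting steps of the document recovery unit can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the system: the function names `group_strokes` and `vote_type`, the union-find approach to connected component analysis, and the dictionary layout of the confidences are all hypothetical choices made for this example.

```python
from collections import defaultdict


def group_strokes(num_strokes, same_object_edges):
    """Connected component analysis over strokes (assumed union-find variant).

    same_object_edges holds index pairs (i, j) for which the classifier
    predicted that strokes i and j belong to the same object.
    """
    parent = list(range(num_strokes))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in same_object_edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(list)
    for i in range(num_strokes):
        groups[find(i)].append(i)
    return sorted(groups.values())


def vote_type(group, confidences):
    """Pick the type with the largest sum of per-stroke confidences.

    confidences[i] maps each candidate type to the confidence predicted
    for stroke i (hypothetical data layout).
    """
    totals = defaultdict(float)
    for i in group:
        for label, confidence in confidences[i].items():
            totals[label] += confidence
    return max(totals, key=totals.get)
```

The same two functions could be applied a second time, with the coarse-grained predictions, to group fine-grained objects into coarse-grained objects.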

Description

FIELD OF THE INVENTION

The invention relates to a layout analysis system and, in particular, to an online layout analysis system for handwritten documents.

BACKGROUND OF THE INVENTION

Online layout analysis of handwritten documents segments strokes into different sets and determines the content type of each set of strokes, e.g. tables, annotations, etc. Relatively mature techniques already exist for recognizing text lines, mathematical formulas and shapes individually. By dividing handwritten content into recognizable objects, the system helps to understand handwritten content composed of several content types, combining the free writing experience of conventional paper with the searchability of electronic information.

In the prior art, conventional layout analysis algorithms for online handwritten documents are generally based on artificial neural networks, in particular recurrent neural networks or graph neural networks. The drawback of the recurrent neural network approach is that it is difficult to make effective use of the two-dimensional spatial information in the document. The drawback of the graph neural network approach is that it cannot fully exploit feature engineering of the information.

Moreover, layout analysis covers various tasks such as text/non-text classification, text line segmentation, diagram detection and recognition, table detection and recognition, and mathematical formula detection. Traditional methods can handle only some of these tasks and cannot perform all of them simultaneously. The actual content of a document, however, is quite complex, since it may contain tables, diagrams, text and many other content types. Each conventional method can identify only a few types of document content, which does not meet the requirements of practical use.

SUMMARY OF THE INVENTION

To remedy the above-mentioned deficiencies of the prior art, the object of the invention is to provide an online system for analyzing the layout of handwritten documents. The advantage of the invention is that the system supports segmentation and classification over a range of objects at multiple granularities, so that text/non-text classification, text line segmentation, diagram detection and recognition, table detection and recognition, and mathematical formula detection can be performed simultaneously. The invention is able to perform various fine-grained object segmentations and classifications at the same time, to divide strokes into several coarse-grained objects, and to determine whether each of them is a paragraph, a list, a table, a diagram or an annotation. The invention can also divide the strokes of each coarse-grained object into several fine-grained objects and determine whether each of them is a text line, a formula, a basic shape, a graffiti or a set of miswritten strokes. Text/non-text classification, text line segmentation, diagram detection and recognition, table detection and recognition, and mathematical formula detection can therefore all be handled by the system of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a structural block diagram showing the main elements of the invention.
FIG. 2 is a structural block diagram showing the elements of the invention.
FIG. 3 is a structural block diagram showing the elements of the mainframe of the invention.
FIG. 4 is a flowchart showing the processing of the document of the invention.
FIG. 5 is a schematic view showing the tree structure of the document of the invention.
FIG. 6 is a schematic view showing the processing of the document of the invention.
FIG. 7 is a schematic view showing the updating and classification of the feature vectors of the nodes and edges of the invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

The invention provides an online layout analysis system 1 for handwritten documents that performs segmentation and classification over a range of objects at multiple granularities. Referring to FIG. 5, the system 1 segments a document 300 into a plurality of coarse-grained objects and determines whether the type of each coarse-grained object is a paragraph, a list, a table, a diagram or an annotation. Each coarse-grained object is divided into a plurality of fine-grained objects. The system 1 also determines whether each fine-grained object is a text line, a formula, a basic shape, a graffiti or a set of miswritten strokes. The coarse-grained objects and the fine-grained objects form a tree structure, which is shown in FIG. 5.
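
One way to picture the tree structure of document, coarse-grained objects and fine-grained objects is the following Python sketch. The class and field names (`Document`, `CoarseObject`, `FineObject`, `stroke_ids`) are assumptions introduced only for illustration; the source does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import List

# Type vocabularies taken from the description; the string spellings are assumed.
COARSE_TYPES = ("paragraph", "list", "table", "diagram", "annotation")
FINE_TYPES = ("text line", "formula", "basic shape", "graffiti", "miswritten strokes")


@dataclass
class FineObject:
    kind: str                 # one of FINE_TYPES
    stroke_ids: List[int]     # indices of the strokes forming this object


@dataclass
class CoarseObject:
    kind: str                 # one of COARSE_TYPES
    children: List[FineObject] = field(default_factory=list)


@dataclass
class Document:
    children: List[CoarseObject] = field(default_factory=list)

    def stroke_ids(self) -> List[int]:
        """Collect all stroke indices by walking the tree top down."""
        return [s for coarse in self.children
                for fine in coarse.children
                for s in fine.stroke_ids]
```

In this layout the document is the root, coarse-grained objects are its children, and fine-grained objects are the leaves that own the strokes.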

Referring to FIGS. 1 to 3, the system 1 comprises a mainframe 200, which includes a processor 2 and a memory 6 (see FIG. 3). The processor 2 executes the operations required by the system 1. The memory 6 stores data, programs and associated operating results of the system 1.

Referring to FIGS. 1 and 2, the system 1 further comprises the following elements.

A preprocessing unit 10 receives the document 300 and preprocesses it. The document 300 is an online handwritten document consisting of a plurality of strokes. It contains $m$ strokes that are close to one another in time or space. The $m$ strokes are denoted $v_{1}, v_{2}, \dots, v_{m}$, and each stroke is denoted $v_{i}$, where $1 \le i \le m$. Each stroke $v_{i}$ consists of $n_{i}$ points, denoted $(x_{i, 1}, y_{i, 1}), \dots, (x_{i, n_{i}}, y_{i, n_{i}})$. The $m$ strokes form the set $V = \{v_{1}, v_{2}, \dots, v_{m}\}$.

During preprocessing, the preprocessing unit 10 generates an undirected graph $G = (V, E)$ representing the relationships between the strokes of the document 300 (step 100 in FIG. 4). The undirected graph $G$ contains a plurality of nodes and a plurality of edges; $V$ is the set defined above ($V = \{v_{1}, v_{2}, \dots, v_{m}\}$) and $E$ is the set formed by the edges. Each node corresponds to a respective stroke, which is a directed sequence of the points of the stroke in writing order. Each edge corresponds to a pair of strokes that are close to each other in time or space, and each such pair corresponds to one edge. In addition, each of the $m$ strokes forms a pair with itself, giving an edge called a loop; that is, a loop is an edge that connects a stroke to itself.

In the undirected graph $G$, each of the $m$ strokes is assumed to be close in time to at most $N_{T}$ strokes written after it. For each of the $m$ strokes, a nearest point distance $d$ between that stroke and the other strokes is calculated in order to determine the at most $N_{S}$ strokes spatially closest to it. $N_{T}$ and $N_{S}$ are each a predetermined value. The nearest point distance $d$ between any two strokes $v_{i}$, $v_{i'}$ of the $m$ strokes is given by

$d(v_{i}, v_{i'}) = \min_{j, j'} \sqrt{(x_{i, j} - x_{i', j'})^{2} + (y_{i, j} - y_{i', j'})^{2}}.$

The nearest point distance $d$ is the minimum distance between the two strokes $v_{i}$, $v_{i'}$. Here $\min_{j, j'}$ takes the minimum over the distances between every point of the stroke $v_{i}$ and every point of the stroke $v_{i'}$.
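
The nearest point distance can be computed directly from its definition. The following NumPy sketch assumes that each stroke is given as an array of shape (number of points, 2) holding (x, y) coordinates; the function name `nearest_point_distance` is hypothetical.

```python
import numpy as np


def nearest_point_distance(stroke_a, stroke_b):
    """Smallest Euclidean distance between any point of stroke_a and any point of stroke_b."""
    a = np.asarray(stroke_a, dtype=float)   # shape (n_a, 2)
    b = np.asarray(stroke_b, dtype=float)   # shape (n_b, 2)
    # Broadcasting yields all pairwise coordinate differences, shape (n_a, n_b, 2).
    diff = a[:, None, :] - b[None, :, :]
    # Euclidean distance for every point pair, then the minimum over all pairs.
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())
```

This brute-force form costs one distance evaluation per point pair; a spatial index could reduce that for long strokes, but the source does not discuss such an optimization.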

In the undirected graph $G = (V, E)$, $E = E_{T} \cup E_{S}$, where $E_{T} = \{\{v_{i}, v_{j}\} \mid 1 \le i \le j \le \min\{i + N_{T}, m\}\}$. $E_{T}$ is the set of pairs formed by each of the $m$ strokes $v_{1}, v_{2}, \dots, v_{m}$ and its corresponding $N_{T}$ strokes. Each stroke and the corresponding $N_{T}$ strokes close to it in time are connected by a respective edge.

$E_{S} = \{\{v_{i}, v_{j_{k}}\} \mid (j_{1}, \dots) = \operatorname{argsort}_{j} d(v_{i}, v_{j}),\ 1 \le k \le \min\{m, N_{S}\}\}$. $E_{S}$ is the set of pairs formed by each of the $m$ strokes $v_{1}, v_{2}, \dots, v_{m}$ and its corresponding $N_{S}$ strokes. Each stroke and the corresponding $N_{S}$ strokes spatially closest to it are connected by a respective edge. $\operatorname{argsort}_{j}$ is a function that sorts the values of $d(v_{i}, v_{j})$ in ascending order, so that the nearest strokes come first, and returns the corresponding sequence of indices $j_{k}$ (as in the NumPy module of the Python programming language).
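
The construction of the edge set from temporal and spatial neighbors can be sketched as follows. This is a minimal illustration, assuming 0-based stroke indices in writing order, strokes given as arrays of (x, y) points, and edges stored as index pairs `(i, j)` with `i <= j`; the names `build_edges`, `n_t` and `n_s` are hypothetical. The sketch also assumes that a stroke is excluded from its own spatial neighbor search, since loops already arise from the temporal set with `i == j`.

```python
import numpy as np


def nearest_point_distance(stroke_a, stroke_b):
    """Smallest Euclidean distance between any two points of the two strokes."""
    a = np.asarray(stroke_a, dtype=float)
    b = np.asarray(stroke_b, dtype=float)
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())


def build_edges(strokes, n_t, n_s):
    """Return the union of temporal edges, spatial edges and loops."""
    m = len(strokes)
    edges = set()

    # Temporal edges: each stroke with itself (loop) and with at most n_t later strokes.
    for i in range(m):
        for j in range(i, min(i + n_t, m - 1) + 1):
            edges.add((i, j))

    # Spatial edges: each stroke with its at most n_s nearest other strokes.
    for i in range(m):
        candidates = [j for j in range(m) if j != i]
        if not candidates:
            continue
        dists = np.array([nearest_point_distance(strokes[i], strokes[j])
                          for j in candidates])
        # np.argsort returns indices that sort the distances in ascending order.
        for k in np.argsort(dists, kind="stable")[:n_s]:
            j = candidates[int(k)]
            edges.add((min(i, j), max(i, j)))

    return edges
```

Storing each undirected edge as an ordered index pair avoids duplicates when two strokes are both temporal and spatial neighbors.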

A bidirectional recurrent neural network unit 500 is connected to the preprocessing unit 10 to initialize a feature vector of each node and a feature vector of each edge in the undirected graph $G$ using recurrent neural networks (RNN). A feature vector is a vector representing the features of a corresponding paragraph, list, table, diagram, annotation, etc. In the bidirectional recurrent neural network unit 500, the feature vector of each edge is initialized with zero values.

Referring to FIG. 2, the bidirectional recurrent neural network unit 500 comprises the following elements.

A first BLSTM (bidirectional long short-term memory) unit 15 is connected to the preprocessing unit 10 to initialize the feature vector of each node and each edge of the undirected graph $G$ (step 110 in FIG. 4). Since the spatial positional relationship between a pair of strokes $v_{i}$, $v_{i'}$ can be obtained from the pair itself, the feature vector of each edge is initialized to a zero vector, denoted $E_{i, i'}^{(0)}$. The feature vector of each edge is thus a vector associated with the stroke pair $v_{i}$, $v_{i'}$.

The first BLSTM unit 15 also serves to extract a point feature P that contains context information. The point feature P is given by

$$P = (p_{i, j; k})_{i = 1, \dots, m;\ j = 1, \dots, n_{i};\ k = 1, \dots, N_{F}} = \mathrm{BLSTM}(I),$$

where

$$I = \begin{pmatrix} x_{1,1} & \dots & x_{1, n_{1} - 1} & x_{1, n_{1}} & \dots & x_{m, 1} & \dots & x_{m, n_{m} - 1} & x_{m, n_{m}} \\ y_{1,1} & \dots & y_{1, n_{1} - 1} & y_{1, n_{1}} & \dots & y_{m, 1} & \dots & y_{m, n_{m} - 1} & y_{m, n_{m}} \\ 0 & \dots & 0 & 1 & \dots & 0 & \dots & 0 & 1 \end{pmatrix},$$

as shown in FIG. 6.
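By way of illustration only, the input matrix I could be assembled as in the following minimal Python sketch. It assumes that the third row is a flag marking the last point of each stroke, which is how the 0/1 pattern in the matrix above reads; the function name `build_input_matrix` and the sample coordinates are hypothetical and not taken from the specification.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def build_input_matrix(strokes: List[List[Point]]) -> List[List[float]]:
    """Return the 3 x N matrix I as three rows: x, y, end-of-stroke flag.

    Assumption: the flag is 1 for the last point of a stroke and 0 otherwise.
    """
    xs: List[float] = []
    ys: List[float] = []
    flags: List[float] = []
    for stroke in strokes:
        last = len(stroke) - 1
        for j, (x, y) in enumerate(stroke):
            xs.append(float(x))
            ys.append(float(y))
            flags.append(1.0 if j == last else 0.0)
    return [xs, ys, flags]


# Two strokes in writing order: one with three points, one with two points.
strokes = [[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)], [(3.0, 1.0), (3.5, 2.0)]]
input_matrix = build_input_matrix(strokes)
```

Concatenating all strokes into one sequence lets a single bidirectional pass see the points of neighbouring strokes, while the flag row keeps the stroke boundaries recoverable.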

A pooling unit 20 is connected to the first BLSTM unit 15 to aggregate the point feature P into a stroke feature using average pooling (step 120 in FIG. 4). The stroke feature is formed by relationships between different strokes. The stroke feature is given by

$$S = (s_{i, k})_{i = 1, \dots, m;\ k = 1, \dots, N_{L}},$$

where

$$s_{i, k} = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} p_{i, j; k}$$

(as shown in FIG. 6).
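The average pooling of step 120 can be sketched as follows. This is an illustrative Python example only: the nested-list layout `point_features[i][j][k]` for $p_{i, j; k}$ and the sample values are assumptions made for the sketch.

```python
from typing import List


def average_pool(point_features: List[List[List[float]]]) -> List[List[float]]:
    """Average the point features of each stroke.

    point_features[i][j][k] holds p_{i,j;k}; the result[i][k] holds s_{i,k}.
    Every stroke is assumed to contain at least one point.
    """
    stroke_features = []
    for points in point_features:
        n_i = len(points)
        width = len(points[0])
        stroke_features.append(
            [sum(p[k] for p in points) / n_i for k in range(width)]
        )
    return stroke_features


P = [
    [[1.0, 2.0], [3.0, 4.0]],              # stroke 1: two points
    [[0.0, 6.0], [3.0, 0.0], [6.0, 3.0]],  # stroke 2: three points
]
S = average_pool(P)
```

Because the mean is taken over the points of a stroke, strokes with different numbers of points all yield feature vectors of the same width.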

A second BLSTM (bidirectional long short-term memory) unit 25 is connected to the pooling unit 20 and serves to receive the stroke feature from the pooling unit 20 in order to obtain the initialized feature vectors of the nodes corresponding to the strokes $v_1, v_2, \dots, v_m$ (step 130 in FIG. 4). The initialized feature vectors of the nodes are denoted $V_{1}^{(0)}, V_{2}^{(0)}, \dots, V_{m}^{(0)}$ (as shown in FIG. 6), where

$$(V_{1}^{(0)}\ V_{2}^{(0)}\ \dots\ V_{m}^{(0)}) = \mathrm{BLSTM}(S).$$

A graphical neural network unit 30 is connected to the second BLSTM unit 25 of the bidirectional recursive neural network unit 500. The graphical neural network unit 30 serves to update the feature vector of each node and the feature vector of each edge, using a graph neural network (GNN) based on message passing, to obtain an updated feature vector for each node and for each edge (step 140 in FIG. 4). For each edge, the feature vector of the edge is updated using the current feature vector of the edge, the feature vectors of the nodes corresponding to the strokes in the edge, and the graph neural network. Similarly, for each node, the feature vector of the node is updated using the feature vectors of the edges corresponding to the node, the feature vectors of the nodes that are temporally close or spatially adjacent to it, and the graph neural network. The initialized feature vectors $V_{1}^{(0)}, V_{2}^{(0)}, \dots, V_{m}^{(0)}$ of the nodes obtained in step 130 are input to a graph attention network (GAT) 311 to obtain the updated feature vectors of the nodes. The initialized feature vector $E_{i, i'}^{(0)}$ of each edge is input to a feedforward neural network (FNN) 312 to obtain the updated feature vectors of the edges. Referring to FIG. 7, the feature vectors of the nodes and the edges are alternately updated a predetermined number of times to obtain the updated feature vectors of the nodes and the edges. The updated feature vectors of the nodes and edges contain the context information used for classification.

Referring to FIG. 7, the graph neural network of the graphical neural network unit 30 is a graph neural network with L layers. The updated feature vector of each node is given by

$$V_{i}^{(l)} = U_{V}^{(l)}\left(\{V_{i'}^{(l-1)} \mid (v_{i}, v_{i'}) \in E\},\ \{E_{i, i'}^{(l-1)} \mid (v_{i}, v_{i'}) \in E\}\right),$$

where $U_{V}^{(l)}$ denotes the graph attention network 311 and $1 \leq l \leq L$. The updated feature vector of each edge is given by

$$E_{i, i'}^{(l)} = U_{E}^{(l)}\left(E_{i, i'}^{(l-1)},\ V_{i}^{(l-1)},\ V_{i'}^{(l-1)}\right),$$

where $U_{E}^{(l)}$ denotes the feedforward neural network 312 and $1 \leq l \leq L$.
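The data flow of the two update rules can be illustrated with the following Python sketch. It is not the network of the specification: the learned functions of the graph attention network 311 and the feedforward neural network 312 are replaced here by plain averaging, purely so that the example runs without trained weights. All function names are hypothetical, and every node is assumed to carry a self-loop, as in the undirected graph G.

```python
from typing import Dict, List, Tuple

Vec = List[float]
Edge = Tuple[int, int]


def mean(vectors: List[Vec]) -> Vec:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def update_node(neigh_nodes: List[Vec], neigh_edges: List[Vec]) -> Vec:
    # Stand-in for U_V (GAT 311): averaging instead of learned attention.
    return mean(neigh_nodes + neigh_edges)


def update_edge(e: Vec, v_a: Vec, v_b: Vec) -> Vec:
    # Stand-in for U_E (FNN 312): averaging instead of learned weights.
    return mean([e, v_a, v_b])


def message_passing(V: List[Vec], E: Dict[Edge, Vec], num_layers: int):
    """Apply num_layers rounds; layer l reads only layer l-1 features."""
    for _ in range(num_layers):
        new_V = []
        for i in range(len(V)):
            incident = [(a, b) for (a, b) in E if i in (a, b)]
            neigh_nodes = [V[b if a == i else a] for (a, b) in incident]
            neigh_edges = [E[edge] for edge in incident]
            new_V.append(update_node(neigh_nodes, neigh_edges))
        new_E = {(a, b): update_edge(e, V[a], V[b]) for (a, b), e in E.items()}
        V, E = new_V, new_E
    return V, E


# Two strokes joined by one edge, plus the self-loops; edges start at zero.
V0 = [[1.0, 0.0], [0.0, 1.0]]
E0 = {(0, 0): [0.0, 0.0], (1, 1): [0.0, 0.0], (0, 1): [0.0, 0.0]}
V1, E1 = message_passing(V0, E0, num_layers=1)
```

Although the edge features start as zero vectors, after one layer the edge between the two strokes already carries information from both endpoint nodes, which is what allows the later edge classifiers to work on them.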

A fully connected neural network unit 35 is connected to the graphical neural network unit 30. The fully connected neural network unit 35 serves to predict the type of the coarse-grained object and of the fine-grained object for the stroke corresponding to each node, and to predict whether the stroke pair corresponding to each edge belongs to the same coarse-grained object or to the same fine-grained object. In this prediction, a coarse-grained object classification and a fine-grained object classification are performed for each of the nodes and edges using fully connected neural networks, based on the updated feature vectors of the nodes and the edges from the graphical neural network unit 30 (step 150 in FIG. 4).

Referring to FIGS. 2 and 7, the fully connected neural network unit 35 includes the following elements:

A first classifier 351 and a second classifier 352 serve to receive the updated feature vectors of the nodes in order to predict, respectively, the type of the fine-grained object and the type of the coarse-grained object to which the stroke corresponding to each node belongs.
A third classifier 353 and a fourth classifier 354 serve to receive the updated feature vectors of the edges in order to predict, respectively, whether the stroke pair $v_i, v_{i'}$ corresponding to each edge belongs to the same fine-grained object or to the same coarse-grained object.
The first classifier 351, the second classifier 352, the third classifier 353 and the fourth classifier 354 each consist of a fully connected neural network and an activation function. The activation function can be a softmax function or a sigmoid function.
The first classifier 351 is connected to the graph attention network 311 and serves to output a fine-grained confidence for each node $v_i$. The fine-grained node confidence is given by $c_{i}^{\mathrm{fine}} = \mathrm{softmax}(C_{V}^{\mathrm{fine}}(V_{i}^{(L)}))$, where $C_{V}^{\mathrm{fine}}$ is a fully connected neural network and softmax is the normalized exponential function. In the first classifier 351, the updated feature vector $V_{i}^{(L)}$ is input to the fully connected neural network $C_{V}^{\mathrm{fine}}$ and the result is passed to the softmax function to obtain the fine-grained node confidence $c_{i}^{\mathrm{fine}}$. The fine-grained node confidence indicates the confidence that the stroke $v_i$ belongs to a text, a formula, a basic shape, a graffiti, or a series of accidentally touched strokes (these are the types of the fine-grained objects).
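As an illustrative sketch only, a node classifier of this form can be written in Python as below. The fully connected network is reduced here to a single linear layer, and the weights, the bias and the two-dimensional feature vector are made-up values; none of them come from the specification.

```python
import math
from typing import List

FINE_TYPES = ["text", "formula", "basic shape", "graffiti", "accidental strokes"]


def linear(weights: List[List[float]], bias: List[float], x: List[float]) -> List[float]:
    """One fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Normalized exponential; subtracting the maximum avoids overflow."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def node_confidence(weights, bias, v):
    """c_i = softmax(C_V(V_i^(L))), with C_V assumed to be a single layer."""
    return softmax(linear(weights, bias, v))


# Hypothetical parameters: five output classes, two input features.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0], [0.0, -1.0]]
bias = [0.0, 0.0, 0.0, 0.0, 0.0]
confidence = node_confidence(weights, bias, [2.0, 0.5])
predicted = FINE_TYPES[confidence.index(max(confidence))]
```

The softmax output sums to one over the five fine-grained types, so the per-stroke confidences can later be added up across the strokes of an object and compared type by type.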

The second classifier 352 is connected to the graph attention network 311 and serves to output a coarse-grained node confidence for each stroke $v_i$. The coarse-grained node confidence is given by

$$c_{i}^{\mathrm{coarse}} = \mathrm{softmax}(C_{V}^{\mathrm{coarse}}(V_{i}^{(L)})),$$

where $C_{V}^{\mathrm{coarse}}$ is a fully connected neural network and softmax is the normalized exponential function. In the second classifier 352, the updated feature vector $V_{i}^{(L)}$ is input to the fully connected neural network $C_{V}^{\mathrm{coarse}}$ and the result is passed to the softmax function to obtain the coarse-grained node confidence $c_{i}^{\mathrm{coarse}}$. The coarse-grained node confidence indicates the confidence that the stroke $v_i$ belongs to a paragraph, a list, a table, a diagram, or an annotation (these are the types of the coarse-grained objects).

The third classifier 353 is connected to the feedforward neural network 312 and serves to output a fine-grained edge confidence for the stroke pair $v_i, v_{i'}$ corresponding to each edge. The fine-grained edge confidence is given by

$$c_{i, i'}^{\mathrm{fine}} = \mathrm{sigmoid}(C_{E}^{\mathrm{fine}}(E_{i, i'}^{(L)})),$$

where $C_{E}^{\mathrm{fine}}$ is a fully connected neural network. In the third classifier 353, the updated feature vector $E_{i, i'}^{(L)}$ is input to the fully connected neural network $C_{E}^{\mathrm{fine}}$ and the result is passed to the sigmoid function to obtain the fine-grained edge confidence $c_{i, i'}^{\mathrm{fine}}$. The fine-grained edge confidence indicates the confidence that the stroke pair $v_i, v_{i'}$ belongs to the same fine-grained object.
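A corresponding sketch for an edge classifier is given below, again for illustration only. As an assumption of the sketch, the fully connected network is a single linear layer with one output, and the weights and feature values are invented.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function, written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def edge_confidence(weights: List[float], bias: float, e: List[float]) -> float:
    """c_{i,i'} = sigmoid(C_E(E_{i,i'}^(L))), with C_E assumed to be one layer."""
    return sigmoid(sum(w * x for w, x in zip(weights, e)) + bias)


# Hypothetical parameters and edge feature vectors.
weights = [1.0, -1.0]
same_object = edge_confidence(weights, 0.0, [3.0, 0.0])
different_object = edge_confidence(weights, 0.0, [-3.0, 0.0])
undecided = edge_confidence(weights, 0.0, [0.0, 0.0])
```

Unlike the node classifiers, the edge classifiers answer a yes/no question, so a single sigmoid output in the range 0 to 1 suffices.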

The fourth classifier 354 is connected to the feedforward neural network 312 and serves to output a coarse-grained edge confidence for the stroke pair $v_i, v_{i'}$ corresponding to each edge. The coarse-grained edge confidence is given by

$$c_{i, i'}^{\mathrm{coarse}} = \mathrm{sigmoid}(C_{E}^{\mathrm{coarse}}(E_{i, i'}^{(L)})),$$

where $C_{E}^{\mathrm{coarse}}$ is a fully connected neural network. In the fourth classifier 354, the updated feature vector $E_{i, i'}^{(L)}$ is input to the fully connected neural network $C_{E}^{\mathrm{coarse}}$ and the result is passed to the sigmoid function to obtain the coarse-grained edge confidence $c_{i, i'}^{\mathrm{coarse}}$. The coarse-grained edge confidence indicates the confidence that the stroke pair $v_i, v_{i'}$ belongs to the same coarse-grained object.

A document recovery unit 40 is connected to the fully connected neural network unit 35 to recover a tree structure of the document 300 (step 160 in FIG. 4), as shown in FIG. 5. The document recovery unit 40 serves to group all the strokes into the corresponding fine-grained objects by applying connected component analysis to the prediction results (the fine-grained edge confidences) of the fully connected neural network unit 35 as to which strokes belong to the same fine-grained object. The type of each of the corresponding fine-grained objects is determined by a sum of the confidences (the fine-grained node confidences) with which the fully connected neural network unit 35 predicts that the strokes in the fine-grained object belong to a particular type of fine-grained object.

The document recovery unit 40 further serves to group the corresponding fine-grained objects into the corresponding coarse-grained objects by applying connected component analysis to the prediction results (the coarse-grained edge confidences) of the fully connected neural network unit 35 as to which strokes belong to the same coarse-grained object. The type of each of the corresponding coarse-grained objects is determined by a sum of the confidences (the coarse-grained node confidences) with which the fully connected neural network unit 35 predicts that the strokes in the coarse-grained object belong to a particular type of coarse-grained object.

If the fine-grained edge confidence is less than 0.5 ($c_{i, i'}^{\mathrm{fine}} < 0.5$), the document recovery unit 40 removes the edge corresponding to the stroke pair $v_i, v_{i'}$ from the undirected graph G and performs the connected component analysis again to obtain a plurality of connected components. Each of the connected components corresponds to a respective fine-grained object, whose type is determined by a sum of the fine-grained node confidences $c_{i}^{\mathrm{fine}}$ of the nodes in the fine-grained object.
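The fine-grained grouping described above can be sketched in Python as follows. The union-find routine is merely one common way of computing connected components and is a choice made for this sketch; the function names, the two-type label list and the confidence values are hypothetical.

```python
from typing import Dict, List, Tuple


def connected_components(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Group node indices into connected components using union-find."""
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups: Dict[int, List[int]] = {}
    for node in range(num_nodes):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


def group_fine_objects(node_conf, edge_conf, type_names, threshold=0.5):
    """Drop edges below the threshold, then type each component by summed confidence."""
    kept = [edge for edge, c in edge_conf.items() if c >= threshold]
    objects = []
    for strokes in connected_components(len(node_conf), kept):
        totals = [sum(node_conf[i][k] for i in strokes) for k in range(len(type_names))]
        objects.append((strokes, type_names[totals.index(max(totals))]))
    return objects


# Three strokes; only the edge (0, 1) survives the 0.5 threshold.
node_conf = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
edge_conf = {(0, 1): 0.9, (1, 2): 0.2, (0, 2): 0.4}
fine_objects = group_fine_objects(node_conf, edge_conf, ["text", "formula"])
```

Summing the confidences over all strokes of a component, rather than taking a vote per stroke, lets confidently classified strokes outweigh uncertain ones when the object type is decided.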

If the coarse-grained edge confidence is greater than or equal to 0.5 ($c_{i, i'}^{\mathrm{coarse}} \geq 0.5$), the fine-grained objects to which the strokes of the pair $v_i, v_{i'}$ belong are assigned to the same coarse-grained object. The document recovery unit 40 performs the connected component analysis again to obtain the corresponding coarse-grained objects. The type of each of the corresponding coarse-grained objects is determined by a sum of the coarse-grained node confidences $c_{i}^{\mathrm{coarse}}$ of the nodes in the coarse-grained object.
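The coarse-grained step can be illustrated in the same way. In the following Python sketch, fine-grained objects are merged whenever any stroke pair spanning them reaches the 0.5 threshold, and the result is returned as a two-level tree. The dictionary layout of the tree, the function name and all numeric values are assumptions of the sketch rather than part of the specification.

```python
from typing import Dict, List, Tuple


def group_coarse_objects(fine_objects, node_conf, edge_conf, type_names, threshold=0.5):
    """Merge fine-grained objects linked by a confident edge and type each group.

    fine_objects: list of stroke-index lists, one per fine-grained object.
    Returns one dict per coarse-grained object with its type and the indices
    of its fine-grained children.
    """
    owner = {s: idx for idx, strokes in enumerate(fine_objects) for s in strokes}
    parent = list(range(len(fine_objects)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c in edge_conf.items():
        if c >= threshold:
            ra, rb = find(owner[a]), find(owner[b])
            if ra != rb:
                parent[rb] = ra

    merged: Dict[int, List[int]] = {}
    for idx in range(len(fine_objects)):
        merged.setdefault(find(idx), []).append(idx)

    tree = []
    for members in merged.values():
        strokes = [s for idx in members for s in fine_objects[idx]]
        totals = [sum(node_conf[s][k] for s in strokes) for k in range(len(type_names))]
        tree.append({"type": type_names[totals.index(max(totals))], "children": members})
    return tree


# Four strokes already grouped into three fine-grained objects.
fine_objects = [[0, 1], [2], [3]]
node_conf = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
edge_conf = {(1, 2): 0.8, (2, 3): 0.3}
tree = group_coarse_objects(fine_objects, node_conf, edge_conf, ["paragraph", "diagram"])
```

Each entry of the result is a coarse-grained object whose children are fine-grained objects, which in turn contain strokes, giving the tree structure that the document recovery unit 40 is described as recovering.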

Having thus described the invention, it is obvious that it can be varied in many ways. Such variations are not to be considered a departure from the spirit and scope of the invention, and all such modifications that would be apparent to one skilled in the art are intended to be included within the scope of the following claims.

In summary, the invention relates to an online layout analysis system for handwritten documents, comprising: a preprocessing unit for receiving a document consisting of a plurality of strokes and for generating an undirected graph containing a plurality of nodes and a plurality of edges to represent the relationships between different strokes; a bidirectional recursive neural network unit for initializing a feature vector of each node and a feature vector of each edge; a graphical neural network unit for updating the feature vectors of the nodes and edges to obtain updated feature vectors; a fully connected neural network unit for performing a coarse-grained object classification and a fine-grained object classification for each of the nodes and edges based on the updated feature vectors; and a document recovery unit for recovering a tree structure of the document.

Claims

An online system for analyzing the layout of handwritten documents, the system being configured to segment a document into a plurality of coarse-grained objects and to determine whether the type of each of the coarse-grained objects is a paragraph, a list, a table, a diagram or an annotation, the system being configured to divide each of the coarse-grained objects into a plurality of fine-grained objects and to determine whether the type of each of the fine-grained objects is a line of text, a formula, a basic shape, a graffiti or a set of miswritten strokes; wherein the coarse-grained objects and the fine-grained objects form a tree structure; wherein the system comprises:
a mainframe comprising a processor and a memory; wherein the processor serves to perform the required operations of the system and the memory serves to store data, programs and associated operating results of the system;
a preprocessing unit configured to receive the document and to perform preprocessing on the document, the document being an online handwriting document consisting of a plurality of strokes that are close to one another in time or space;
wherein the preprocessing unit is configured to generate, in the preprocessing, an undirected graph containing a plurality of nodes and a plurality of edges for representing relationships between different strokes of the document; wherein each of the nodes corresponds to a respective stroke, which is a directed sequence consisting of the points of the stroke in writing order; wherein each of the edges corresponds to a pair of respective strokes that are close to each other in time or space; wherein each of the strokes further forms a set with itself so as to form an edge in the form of a loop, a loop being an edge connecting a stroke to itself;
wherein the system is configured to assume, in the undirected graph, that each of the strokes is close in time to at most $N_T$ strokes written after it, and, for each of the strokes, to calculate a nearest-point distance between the stroke and the other strokes in order to determine at most $N_S$ strokes that are spatially closest to the stroke, $N_T$ and $N_S$ each being a predetermined value;
a bidirectional recursive neural network unit connected to the preprocessing unit for initializing a feature vector of each node in the undirected graph and a feature vector of each edge in the undirected graph using recursive neural networks (RNN); wherein the system is configured to initialize the feature vector of each of the edges with zero values in the bidirectional recursive neural network unit;
a graphical neural network unit connected to the bidirectional recursive neural network unit, the graphical neural network unit being configured to update the feature vector of each node and the feature vector of each edge, using a graphical neural network (GNN) based on message passing, to obtain an updated feature vector of each node and an updated feature vector of each edge;
a fully connected neural network unit connected to the graphical neural network unit, the fully connected neural network unit being configured to predict the type of the coarse-grained object and of the fine-grained object for the stroke corresponding to each of the nodes, and further to predict whether the pair of strokes corresponding to each of the edges belongs to the same coarse-grained object or to the same fine-grained object; wherein the system is configured, in the prediction of the fully connected neural network unit, to perform a coarse-grained object classification and a fine-grained object classification for each of the nodes and edges using fully connected neural networks, based on the updated feature vectors of the nodes and the edges from the graphical neural network unit;
a document recovery unit, connected to the fully connected neural network unit, for recovering the tree structure of the document, the document recovery unit being configured to group all the strokes to obtain the corresponding fine-grained objects by using connected component analysis according to the prediction results, in the fully connected neural network unit, of the strokes belonging to the same fine-grained object; the document recovery unit being configured to determine the type of each of the corresponding fine-grained objects by a sum of the confidences with which the strokes in the fine-grained object are predicted, in the fully connected neural network unit, to belong to a particular type of fine-grained object; and
the document recovery unit being further configured to group the corresponding fine-grained objects to obtain the corresponding coarse-grained objects by using connected component analysis according to the prediction results, in the fully connected neural network unit, of the strokes belonging to the same coarse-grained object; wherein the document recovery unit is configured to determine the type of each of the corresponding coarse-grained objects by a sum of the confidences with which the strokes in the coarse-grained object are predicted, in the fully connected neural network unit, to belong to a particular type of coarse-grained object.

The system according to Claim 1, wherein the bidirectional recursive neural network unit comprises: a first BLSTM (bidirectional long short-term memory) unit connected to the preprocessing unit to initialize the feature vector of each node and each edge; wherein the feature vector is a vector representing features of a corresponding paragraph, list, table, diagram and annotation; wherein a spatial positional relationship between a pair of strokes can be determined from the pair of strokes itself, whereby the first BLSTM unit is configured to initialize the feature vector of each of the edges to a zero vector; wherein the feature vector of each of the edges corresponds to a set corresponding to the pair of strokes; wherein the first BLSTM unit is further configured to obtain a point feature that contains context information; a pooling unit connected to the first BLSTM unit and configured to aggregate the point feature into a stroke feature using average pooling and to form the stroke feature through relationships between different strokes; and a second BLSTM (bidirectional long short-term memory) unit connected to the pooling unit and configured to receive the stroke feature from the pooling unit to obtain the initialized feature vectors of the nodes corresponding to the strokes.

The system according to Claim 2, wherein the system is configured, in the graphical neural network unit, to update, for each of the edges, the feature vector of the edge using a current feature vector of the edge, the feature vectors of the nodes corresponding to the strokes in the edge, and the graphical neural network; to update, for each of the nodes, the feature vector of the node using the feature vector of the edge corresponding to the node, the feature vectors of nodes that are temporally close or spatially adjacent, and the graphical neural network; and to alternately update the feature vectors of the nodes and the edges a predetermined number of times to obtain the updated feature vectors of the nodes and the edges; wherein the updated feature vectors of the nodes and edges contain the context information for classification.

The system according to Claim 3, wherein the fully connected neural network unit comprises: a first classifier and a second classifier configured to receive the updated feature vectors of the nodes to predict the type of the coarse-grained object and the type of the fine-grained object to which the stroke corresponding to each of the nodes belongs; and a third classifier and a fourth classifier configured to receive the updated feature vectors of the edges to respectively predict whether the stroke pair corresponding to each of the edges belongs to the same coarse-grained object or the same fine-grained object; wherein the first classifier, the second classifier, the third classifier and the fourth classifier each consist of a fully connected neural network and an activation function.

The system according to Claim 4, wherein the first classifier is connected to the graph attention network and is configured to output a fine-grained node confidence for each node, the fine-grained node confidence indicating the confidence that the stroke belongs to a text line, a formula, a basic shape, a graffiti or a series of accidentally touched strokes; wherein the second classifier is connected to the graph attention network and is configured to output a coarse-grained node confidence for each stroke, the coarse-grained node confidence indicating the confidence that the stroke belongs to a paragraph, a list, a table, a diagram or a note; wherein the third classifier is connected to the feedforward neural network and is configured to output a fine-grained edge confidence for the pair of strokes corresponding to each of the edges, the fine-grained edge confidence indicating the confidence that the stroke pair belongs to the same fine-grained object; and wherein the fourth classifier is connected to the feedforward neural network and is configured to output a coarse-grained edge confidence for the pair of strokes corresponding to each of the edges, the coarse-grained edge confidence indicating the confidence that the stroke pair belongs to the same coarse-grained object.
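
By way of illustration only, a classifier consisting of a fully connected layer followed by an activation function might be sketched as follows. The claims do not name the activation function; softmax for the node classifiers and a sigmoid for the edge classifiers are assumptions of this sketch, as are all function names and the example weights.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> Vector:
    """One dense layer: each output is a weighted sum of x plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> Vector:
    """Assumed node activation: turns scores into confidences summing to 1."""
    peak = max(z)  # subtracted for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Assumed edge activation: maps one score to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_confidences(x: Sequence[float], weights: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> Vector:
    """Confidence per object type for the stroke of one node."""
    return softmax(fully_connected(x, weights, bias))


def edge_confidence(x: Sequence[float], weights: Sequence[float],
                    bias: float) -> float:
    """Confidence that the stroke pair of one edge shares an object."""
    return sigmoid(fully_connected(x, [weights], [bias])[0])
```

With five output rows the node classifier yields one confidence per object type, and the single-output edge classifier yields a value that can be compared directly against a threshold such as 0.5.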

The system according to Claim 5, wherein the system is configured, when the fine-grained edge confidence is less than 0.5, to remove the edge corresponding to the pair of strokes from the undirected graph by means of the document recovery unit and to perform the connected component analysis again to obtain a plurality of connection branches, each of the connection branches corresponding to a respective fine-grained object, the type of which is determined by a sum of the fine-grained node confidences corresponding to the nodes in the fine-grained object; and, when the coarse-grained edge confidence is greater than or equal to 0.5, to categorize the fine-grained objects to which the stroke pair (v_i, v_i') belongs into the same coarse-grained object; wherein the document recovery unit is configured to perform the connected component analysis again to obtain the corresponding coarse-grained objects and to determine the type of each of the corresponding coarse-grained objects by a sum of the coarse-grained node confidences corresponding to the nodes in the coarse-grained object.
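
By way of illustration only, the removal of edges below the 0.5 threshold, the subsequent connected component analysis and the type assignment by summed node confidences might be sketched as follows. Selecting the type with the largest summed confidence is an assumption of this sketch, since the claim states only that the type is determined by the sum; the names `FINE_TYPES`, `connected_components` and `group_fine_objects` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
FINE_TYPES = ["text line", "formula", "basic shape", "graffiti",
              "accidental strokes"]


def connected_components(nodes: Sequence[int],
                         edges: Sequence[Edge]) -> List[List[int]]:
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n: int) -> int:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    groups: Dict[int, List[int]] = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


def group_fine_objects(node_conf: Dict[int, List[float]],
                       edge_conf: Dict[Edge, float],
                       threshold: float = 0.5) -> List[Tuple[List[int], str]]:
    """Drop edges below the threshold, then label each component."""
    kept = [edge for edge, conf in edge_conf.items() if conf >= threshold]
    objects = []
    for component in connected_components(list(node_conf), kept):
        sums = [sum(node_conf[n][k] for n in component)
                for k in range(len(FINE_TYPES))]
        # Assumption: the type with the largest summed confidence wins.
        objects.append((sorted(component), FINE_TYPES[sums.index(max(sums))]))
    return objects


# Hypothetical confidences for three strokes and two edges.
example = group_fine_objects(
    {0: [0.9, 0.1, 0.0, 0.0, 0.0],
     1: [0.6, 0.4, 0.0, 0.0, 0.0],
     2: [0.1, 0.8, 0.1, 0.0, 0.0]},
    {(0, 1): 0.9, (1, 2): 0.2},
)
```

The coarse-grained stage could reuse the same grouping with the coarse-grained edge and node confidences and the list of coarse-grained types, keeping edges whose confidence is greater than or equal to 0.5.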