LU503005B1 - A layout-unconstrained method based on graph reasoning network for reading text block

Publication number: LU503005B1
Application number: LU503005A
Authority: LU (Luxembourg)
Inventors: Ziyan Li, Lianwen Jin
Applicant: Univ South China Tech
Filing date: 2022-11-05
Publication date: 2023-05-05
Other languages: French (fr)

Classifications

    • G06F16/5846: Retrieval of still image data using metadata automatically derived from the content, using extracted text
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06N5/04: Inference or reasoning models
    • G06N5/045: Explanation of inference; Explainable artificial intelligence [XAI]
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V20/63: Scene text, e.g. street names
    • G06V30/16: Character recognition; Image preprocessing
    • G06V30/19187: Graphical models, e.g. Bayesian networks or Markov models


Abstract

The invention discloses a layout-unconstrained method based on a graph reasoning network for reading text blocks, which belongs to the technical field of pattern recognition and artificial intelligence, and comprises the following steps: acquiring text pictures with unconstrained layout, and constructing a convolution network; extracting the visual feature map of the text picture through the convolution network, and performing character recognition on the visual feature map pixel by pixel; based on the output value of the convolution network, optimizing the convolution network with the aggregation cross-entropy loss function to obtain an unordered character set; constructing a graph reasoning network, and reasoning about the relationships between characters in the character set through the graph reasoning network to obtain a character connection set; integrating the character connection set, and translating the integrated character connection set into a reading order to obtain the recognition result of the text picture. The invention can predict a more accurate character sequence, and through the character connection prediction model of the graph reasoning network, the language information in an independent corpus can be mined in depth.

Description

A LAYOUT-UNCONSTRAINED METHOD BASED ON GRAPH
REASONING NETWORK FOR READING TEXT BLOCK
TECHNICAL FIELD
The invention belongs to the technical field of pattern recognition and artificial intelligence, and in particular to a layout-unconstrained method based on graph reasoning network for reading text block.
BACKGROUND
Natural scene character recognition has a wide range of application scenarios and is a research hotspot in the field of artificial intelligence. At present, scene text recognition problems fall into two kinds: regular text recognition and irregular text recognition. Compared with the former, irregular text recognition has aroused more research interest, because in open scenes the problem is more challenging and the corresponding methods have more practical value. With society's increasing pursuit of spiritual culture, text instances in natural scenes have become more and more abundant; the arrangement of characters is no longer limited to regular linear forms, and irregular, artistic character layout designs are broadly observable. In recent years, scholars have explored several important directions for irregular text recognition, including arbitrarily shaped text, perspective-deformed text, low-quality text, disturbed text and so on, and have made remarkable progress. However, no existing literature explores the recognition of text instances with irregular character layout (text block reading).
If the existing mainstream text recognition methods are directly applied to the problem of text block reading, they will not achieve ideal results, for the following reasons. First, the implicit language model of RNN-based methods does not clearly define the linkage relationships between characters, and this language model is also limited by the dictionary capacity of the training samples (synthetic text images). Secondly, in methods based on sequence modeling, the height information of the CNN feature map is usually compressed, which loses the ability to recognize text instances with a two-dimensional layout. Thirdly, the character position cues in the existing open training sets are too uniform, which leads to poor generalization of attention-based methods on text instances with unconstrained character layout. Fourthly, segmentation-based methods lack an explicit language model, so they cannot solve the complicated reading-order problem in text block instances.
SUMMARY
The purpose of the present invention is to provide a layout-unconstrained method based on graph reasoning network for reading text block, so as to solve the problems existing in the prior art.
To achieve the above purpose, the present invention provides a layout-unconstrained method based on graph reasoning network for reading text block, which comprises:
Acquiring text pictures with unconstrained layout, and constructing a convolution network; extracting the visual feature map of the text picture through the convolution network, and performing character recognition on the visual feature map pixel by pixel; based on the output value of the convolution network, optimizing the convolution network with the aggregation cross-entropy loss function to obtain an unordered character set; constructing a graph reasoning network, and reasoning the relationships between characters in the character set through the graph reasoning network to obtain a character connection set; integrating the character connection set, and translating the integrated character connection set into a reading order to obtain the recognition result of the text picture.
Preferably, the process of performing character recognition on the visual feature map pixel by pixel comprises:
Pre-processing the text picture, taking the pre-processed text picture as input, extracting the visual feature map through the convolution network, and converting the depth dimension into the category number of the alphabet through the fully connected layer in the convolution network.
Preferably, the process of obtaining the unordered character set comprises:
Based on the output value of the convolution network and the number of categories, obtaining a probability matrix of character categories, and respectively summing, sorting and counting the probability matrix to obtain an unordered character set.
Preferably, the process of summing, sorting and counting the probability matrix respectively comprises:
Based on the probability matrix, summing the probability matrix along the time step dimension to obtain an aggregated probability vector; optimizing the convolution network through the aggregation cross-entropy loss function, and sorting the aggregated probability vector by the probability values of the character categories to obtain a sorted probability vector; based on the sorted probability vector, performing character counting by integerizing the corresponding probability values element by element to obtain a character sequence length N, and intercepting the first N counted characters to obtain an unordered character set.
Preferably, the character connection set comprises a character global linkage prediction set and a character local linkage prediction set.
Preferably, the process of obtaining the character global linkage prediction set comprises:
Based on the unordered character set, taking each character in the unordered character set as a graph node, where the graph node is characterized by splicing the category embedding and serial number embedding of the corresponding character; through the graph attention layer in the graph reasoning network, performing relationship modeling for each graph node feature to obtain global modeling features, performing nonlinear activation of the global modeling features with the Softmax function, and taking the index of the maximum activation value with the argmax function to obtain the global linkage prediction set of characters.
Preferably, the process of obtaining the character local linkage prediction set comprises:
Taking each character in the unordered character set as a composition anchor point, taking several characters adjacent to the composition anchor point in the local graph as graph nodes, and obtaining normalized node features by subtracting the original features of the composition anchor point from the original features of all graph nodes; through the graph attention layer in the graph reasoning network, performing relationship modeling for the node features of the local graphs to obtain local modeling features, and performing nonlinear activation of the local modeling features with the Sigmoid function to obtain a local linkage prediction set based on the composition anchor points.
Preferably, the process of obtaining the recognition result of the text picture comprises:
Selecting a corresponding linkage prediction set based on the global linkage prediction set of characters and the local linkage prediction set of characters according to the confidence degree of connection classification;
Based on the linkage prediction set, using each character in the linkage prediction set as the start node of a linked list in turn, obtaining the connection predictions of the nodes through recursion to construct new nodes of the linked list, and constructing a unidirectional connection linked list based on the new nodes; calculating the lengths of all unidirectional linked lists, and obtaining the recognition result of the text picture by taking the longest linked list as the reading order of the character set.
The invention has the following technical effects:
According to the invention, text pictures with unconstrained layout are acquired and a convolution network is constructed; the visual feature map of the text picture is extracted through the convolution network, and character recognition is performed on the visual feature map pixel by pixel; based on the output value of the convolution network, the convolution network is optimized with the aggregation cross-entropy loss function to obtain an unordered character set; a graph reasoning network is constructed, and the relationships between characters in the character set are reasoned through the graph reasoning network to obtain a character connection set; the character connection set is integrated and translated into a reading order to obtain the recognition result of the text picture.
The invention can effectively alleviate the problem of scale sensitivity through aggregation cross-entropy character recognition, and can predict a more accurate character sequence through the convolution network probability matrix; through the character connection prediction model of the graph reasoning network, it can break away from the dictionary limitation of the training text pictures and mine the language information in an independent corpus.
BRIEF DESCRIPTION OF THE FIGURES
The drawings that form a part of this application are used to provide a further understanding of this application. The illustrative embodiments of this application and their descriptions are used to explain this application, and do not constitute undue limitations on this application. In the attached drawings:
FIG. 1 is a flowchart of a text block recognition method in an embodiment of the present invention;
FIG. 2 is a flow chart of aggregation cross-entropy character recognition based on top-K sorting in the embodiment of the present invention;
FIG. 3 is a flowchart of an integration module in an embodiment of the present invention.
DESCRIPTION OF THE INVENTION
It should be noted that the embodiments in this application and the features in the embodiments can be combined with each other without conflict. The application will be described in detail with reference to the drawings and examples.
It should be noted that the steps shown in the flowchart of the figures can be executed in a computer system such as a set of computer-executable instructions, and, although a logical sequence is shown in the flowchart, in some cases the steps shown or described can be executed in a sequence different from that given here.
Embodiment 1
As shown in FIG. 1, the embodiment provides a layout-unconstrained method based on graph reasoning network for reading text block, which comprises:
Acquiring text pictures with unconstrained layout, and constructing a convolution network; extracting the visual feature map of the text picture through the convolution network, and performing character recognition on the visual feature map pixel by pixel; based on the output value of the convolution network, optimizing the convolution network with the aggregation cross-entropy loss function to obtain an unordered character set; constructing a graph reasoning network, and reasoning the relationships between characters in the character set through the graph reasoning network to obtain a character connection set; integrating the character connection set, and translating the integrated character connection set into a reading order to obtain the recognition result of the text picture.
In this embodiment, the specific process of the text block recognition method comprises:
Model input
Using synthetic text blocks and public text lines as training samples, and taking text blocks collected in real scenes as test samples; preprocessing each sample and assigning the corresponding character category labels as well as global and local connection relationship labels. The preprocessing of each sample comprises: under the condition of keeping the aspect ratio fixed, scaling the longest side of the text picture in the training and test samples to 256 pixels, and scaling the other side according to the corresponding aspect ratio to obtain the preprocessed sample.
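As a non-limiting illustration, the aspect-ratio-preserving scaling described above may be sketched as follows (Python with the Pillow library is assumed here; the invention does not mandate any particular library):

```python
from PIL import Image

def preprocess(img: Image.Image, long_side: int = 256) -> Image.Image:
    """Scale the longest side to `long_side` pixels, keeping aspect ratio fixed."""
    w, h = img.size
    scale = long_side / max(w, h)
    # Round the short side; the long side becomes exactly `long_side`.
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    return img.resize((new_w, new_h), Image.BILINEAR)
```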
Character recognition module
S2.1, adopting a fully convolutional network as the backbone network to extract the visual features, in which the fully convolutional network adopts the fourth layer to the antepenultimate layer of ResNet-50;
S2.2, connecting the last feature map of the backbone network to a fully connected layer, with the purpose of converting the depth dimension of the last feature map into the number of prediction categories, and obtaining the character probability prediction matrix $y_t^k$, where $t = \{x_i \mid 1 \le i \le HW\}$ is the time step dimension, $H$ and $W$ are the numbers of rows and columns of the probability matrix respectively, and $k = \{x_i \mid 1 \le i \le |C_\varepsilon|\}$ is the category dimension, with $C_\varepsilon$ the alphabet set.
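A minimal sketch of S2.1 and S2.2 is given below, assuming PyTorch and torchvision; the exact ResNet-50 slice used here is an approximation of the "fourth to antepenultimate layer" range described above, not a definitive implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CharRecognizer(nn.Module):
    """Fully convolutional backbone plus per-pixel character classifier (a sketch)."""
    def __init__(self, num_classes: int):  # num_classes = |C_eps|, incl. the blank
        super().__init__()
        resnet = resnet50()
        # Keep an intermediate slice of ResNet-50 as the feature extractor
        # (through layer3, 1024 channels); an approximation of the patent's range.
        self.backbone = nn.Sequential(*list(resnet.children())[:-3])
        self.classifier = nn.Linear(1024, num_classes)  # depth -> category count

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(images)             # (B, 1024, H, W)
        feat = feat.flatten(2).transpose(1, 2)   # (B, H*W, 1024): one pixel per time step
        return self.classifier(feat).softmax(-1) # probability matrix y_t^k
```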
Unordered character set: as shown in FIG. 2,
S3.1, given the probability matrix $y_t^k$, firstly summing over the time step dimension to obtain the aggregated probability vector $y_k$, and optimizing it with the aggregation cross-entropy (ACE) loss function, whose expression is:

$$\mathcal{L}(I, S) = -\sum_{k=1}^{|C_\varepsilon|} \frac{N_k}{T} \ln \frac{y_k}{T}, \qquad y_k = \sum_{t=1}^{T} y_t^k$$

where $I$, $S$ and $N_k$ are the input sample, the sequence label corresponding to the sample, and the frequency of the character $k$ appearing in the sequence label $S$, respectively.
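The ACE loss follows directly from the formula above. The following PyTorch sketch assumes the blank class count is included in the label counts so that each row sums to T:

```python
import torch

def ace_loss(probs: torch.Tensor, counts: torch.Tensor) -> torch.Tensor:
    """Aggregation cross-entropy (ACE) loss, a minimal sketch.

    probs:  (B, T, C) per-time-step class probabilities y_t^k
    counts: (B, C) label frequencies N_k, with the blank class count
            chosen so that each row sums to T
    """
    T = probs.size(1)
    y_k = probs.sum(dim=1)                             # aggregate over time steps
    loss = -(counts / T) * torch.log(y_k / T + 1e-10)  # -(N_k/T) ln(y_k/T)
    return loss.sum(dim=1).mean()
```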
S3.2, sorting $y_k$ in descending order of probability value to obtain a sorted vector $z = F(y_k),\ k \in C_\varepsilon$, where $F(\cdot)$ is a descending sorting function and each element of $z$ consists of a character category and its corresponding probability value $z^k$;
S3.3, calculating the length of the character sequence as $N = T - |C_b|$, where $T$ is the total number of time steps and $|C_b|$ is the predicted number of blank symbols;
S3.4, counting the characters by category based on each element of $z$, in which each probability value $z^k$ is integerized by the function $H(\cdot)$, and the integer value is the predicted count of the corresponding category $C_k$ in the sample; finally, by intercepting the first $N$ counted characters, the unordered character set $S''$ is obtained, expressed as:

$$S'' = M_N\left(H(z^k) * C_k\right)$$

where $M_N(\cdot)$ is the merge function, which merges the counting results into a character list in sequence and intercepts the first $N$ elements. $H(\cdot)$ is an integerizing function, expressed as:

$$H(x) = \begin{cases} 1 & \text{if } \lambda < x < 0.5, \\ \operatorname{round}(x) & \text{otherwise,} \end{cases}$$

where $\operatorname{round}(\cdot)$ is the rounding function and $\lambda$ is a tolerance factor that can improve the regression performance of character recognition.
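A plain-Python sketch of S3.2 through S3.4 follows; the value 0.1 used for the tolerance factor λ is purely illustrative, since the factor is set by experiment:

```python
def decode_unordered_set(z_sorted, T, blank_count, lam=0.1):
    """Recover the unordered character set S'' (steps S3.2-S3.4).

    z_sorted: (category, aggregated probability) pairs in descending
    probability order; blank_count: predicted number of blank symbols;
    lam: tolerance factor lambda (0.1 is an illustrative value).
    """
    def H(x):  # integerize a probability value into a character count
        return 1 if lam < x < 0.5 else round(x)

    N = T - blank_count              # S3.3: character sequence length
    chars = []
    for category, prob in z_sorted:  # S3.4: count characters by category
        chars.extend([category] * H(prob))
    return chars[:N]                 # merge and keep the first N characters
```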
Linkage reasoning module
S4.1, composing the global graph network by taking each character in the above set $S''$ as a graph node, where the feature of each graph node is the concatenation of the category embedding and the serial number embedding of the corresponding character, denoted $h = \{h_1, h_2, \ldots, h_N\}$ with $h_i = \{c_i, s_i\}$, where $c_i$ and $s_i$ are the category feature and serial number feature of node $i$ respectively;
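The node feature construction of S4.1 may be sketched as follows; the embedding dimension of 128 is an illustrative choice, not a value fixed by the method:

```python
import torch
import torch.nn as nn

class NodeEmbedder(nn.Module):
    """Graph node features h_i = [c_i ; s_i] (category + serial number), per S4.1."""
    def __init__(self, num_classes: int, max_chars: int, dim: int = 128):
        super().__init__()
        self.cat_emb = nn.Embedding(num_classes, dim)  # category embedding c_i
        self.idx_emb = nn.Embedding(max_chars, dim)    # serial number embedding s_i

    def forward(self, categories: torch.Tensor, serials: torch.Tensor) -> torch.Tensor:
        # (N,) category ids, (N,) serial numbers -> (N, 2*dim) node features
        return torch.cat([self.cat_emb(categories), self.idx_emb(serials)], dim=-1)
```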
S4.2, using $M$ stacked graph attention layers to model the relationships among the node features $h$ of the global graph, obtaining the modeling features $x = \{x_1, x_2, \ldots, x_N\}$. The expression of the graph attention layer is:

$$y_i^{l+1} = \sigma\left(\sum_{j=1}^{N} \alpha_{ij} W y_j^{l}\right)$$

where $y_i^l$ is the feature of node $i$ in the $l$-th layer, $\sigma(\cdot)$ is a nonlinear activation function, and $\alpha_{ij}$ is the value of the adjacency matrix at row $i$ and column $j$, learned by the attention mechanism. The input feature of the first graph attention layer is $h$.
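A minimal sketch of one such layer appears below. The pairwise scoring used here to learn the adjacency values α_ij, and the use of ReLU for σ, are common choices and are assumptions, as the text above does not fix them:

```python
import torch
import torch.nn as nn

class GraphAttentionLayer(nn.Module):
    """One layer of y_i^{l+1} = sigma(sum_j alpha_ij * W y_j^l)."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)  # shared linear transform W
        self.att = nn.Linear(2 * dim, 1)          # scores alpha_ij from node pairs

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        n = y.size(0)
        wy = self.W(y)                                         # (N, dim)
        pairs = torch.cat([wy.unsqueeze(1).expand(n, n, -1),   # feature of node i
                           wy.unsqueeze(0).expand(n, n, -1)],  # feature of node j
                          dim=-1)
        alpha = torch.softmax(self.att(pairs).squeeze(-1), dim=-1)  # learned adjacency
        return torch.relu(alpha @ wy)              # sigma is taken as ReLU here
```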
S4.3, after the features $x$ are activated by the Softmax function, the index of the maximum activation is obtained by the argmax function to obtain the global connection prediction $O_g = \{r_1, r_2, \ldots, r_N\}$ of the characters, where $r_i = \operatorname{argmax}(p_i)$ and $p_i$ is the global connection probability of character $i$, expressed as:

$$p_i = \frac{\exp(W x_i)}{\sum_{j} \exp(W x_j)}$$

where $W$ is a learnable linear transformation.
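The global prediction then reduces to a softmax followed by argmax. In this sketch, treating the N characters themselves as the candidate successors (the output dimension of W) is an assumption about the output space:

```python
import torch

def global_linkage(x: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """Global connection prediction r_i = argmax(p_i), per S4.3.

    x: (N, dim) global modeling features; W: (dim, N) learnable linear
    transformation (candidate-successor output space is an assumption).
    """
    p = torch.softmax(x @ W, dim=-1)  # p_i = exp(W x_i) / sum_j exp(W x_j)
    return p.argmax(dim=-1)           # index of the predicted successor
```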
S4.4, composing the local graph networks by taking each character of the set $S''$ as an anchor point in turn, and looking for the $K$ characters adjacent to the anchor point as graph nodes, denoted $v = \{v_1, v_2, \ldots, v_K\}$, where the original features of the graph nodes (including the anchor point) are obtained by concatenating the corresponding character category and serial number embeddings. For each local graph, the original feature $h_q$ of the anchor point is subtracted from the original features of all nodes to obtain the normalized node features, denoted $h' = \{h_1 - h_q, h_2 - h_q, \ldots, h_K - h_q\}$, where $K$ is set to 3 by experiment;
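The local graph composition of S4.4 may be sketched as follows; selecting neighbours by Euclidean distance between character centres is an assumption, since the text above only states that the K characters are adjacent to the anchor:

```python
import torch

def build_local_graph(h: torch.Tensor, positions: torch.Tensor, q: int, K: int = 3):
    """Local graph around anchor q: K nearest characters, anchor-normalized.

    h: (N, dim) original node features; positions: (N, 2) character centres,
    used here as the adjacency criterion (an assumption). Returns the
    normalized features h'_i = h_i - h_q for the K neighbours.
    """
    dist = (positions - positions[q]).norm(dim=-1)
    dist[q] = float("inf")                     # exclude the anchor itself
    nbr = dist.topk(K, largest=False).indices  # K nearest neighbours
    return h[nbr] - h[q]                       # normalized node features
```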
S4.5, using $M$ stacked graph attention layers with the same structure as in S4.2 to model the relationships among the node features $h'$ of each local graph, obtaining the modeling features $x'$; the weights of these graph attention layers are independent and are not shared with the attention layers of S4.2.
S4.6, after activating the features with the Sigmoid function, the connection prediction $O_l(q) = \{p'_1, p'_2, \ldots, p'_K\}$ based on the anchor point $q$ is obtained, where $p'_i$ represents the connection probability of node $i$ in the local graph, expressed as:

$$p'_i = \frac{1}{1 + \exp(-W x'_i)}$$

finally, the linkage prediction set $O_l = \{O_l(q_1), O_l(q_2), \ldots, O_l(q_N)\}$ of the $N$ local graphs is obtained. Given the prediction $O_l(q_i)$ of node $i$ in the local graph, the local connection $r'_i$ is further calculated as:

$$r'_i = \begin{cases} \varnothing & \text{if } \max(O_l(q_i)) < 0.5, \\ \operatorname{argmax}_{x \in [1, K]}(p'_x) & \text{otherwise;} \end{cases}$$

therefore, the connection set of the local graph network can be expressed as $O'_l = \{r'_1, r'_2, \ldots, r'_N\}$.
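The local prediction and thresholding of S4.6 take only a few lines; in this sketch, w stands for the learnable transformation applied before the Sigmoid:

```python
import torch

def local_linkage(x_local: torch.Tensor, w: torch.Tensor):
    """Local connection r'_i for one anchor's K-node graph, per S4.6.

    x_local: (K, dim) local modeling features; w: (dim,) learnable weights.
    """
    p = torch.sigmoid(x_local @ w)  # p'_i = 1 / (1 + exp(-w^T x'_i))
    if p.max() < 0.5:
        return None                 # empty set: no confident connection
    return int(p.argmax())          # argmax over the K local nodes
```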
Model output
As shown in FIG. 3, the integration module integrates the connection prediction of the global graph and the local graph, which comprises the following steps:
S5, first, judging whether the local connection $r'_i$ of node $i$ is the empty set: (1) if yes, directly using the global graph connection prediction $r_i$; (2) if not, comparing the confidences of the two connection predictions for selection, where $\lambda$ is an adjustable significance factor that represents the significance of the local graph prediction relative to the global graph prediction, and is set by experiment.
According to the integrated prediction results, the character reading order is obtained through the following steps: firstly, each character is taken as the root node of a new linked list in turn, and new nodes are recursively constructed according to the connection predictions to obtain a linked list; then, the longest connection sequence among all linked lists is taken as the character reading order to obtain the final text block recognition result.
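A plain-Python sketch of the integration module and reading-order recovery of S5 is given below; the exact rule for weighing the two confidences with the significance factor λ is an assumption (written here as λ·local ≥ global):

```python
def reading_order(global_pred, global_conf, local_pred, local_conf, lam=1.0):
    """Integrate global/local linkage predictions and recover reading order.

    global_pred[i] / local_pred[i]: predicted successor of character i
    (None for the empty set); *_conf[i]: connection confidences.
    `lam` is the adjustable significance factor; its value and the exact
    comparison rule are assumptions, as the factor is set by experiment.
    """
    n = len(global_pred)
    nxt = []
    for i in range(n):
        if local_pred[i] is None:                    # S5(1): empty -> use global
            nxt.append(global_pred[i])
        elif lam * local_conf[i] >= global_conf[i]:  # S5(2): compare confidences
            nxt.append(local_pred[i])
        else:
            nxt.append(global_pred[i])

    best = []
    for root in range(n):                 # each character heads a new linked list
        chain, seen = [root], {root}
        while nxt[chain[-1]] is not None and nxt[chain[-1]] not in seen:
            chain.append(nxt[chain[-1]])  # recursively extend the linked list
            seen.add(chain[-1])
        if len(chain) > len(best):
            best = chain                  # keep the longest linked list
    return best                           # node indices in reading order
```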
Technical effects of this embodiment are as follows:
In this embodiment, text pictures with unconstrained layout are acquired and a convolution network is constructed; the visual feature map of the text picture is extracted through the convolution network, and character recognition is performed on the visual feature map pixel by pixel; based on the output value of the convolution network, the convolution network is optimized with the aggregation cross-entropy loss function to obtain an unordered character set; a graph reasoning network is constructed, and the relationships between characters in the character set are reasoned through the graph reasoning network to obtain a character connection set; the character connection set is integrated and translated into a reading order to obtain the recognition result of the text picture.
The invention can effectively alleviate the problem of scale sensitivity through aggregation cross-entropy character recognition, and can predict a more accurate character sequence through the convolution network probability matrix; through the character connection prediction model of the graph reasoning network, it can break away from the dictionary limitation of the training text pictures and mine the language information in an independent corpus.
The above are only the preferred embodiments of this application, but the scope of protection of this application is not limited to this. Any changes or substitutions that can be easily thought of by those skilled in the technical field within the technical scope disclosed in this application should be covered by the scope of protection of this application. Therefore, the scope of protection of this application should be based on the scope of protection of the claims.

Claims (8)

CLAIMS
1. A layout-unconstrained method based on graph reasoning network for reading text block, characterized by comprising the following steps: acquiring text pictures with unconstrained layout, and constructing a convolution network; extracting the visual feature map of the text picture through the convolution network, and performing character recognition on the visual feature map pixel by pixel; based on the output value of the convolution network, optimizing the convolution network with the aggregation cross-entropy loss function to obtain an unordered character set; constructing a graph reasoning network, and reasoning the relationships between characters in the character set through the graph reasoning network to obtain a character connection set; integrating the character connection set, and translating the integrated character connection set into a reading order to obtain the recognition result of the text picture.
2. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 1, characterized in that, the process of performing character recognition on the visual feature map pixel by pixel comprises: pre-processing the text picture, taking the pre-processed text picture as input, extracting the visual feature map through the convolution network, and converting the depth dimension into the category number of the alphabet through the full connection layer in the convolution network.
3. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 2, characterized in that, the process of obtaining the unordered character set comprises: based on the output value of the convolution network and the number of categories, obtaining a probability matrix of character categories, and respectively summing, sorting and counting the probability matrix to obtain an unordered character set.
4. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 3, characterized in that, the process of summing, sorting and counting the probability matrix respectively comprises: based on the probability matrix, summing the probability matrix along the time step dimension to obtain an aggregated probability vector; optimizing the convolution network through the aggregation cross-entropy loss function, and sorting the aggregated probability vector by the probability values of the character categories to obtain a sorted probability vector; based on the sorted probability vector, performing character counting by integerizing the corresponding probability values element by element to obtain a character sequence length N, and intercepting the first N counted characters to obtain an unordered character set.
5. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 1, characterized in that, the character connection set comprises a character global linkage prediction set and a character local linkage prediction set.
6. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 5, characterized in that, the process of obtaining the character global linkage prediction set comprises: based on the unordered character set, taking each character in the unordered character set as a graph node, where the graph node is characterized by embedding and splicing the category embedding and serial number embedding of corresponding characters; through the graph attention layer in the graph reasoning network, performing relationship modeling for each graph node feature to obtain global modeling features, performing nonlinear activation for the global modeling features by the Softmax function, and taking the corresponding index of the maximum activation value by the argmax function to obtain the global linkage prediction set of characters.
7. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 6, characterized in that, the process of obtaining the character local linkage prediction set comprises: taking each character in the unordered character set as a composition anchor point, taking several characters adjacent to the composition anchor point in the local graph as graph nodes, and obtaining the normalized node features by subtracting the original features of the composition anchor point from the original features of all graph nodes; through the graph attention layer in the graph reasoning network, performing relationship modeling for the node features of the local graphs to obtain local modeling features, and performing nonlinear activation of the local modeling features with the Sigmoid function to obtain a local linkage prediction set based on the composition anchor points.
8. The layout-unconstrained method based on graph reasoning network for reading text block according to claim 7, characterized in that, the process of obtaining the recognition result of the text picture comprises: selecting a corresponding linkage prediction set from the global linkage prediction set of characters and the local linkage prediction set of characters according to the confidence degree of connection classification; based on the linkage prediction set, using each character in the linkage prediction set as the start node of a linked list in turn, obtaining the connection predictions of the nodes through recursion to construct new nodes of the linked list, and constructing a unidirectional connection linked list based on the new nodes; calculating the lengths of all unidirectional linked lists, and obtaining the recognition result of the text picture by taking the longest linked list as the reading order of the character set.
